Sudarsh Kunnavakkam

@skunnavakkam

https://sudarsh.com
$0 total balance
$0 charity balance
$0 cash balance

$0 in pending offers

About Me

I'm Sudarsh. I'm a first-year at Caltech studying physics. I also do research as a contractor at METR, work in the Shapiro Lab at Caltech, and do other things. Please reach out if you'd like, or check out my website at https://sudarsh.com :D

Projects

Tooling + Model Orgs for CoT Faithfulness Research

Outgoing donations

Marius Hobbhahn
$2000
10 days ago

Comments

Tooling + Model Orgs for CoT Faithfulness Research
Sudarsh Kunnavakkam

9 months ago

@NeelNanda i'm doing this by training against a monitor (so negatively rewarding when the monitor fires). the reasoning traces definitely have higher reward (fires the monitor less) and look by eye to be more obfuscated, but i think this "look by eye" is likely confirmation bias. i think continuing training from these checkpoints would work!

Transactions

For | Date | Type | Amount
<9af369f0-dc1e-4577-9bb7-4ad4cb87131f> | 10 days ago | profile donation | 2000
Sudarsh Kunnavakkam | 10 days ago | cash to charity transfer | 2000
Manifund Bank | 9 months ago | withdraw | 1000
Tooling + Model Orgs for CoT Faithfulness Research | 9 months ago | project donation | +3000