$0 total balance

$0 charity balance

$0 cash balance

$0 in pending offers

About Me

For approximately the past year, I’ve been doing alignment research full-time, working on a variety of approaches and trying to understand the problem in enough depth to invent new ones. If funded, I plan to continue doing approximately the same work as before, which has historically been scalable mechanistic interpretability, formal and prosaic corrigibility, reflective stability, and a bunch of value theory stuff, along with lots of upskilling in convex optimization, machine learning, neuroscience, and economics.

My current project is an attempt to connect the tools & theory of singular learning theory with our knowledge of the inductive biases and loss landscapes of large language models.

Projects

Garrett Baker salary to study the development of values of RL agents over time

Outgoing donations

AI Safety Reading Group at metauni [Retrospective]

$10

over 1 year ago

Act I: Exploring emergent behavior from multi-AI, multi-human interaction

$96

over 1 year ago

Act I: Exploring emergent behavior from multi-AI, multi-human interaction

$50

over 1 year ago

Lightcone Infrastructure

$95

over 1 year ago

Next Steps in Developmental Interpretability

$200

over 1 year ago

Lightcone Infrastructure

$50

over 1 year ago

Comments

Act I: Exploring emergent behavior from multi-AI, multi-human interaction

Garrett Baker

over 1 year ago

I have seen some of amp's work; it is pretty interesting and novel in the grand scheme of things.

Lightcone Infrastructure

Garrett Baker

over 1 year ago

Lightcone consistently does quality things.

Garrett Baker salary to study the development of values of RL agents over time

Garrett Baker

almost 2 years ago

@Austin Here is the LW post: https://www.lesswrong.com/posts/Bczmi8vjiugDRec7C/what-and-why-developmental-interpretability-of-reinforcement

Transactions

For	Date	Type	Amount
AI Safety Reading Group at metauni [Retrospective]	over 1 year ago	project donation	10
Act I: Exploring emergent behavior from multi-AI, multi-human interaction	over 1 year ago	project donation	96
Act I: Exploring emergent behavior from multi-AI, multi-human interaction	over 1 year ago	project donation	50
Lightcone Infrastructure	over 1 year ago	project donation	95
<176bd26d-9db4-4c7a-98c0-ba65570fb44c>	over 1 year ago	tip	+1
Next Steps in Developmental Interpretability	over 1 year ago	project donation	200
Lightcone Infrastructure	over 1 year ago	project donation	50
Manifund Bank	over 1 year ago	deposit	+500