Leopold Aschenbrenner



Superalignment @ OpenAI

$10 total balance
$10 charity balance
$0 cash balance

$0 in pending offers

Outgoing donations



Update Oct 24: the projects have been making exciting progress! There's more work to do, so I'm granting my remaining $200k to support it. Really excited about this!

Update from Ethan:
Some updates on how the last funding was used, and what future funding would be used for:

  1. With the help of your last grant, two of our projects finished up and turned into (IMO) pretty exciting ICLR submissions: one investigating whether human feedback is at fault for sycophancy in language models, and another extending RLHF to vision-and-language models (in the hopes of facilitating process-based training of vision-and-language models like GPT4+).

  2. I've still got 3 projects in flight:

     1. For one of these, we've found a way to improve how representative chain-of-thought reasoning is of the model's actual process for solving tasks, which I think will be pretty helpful for both improving model transparency and process-based training schemes; we'll probably have a paper on this in ~2 months.

     2. The other 2 projects (debate + model organisms of reward hacking) are making good progress, and I'm optimistic that we'll have some interesting public results out in the 4-month timeframe (we already have some results that are interesting to discuss publicly, but we probably want to do more work before starting to put together a paper).

  3. I might start up new projects with winter MATS or other external-to-Anthropic collaborators; all of these could benefit from funding for OAI API credits.

  4. Our current runway is ~6 weeks, and we expect our compute expenses to go up a bit since we're slated to run compute-intensive experiments for the debate project.


Ethan Perez is a kickass researcher whom I really respect, and he also just seems very competent at getting things done. He is mentoring these projects, and in my opinion these are worthwhile empirical research directions. The MATS scholars are probably pretty junior, so a lot of the impact might be upskilling, but Ethan also seems really bullish on the projects, which I put a lot of weight on. I'm excited to see more external empirical alignment research like this!

Ethan reached out to me a couple of days ago saying they were majorly bottlenecked on compute/API credits; it seemed really high-value to unblock them, and to do so quickly. I'm really excited that Manifund regranting exists for this purpose!

Note: I may want to give further funds to support these projects in the future; this grant should cover them for roughly a couple of months. Before committing additional funding, I'm trying to see if we can get more API credits via OpenAI to cover some of this.