Tristan Hume

@trishume

regrantor

Interpretability at Anthropic

https://thume.ca/
$50,000 total balance
$50,000 charity balance
$0 cash balance

$0 in pending offers

About Me

Developer and interpretability researcher at Anthropic

Outgoing donations

Comments

Tristan Hume

3 months ago

I've been very impressed with the MATS program. Lots of strong people have gotten into and connected through the program, and when I've visited I've been struck by the caliber of the people I met.

An example is Marius Hobbhahn doing interpretability research during MATS that helped inform the Anthropic interpretability team's strategy, and then Marius going on to co-found Apollo.

Tristan Hume

7 months ago

I'm very excited about Apollo based on a combination of the track record of its founding employees and the research agenda they've articulated.

[Marius](https://www.alignmentforum.org/posts/KzwB4ovzrZ8DYWgpw/more-findings-on-memorization-and-double-descent) and [Lee](https://www.alignmentforum.org/posts/z6QQJbtpkEAX3Aojj/interim-research-report-taking-features-out-of-superposition) have published work that's [significantly contributed to Anthropic's work on dictionary learning](https://transformer-circuits.pub/2023/may-update/index.html). I've also met both Marius and Lee and have confidence in them to do a good job with Apollo.

Additionally, I'm very much a fan of alignment and dangerous capability evals as an area of research and think there's lots of room for more people to work on them.

In terms of cost-effectiveness, I like these research areas because I think they're very tractable to approach in a helpful way from outside a major lab, without taking large amounts of compute. I also think Apollo being based in London will allow them to hire underutilized talent who would have trouble getting a U.S. visa.