Evan Hubinger

@evhub

regrantor

AGI safety Research Scientist at Anthropic. Previously Research Fellow at Machine Intelligence Research Institute.

https://www.alignmentforum.org/users/evhub

$15,060 total balance
$3,560 charity balance
$0 cash balance

$11,500 in pending offers

About Me

All of my grant-making will go towards reducing existential risks from artificial intelligence. Some reasons to think I'll do a good job at that:

Outgoing donations

Comments


Evan Hubinger

about 1 year ago

Main points in favor of this grant


Normally I'm somewhat skeptical of totally independent alignment work, but Lawrence has a solid track record and his project ideas sound quite exciting to me. This grant was also specifically recommended to me by someone I trust, and I encouraged Lawrence to put it up here.

Donor's main reservations


Independent alignment work without any mentorship doesn't have a fantastic track record in my opinion, so it's definitely possible that not much of value will come of this beyond keeping Lawrence learning and doing work (though that is still a meaningful upside).

Process for deciding amount


I would fund the full amount here, but I'm starting to run out of money in my Manifund pot. I'd appreciate other funders stepping in to top this off.

Conflicts of interest

None.


Evan Hubinger

about 1 year ago

Main points in favor of this grant


I am excited about more work in the realm of training transparency, and I know that Rob is capable of executing here from having mentored him previously.

Donor's main reservations


The main way I could imagine this being a bad idea is if it's not a good use of Rob's time, but I'll defer to his judgement there.

Process for deciding amount


I'd likely be willing to fund more than $5k, but for now I'll cover the full expenses being requested.

Conflicts of interest

Rob was a mentee of mine in the SERI MATS program.


Evan Hubinger

about 1 year ago

Main points in favor of this grant

I don't have a strong take on how good Rachel's current research is, but she's clearly doing relevant work, and if covering her medical expenses is a cheap way to let her keep doing it, that seems high-impact.

Donor's main reservations

I am more confident in the value of covering medical expenses than in the value of other time-saving services such as a PA.

Process for deciding amount


I have committed $10k for now, but would be willing to commit more if @rachelfreedman indicates to me that the current amount is insufficient to cover her medical expenses. Based on currently committed funding, it looks like she may or may not have enough to do that, depending on how much buffer is necessary.

Conflicts of interest

None.


Evan Hubinger

over 1 year ago

Main points in favor of this grant

I have been consistently impressed by the LTFF's grantmaking and this seems to be a time when they are uniquely in need of funding (https://www.lesswrong.com/posts/gRfy2Q2Pg25a2cHyY/ltff-and-eaif-are-unusually-funding-constrained-right-now).

Donor's main reservations

I think my main reservation here is that it sort of defeats the purpose of regranting, since the funding is just flowing to the LTFF's existing grantmaking institution rather than through the regrantor mechanism. But while I do like the regrantor mechanism, I think the LTFF's funding constraints justify this grant in this case.

Process for deciding amount


I want to have enough left in my pot to fund any really good opportunities that might come up, but otherwise I'm committing the rest of my pot to this.

Conflicts of interest

I used to be a fund manager for the LTFF.


Evan Hubinger

over 1 year ago

Main points in favor of this grant

I am excited about more work along the lines of the existing "Incentivizing honest performative predictions with proper scoring rules" paper. I think that there are serious safety problems surrounding predictors that select their predictions to influence the world in such a way as to make those predictions true ("self-fulfilling prophecies") and I am excited about this work as a way to discover mechanisms for dealing with those sorts of problems. "Conditioning Predictive Models" discusses these sorts of issues in more detail. Rubi is a great person to work on this as he was an author on both of those papers.

Donor's main reservations


My main reservations here are around Rubi's opportunity costs, though I think this is reasonably exciting work and I trust Rubi to make a good judgement about what he should be spending his time on. The most likely failure mode is that the additional work doesn't turn up anything new or interesting beyond what was already surfaced in the "Incentivizing honest performative predictions with proper scoring rules" paper.

Process for deciding amount


I think that $33k is a reasonable amount given the timeframe and work.

Conflicts of interest

Rubi was a previous mentee of mine in SERI MATS and a coauthor of mine on "Conditioning Predictive Models."


Evan Hubinger

over 1 year ago

Main points in favor of this grant


I am quite excited about deception evaluations (https://www.lesswrong.com/posts/Km9sHjHTsBdbgwKyi/monitoring-for-deceptive-alignment), transparency and interpretability (https://www.lesswrong.com/posts/nbq2bWLcYmSGup9aF/a-transparency-and-interpretability-tech-tree), and especially the combination of the two (https://www.lesswrong.com/posts/uqAdqrvxqGqeBHjTP/towards-understanding-based-safety-evaluations). If I were crafting my ideal agenda for a new alignment org, it would be pretty close to what Apollo has settled on. Additionally, I mentored Marius, who's one of the co-founders, and I have confidence that he understands what needs to be done for the agenda they're tackling and has the competence to give it a real attempt. I've also met Lee and feel similarly about him.

Donor's main reservations


My main reservations are:

  1. It's plausible that Apollo is scaling too quickly. I don't know exactly how many people they've hired so far or plan to hire, but I do think they should be careful not to overextend themselves. I want Apollo to be well-funded, but I am somewhat wary of that funding resulting in them expanding their headcount too rapidly.

  2. As Apollo is a small lab, it might be quite difficult for them to get access to state-of-the-art models, which would likely slow down their agenda substantially. I'd be especially worried if Apollo were drawing people away from working directly on safety at large labs (OAI, Anthropic, GDM), where large-model access is more readily available. This concern would be substantially mitigated if Apollo were able to work with labs to get approval to use their models externally for research purposes, though I do not know whether that will happen.

Process for deciding amount


I decided on my $100k amount in conjunction with Tristan Hume, so that together we would be granting $300k. Both of us were excited about Apollo, but Tristan was relatively more excited about Apollo compared to other grants, so he decided to go in for the larger amount. I think $300k is a reasonable amount for Apollo to spin up initial operations, ideally in conjunction with support from other funders as well.

Conflicts of interest

Marius was a mentee of mine in the SERI MATS program.


Evan Hubinger

over 1 year ago

Main points in favor of this grant


I think understanding the inductive biases of modern machine learning processes is extremely important both for accurately assessing dangers and for discovering good interventions. Most of my uncertainty over the future is currently tied up in uncertainty regarding the inductive biases of machine learning processes (see here for a good explanation of why: https://www.alignmentforum.org/posts/A9NxPTwbw6r6Awuwt/how-likely-is-deceptive-alignment).

On that front, I think Singular Learning Theory is a real contender for a theory that has a chance of effectively explaining and predicting the mechanisms behind machine learning inductive biases. Furthermore, I'm familiar with the work of many of the people involved in this project and I believe them to be quite capable at tackling this problem.

Donor's main reservations


Though I have high hopes for Singular Learning Theory, my modal outcome is that it's mostly just wrong and doesn't explain machine learning inductive biases that well. Inductive biases are very complex, and most theories like this have failed in the past. Though I think this is a better bet than most, I still don't expect it to succeed.

Process for deciding amount


I've committed $100k at this time, which I think is a reasonable amount for the team, hopefully in combination with funding from other sources, to start spinning up this project.

Conflicts of interest

Jesse was a mentee of mine in the SERI MATS program.


Evan Hubinger

over 1 year ago

Main points in favor of this grant


Generally, my policy for funding independent research is to look for a mentor with a solid research track record who will be overseeing the research. I think completely independent research is rarely a good idea for junior researchers, but if a more senior researcher is involved to guide the project and provide feedback, then I think it tends to go quite well. In this case, a number of senior researchers such as Tom Everitt and Victoria Krakovna will be overseeing the project, which makes me feel quite good about it.

Donor's main reservations


I have some reservations about the utility of mathematical formalizations of agency, as I think it's somewhat unclear how useful such a formalization actually would be or what we would do with it. That being said, I don't see much downside risk, and I certainly think there are some cases where it could be quite useful, such as for constructing good evaluations for agency in models.

Process for deciding amount


I am recommending the amount that Damiano requested as I think it is a reasonable amount given his breakdown.

Conflicts of interest

None.

Transactions

For | Date | Type | Amount
Lightcone Infrastructure | 7 days ago | project donation | $10,000
MATS Program | 7 days ago | project donation | $10,000
Apollo Research: Scale up interpretability & behavioral model evals research | 7 days ago | project donation | $10,000
Next Steps in Developmental Interpretability | 4 months ago | project donation | $30,000
AI-Driven Market Alternatives for a post-AGI world | 4 months ago | project donation | $5,000
Support for Deep Coverage of China and AI | 4 months ago | project donation | $20,000
Evaluating the Effectiveness of Unlearning Techniques | 6 months ago | project donation | $20,000
AI Policy work @ IAPS | 7 months ago | project donation | $5,000
Research Staff for AI Safety Research Projects | 7 months ago | project donation | $25,000
<a2e90f73-2e2b-4059-9e09-eb1000bc572e> | 7 months ago | profile donation | +$50
MATS Program | 8 months ago | project donation | $80,000
Manifund Bank | 9 months ago | deposit | +$230,000
<d4c24a4d-b393-4671-aae0-e6883fd0bc37> | 10 months ago | profile donation | +$10
Long-Term Future Fund | 12 months ago | project donation | $100,000
Athena - New Program for Women in AI Alignment Research | 12 months ago | project donation | $20,000
MATS Program | about 1 year ago | project donation | $17,533
Apollo Research: Scale up interpretability & behavioral model evals research | about 1 year ago | project donation | $15,000
<02be5f43-1129-4025-b752-8127a793fd82> | about 1 year ago | profile donation | +$333
Scaling Training Process Transparency | about 1 year ago | project donation | $5,000
Manifund Bank | about 1 year ago | deposit | +$50,000
Exploring novel research directions in prosaic AI alignment | about 1 year ago | project donation | $25,000
Medical Expenses for CHAI PhD Student | about 1 year ago | project donation | $10,000
<8c5d3152-ffd8-4d0e-b447-95a31f51f9d3> | about 1 year ago | profile donation | +$100
Avoiding Incentives for Performative Prediction in AI | over 1 year ago | project donation | $33,000
Apollo Research: Scale up interpretability & behavioral model evals research | over 1 year ago | project donation | $100,000
<d950592c-b002-4a71-8235-b92b66ab30ef> | over 1 year ago | profile donation | +$100
Scoping Developmental Interpretability | over 1 year ago | project donation | $100,000
Manifund Bank | over 1 year ago | deposit | +$50,000
Activation vector steering with BCI | over 1 year ago | project donation | $15,000
Agency and (Dis)Empowerment | over 1 year ago | project donation | $60,000
Manifund Bank | over 1 year ago | deposit | +$400,000