Evan Hubinger

@evhub

regrantor

AGI safety Research Scientist at Anthropic. Previously Research Fellow at Machine Intelligence Research Institute.

https://www.alignmentforum.org/users/evhub

$15,060 total balance
$3,560 charity balance
$0 cash balance

$11,500 in pending offers

About Me

All of my grant-making will go towards reducing existential risks from artificial intelligence. Some reasons to think I'll do a good job at that:

Outgoing donations

Comments


Evan Hubinger

about 1 year ago

Main points in favor of this grant


Normally I'm somewhat skeptical of totally independent alignment work, but Lawrence has a solid track record and his project ideas sound quite exciting to me. This grant was also specifically recommended to me by someone I trust, and I encouraged Lawrence to put it up here.

Donor's main reservations


Independent alignment work without any mentorship doesn't have a fantastic track record in my opinion, so it's definitely possible that not much of value will come of this beyond keeping Lawrence learning and doing work (though that is still a meaningful upside).

Process for deciding amount


I would fund the full amount here, but I'm starting to run out of money in my Manifund pot. I'd appreciate other funders stepping in to top this off.

Conflicts of interest

None.


Evan Hubinger

about 1 year ago

Main points in favor of this grant


I am excited about more work in the realm of training transparency, and I know that Rob is capable of executing here from having mentored him previously.

Donor's main reservations


The main way I could imagine this being a bad idea is if it's not a good use of Rob's time, but I'll defer to his judgement there.

Process for deciding amount


I'd likely be willing to fund more than $5k, but for now I'll cover the full expenses being requested.

Conflicts of interest

Rob was a mentee of mine in the SERI MATS program.


Evan Hubinger

about 1 year ago

Main points in favor of this grant

I don't have a strong take on how good Rachel's current research is, but she's clearly doing relevant work, and if covering her medical expenses is a cheap way to let her keep doing it, that seems high-impact.

Donor's main reservations

I am more confident in the value of covering medical expenses than in the value of other time-saving services such as a PA.

Process for deciding amount


I have committed $10k for now, but would be willing to commit more if @rachelfreedman indicates to me that the current amount is insufficient to cover her medical expenses. Based on currently committed funding, it looks like she may or may not have enough to do that, depending on how much buffer is necessary.

Conflicts of interest

None.


Evan Hubinger

over 1 year ago

Main points in favor of this grant

I have been consistently impressed by the LTFF's grantmaking and this seems to be a time when they are uniquely in need of funding (https://www.lesswrong.com/posts/gRfy2Q2Pg25a2cHyY/ltff-and-eaif-are-unusually-funding-constrained-right-now).

Donor's main reservations

I think my main reservation here is that it sort of defeats the purpose of regranting, since the funding is just flowing to the LTFF's existing grantmaking institution rather than through the regrantor mechanism. But while I do like the regrantor mechanism, I think the LTFF's funding constraints justify this grant in this case.

Process for deciding amount


I want to have enough left in my pot to fund any really good opportunities that might come up, but otherwise I'm committing the rest of my pot to this.

Conflicts of interest

I used to be a fund manager for the LTFF.


Evan Hubinger

over 1 year ago

Main points in favor of this grant

I am excited about more work along the lines of the existing "Incentivizing honest performative predictions with proper scoring rules" paper. I think that there are serious safety problems surrounding predictors that select their predictions to influence the world in such a way as to make those predictions true ("self-fulfilling prophecies") and I am excited about this work as a way to discover mechanisms for dealing with those sorts of problems. "Conditioning Predictive Models" discusses these sorts of issues in more detail. Rubi is a great person to work on this as he was an author on both of those papers.

Donor's main reservations


My main reservations here are around Rubi's opportunity costs, though I think this is reasonably exciting work and I trust Rubi to make a good judgement about what he should be spending his time on. The most likely failure mode is that the additional work doesn't turn up anything new or interesting beyond what was already surfaced in the "Incentivizing honest performative predictions with proper scoring rules" paper.

Process for deciding amount


I think that $33k is a reasonable amount given the timeframe and work.

Conflicts of interest

Rubi was a previous mentee of mine in SERI MATS and a coauthor of mine on "Conditioning Predictive Models."


Evan Hubinger

over 1 year ago

Main points in favor of this grant


I am quite excited about deception evaluations (https://www.lesswrong.com/posts/Km9sHjHTsBdbgwKyi/monitoring-for-deceptive-alignment), transparency and interpretability (https://www.lesswrong.com/posts/nbq2bWLcYmSGup9aF/a-transparency-and-interpretability-tech-tree), and especially the combination of the two (https://www.lesswrong.com/posts/uqAdqrvxqGqeBHjTP/towards-understanding-based-safety-evaluations). If I were crafting my ideal agenda for a new alignment org, it would be pretty close to what Apollo has settled on. Additionally, I mentored Marius, who's one of the co-founders, and I have confidence that he understands what needs to be done for the agenda they're tackling and has the competence to give it a real attempt. I've also met Lee and feel similarly about him.

Donor's main reservations


My main reservations are:

  1. It's plausible that Apollo is scaling too quickly. I don't know exactly how many people they've hired so far or plan to hire, but I do think they should be careful not to overextend themselves. I want Apollo to be well-funded, but I am somewhat wary of that funding resulting in them expanding their headcount too rapidly.

  2. As Apollo is a small lab, it might be quite difficult for them to get access to state-of-the-art models, which would likely slow down their agenda substantially. I'd be especially worried if Apollo were drawing people away from working directly on safety at large labs (OAI, Anthropic, GDM), where large-model access is more readily available. This concern would be substantially mitigated if Apollo were able to work with labs to get approval to use their models externally for research purposes, though I do not know whether that will happen.

Process for deciding amount


I decided on my $100k amount in conjunction with Tristan Hume, so that together we would be granting $300k. Both of us were excited about Apollo, but Tristan was relatively more excited about Apollo compared to other grants, so he decided to go in for the larger amount. I think $300k is a reasonable amount for Apollo to spin up initial operations, ideally in conjunction with support from other funders as well.

Conflicts of interest

Marius was a mentee of mine in the SERI MATS program.


Evan Hubinger

over 1 year ago

Main points in favor of this grant


I think understanding the inductive biases of modern machine learning processes is extremely important both for accurately assessing dangers and for discovering good interventions. Most of my uncertainty over the future is currently tied up in uncertainty regarding the inductive biases of machine learning processes (see here for a good explanation of why: https://www.alignmentforum.org/posts/A9NxPTwbw6r6Awuwt/how-likely-is-deceptive-alignment).

On that front, I think Singular Learning Theory is a real contender for a theory that has a chance of effectively explaining and predicting the mechanisms behind machine learning inductive biases. Furthermore, I'm familiar with the work of many of the people involved in this project and I believe them to be quite capable at tackling this problem.

Donor's main reservations


Though I have high hopes for Singular Learning Theory, my modal outcome is that it's mostly just wrong and doesn't explain machine learning inductive biases that well. Inductive biases are very complex, and most theories like this have failed in the past. Though I think this is a better bet than most, I still don't expect it to succeed.

Process for deciding amount


I've committed $100k at this time, which I think is a reasonable amount for the team, hopefully in combination with funding from other sources, to start spinning up this project.

Conflicts of interest

Jesse was a mentee of mine in the SERI MATS program.


Evan Hubinger

over 1 year ago

Main points in favor of this grant


Generally, my policy for funding independent research is to look for a mentor with a solid research track record who will be overseeing the research. I think completely independent research is rarely a good idea for junior researchers, but if a more senior researcher is involved to guide the project and provide feedback, then I think it tends to go quite well. In this case, a number of senior researchers such as Tom Everitt and Victoria Krakovna will be overseeing the project, which makes me feel quite good about it.

Donor's main reservations


I have some reservations about the utility of mathematical formalizations of agency, as I think it's somewhat unclear how useful such a formalization actually would be or what we would do with it. That being said, I don't see much downside risk, and I certainly think there are some cases where it could be quite useful, such as for constructing good evaluations for agency in models.

Process for deciding amount


I am recommending the amount that Damiano requested as I think it is a reasonable amount given his breakdown.

Conflicts of interest

None.

Transactions

For | Date | Type | Amount
Lightcone Infrastructure | 7 days ago | project donation | $10,000
MATS Program | 7 days ago | project donation | $10,000
Apollo Research: Scale up interpretability & behavioral model evals research | 7 days ago | project donation | $10,000
Next Steps in Developmental Interpretability | 4 months ago | project donation | $30,000
AI-Driven Market Alternatives for a post-AGI world | 4 months ago | project donation | $5,000
Support for Deep Coverage of China and AI | 4 months ago | project donation | $20,000
Evaluating the Effectiveness of Unlearning Techniques | 6 months ago | project donation | $20,000
AI Policy work @ IAPS | 7 months ago | project donation | $5,000
Research Staff for AI Safety Research Projects | 7 months ago | project donation | $25,000
<a2e90f73-2e2b-4059-9e09-eb1000bc572e> | 7 months ago | profile donation | +$50
MATS Program | 8 months ago | project donation | $80,000
Manifund Bank | 9 months ago | deposit | +$230,000
<d4c24a4d-b393-4671-aae0-e6883fd0bc37> | 10 months ago | profile donation | +$10
Long-Term Future Fund | 12 months ago | project donation | $100,000
Athena - New Program for Women in AI Alignment Research | 12 months ago | project donation | $20,000
MATS Program | about 1 year ago | project donation | $17,533
Apollo Research: Scale up interpretability & behavioral model evals research | about 1 year ago | project donation | $15,000
<02be5f43-1129-4025-b752-8127a793fd82> | about 1 year ago | profile donation | +$333
Scaling Training Process Transparency | about 1 year ago | project donation | $5,000
Manifund Bank | about 1 year ago | deposit | +$50,000
Exploring novel research directions in prosaic AI alignment | about 1 year ago | project donation | $25,000
Medical Expenses for CHAI PhD Student | about 1 year ago | project donation | $10,000
<8c5d3152-ffd8-4d0e-b447-95a31f51f9d3> | about 1 year ago | profile donation | +$100
Avoiding Incentives for Performative Prediction in AI | over 1 year ago | project donation | $33,000
Apollo Research: Scale up interpretability & behavioral model evals research | over 1 year ago | project donation | $100,000
<d950592c-b002-4a71-8235-b92b66ab30ef> | over 1 year ago | profile donation | +$100
Scoping Developmental Interpretability | over 1 year ago | project donation | $100,000
Manifund Bank | over 1 year ago | deposit | +$50,000
Activation vector steering with BCI | over 1 year ago | project donation | $15,000
Agency and (Dis)Empowerment | over 1 year ago | project donation | $60,000
Manifund Bank | over 1 year ago | deposit | +$400,000