@NeelNanda
Lead of mech interp team at Google DeepMind
neelnanda.io
This is a donation to this user's regranting budget, which is not withdrawable.
$0 in pending offers
Neel Nanda
13 days ago
@NeelNanda Oh, I'd also love to hear more about the story behind "Oxford all-time top alumni fundraiser": what does that actually mean, and how?
Neel Nanda
13 days ago
Interesting project! To go well, it seems like the main project person needs to be good at a few things:
- Having a good network in the grants/funding space, across a fairly diverse range of funders (given your reply to Ryan)
  - Plausibly, being the kind of person who could build a good network fast and who has some existing connections would suffice?
  - Or maybe a bunch of these funders don't care as much about personal connections and have open applications, and you can just collect info about those?
- Being good at translation: understanding the context of funders in a range of fields (their language, culture, and what they look for), understanding the AI Safety orgs, and being able to sell them effectively
Do you have much evidence that shows you're good at those two things? (Also feel free to push back against this model, or point out other key skills I am missing!)
Neel Nanda
15 days ago
Ah! You mean OpenAI API costs; I thought it was a weird crypto thing. I recommend clarifying this in the post.
Neel Nanda
15 days ago
What is a token and why do you need to spend a thousand dollars on them per month to make the website work?
Neel Nanda
about 1 month ago
@ms Hmm, if it's not exposed to users, DM Austin on the Discord (linked in the corner) and ask him to fix it?
Neel Nanda
about 1 month ago
Did you intentionally make the max donation $500? Your own donation has already exceeded that, so I imagine you want to raise the cap
Neel Nanda
about 1 month ago
@michaeltrazzi +1. In particular, donating money to help you spend more time on this would feel noticeably more exciting than money for better hardware/software/etc. I don't know if your time is fungible though, Gaurav?
Neel Nanda
about 1 month ago
I've found being a MATS mentor a very valuable experience! I think my scholars have done kick-ass work, several of them have had very high impact roles going forwards, and I've mentored many more people than I would have done on my own in a way that I believe has significantly magnified my and their impact, and I appreciate MATS for facilitating this.
I'm not donating more, as MATS is a large funding opportunity that I don't think EACC is well placed for, but a token donation seems in the spirit.
Neel Nanda
about 1 month ago
I'm not sure how I feel about Lightcone as an impactful donation opportunity on the margin, but I have personally benefitted a fair bit from Lightcone's work and broadly consider it to be high quality, so feel like it's in the spirit of EACC to donate!
Neel Nanda
about 1 month ago
I haven't used Teamwork myself, but I think co-working spaces are valuable, and it seems like you do good work at impressively cheap rates!
Neel Nanda
about 1 month ago
I found these posts useful, and appreciate their existence! Especially Justified Practical Advice, and credibility of the CDC
Neel Nanda
about 1 month ago
@marisavogiatzi I'd encourage you to set the max funding significantly higher if you would spend more hours on EA stuff given more money! It's max funding, after all. I also second Austin's suggestion of charging people some money for it (tailored to what the org can reasonably afford); this is a good way to ensure you're actually providing value.
Neel Nanda
about 1 month ago
This seems like a great service to provide! And I'd expect that someone interested in EA is better placed to provide useful work for an EA org's needs than a generic professional. It's hard to judge how good you are here, but having too much demand seems like a strong positive signal, and totally worth funding.
Neel Nanda
about 1 month ago
16 out of 49 participants doing high-impact jobs seems extremely impressive (though it's unclear how much I'd agree with your definition of "high-impact"!). I'd love for you to clarify what exactly you mean by that.
Neel Nanda
about 1 month ago
I think this is a clearly useful public good, and I've found the results of the previous survey useful at random points in various minor ways
Neel Nanda
about 1 month ago
I think 80K is doing good work that should clearly be funded - I've had 80K career advising several times and found it quite valuable, and think it helped push me over the edge into pursuing AI Safety work (not donating more as I think they're far too large an org to get much value from EACC, and it's best spent on smaller orgs)
Neel Nanda
about 1 month ago
I think GWWC is doing good work, and I value there being an org carrying the torch of effective giving (not donating more, as I think they're far too large an org to get much value from EACC, and it's best spent on smaller orgs)
Neel Nanda
about 1 month ago
Note: I currently see this on the Manifund main page:
Title: EACH / CFI Community
Summary: Growth Fund
I think it'd be good to add more detail, eg expanding "EA for Christians", since you may miss interested donors who are skimming the 60+ opportunities!
Neel Nanda
about 1 month ago
The focus on animal charities is a key detail and easy to miss, I'd recommend putting it in the title and project summary
Neel Nanda
about 1 month ago
Note: Your max funding here is $500, which I presume is an error? (If you can't fix it yourself, I imagine you can DM Austin on Discord to fix it)
Neel Nanda
about 2 months ago
@NeelNanda I was also impressed that Alex was able to defend the case for the project in quite a lot of detail, had already thought of several experiments I suggested, and generally seemed to care a lot about baselines and rigour.
I'm also generally pro supporting the work of promising junior researchers regardless of the project to help them build skills and credibility.
Neel Nanda
about 2 months ago
I discussed this with Alex Cloud. I'm somewhat pessimistic about whether the technique will both work and not have a crippling alignment tax, but he made a pretty compelling case that it MIGHT, and could be a big deal if it worked, and it's a fairly elegant idea that seems like it has potential for some cool things even if the exact proposal doesn't work.
Either way, this was a fairly cheap grant, a small fraction of the cost of the labor going into the project. It seems valuable to gather more data on whether the technique works, and I expect that having more compute will improve the quantity and quality of the evidence, especially if they can go beyond TinyStories to more realistic settings. There were several experiments Alex and I agreed would be good ideas, and I would be keen to see them happen.
Neel Nanda
about 2 months ago
I think Decode do great work, and I suggested they submit this here. I expect to fund at least part of it, and am discussing details with them.
Neel Nanda
about 2 months ago
How long a time period would the 45K requested cover? I'm very surprised you can pay 0.5 FTE and rent a decently sized office space for so little, what's the breakdown here?
And what kind of people (eg what roles/working for which orgs) work out of the space at the moment, or in the expected future?
Neel Nanda
about 2 months ago
What evidence do you have from the first iteration of the program on how well it went? In particular, I'm interested in an assessment of how much counterfactual value the program added, since participants who go on to eg do MATS may have gotten in anyway. Eg did you survey participants after the program, or several months later? (I know it was only a few months ago, so you won't have much data yet.)
I looked through your application and the website, but still don't have a great sense of this.
Neel Nanda
3 months ago
I had previously discussed this grant with Lovis and suggested he apply.
Why is this a good idea?
I think sparse autoencoders (SAEs) are one of the most promising areas of mech interp work right now. Better understanding SAE circuits seems exciting, and I think that understanding the circuit required to produce a feature is an important direction. This is both a sub-part of the broader project of finding end-to-end circuits, and could help with interpreting what a feature does (especially important features like the safety-relevant features in Scaling Monosemanticity). I would be very excited if this project finds case studies of features that have ambiguous maximum activating examples, but whose meaning is clarified by studying a circuit.
(Note that the applicants shared me on a more detailed project proposal than what was shared publicly, which I broadly think was sensible, though I disagreed on some points)
Concerns
- Research is hard, and there's a good chance this project doesn't really go anywhere interesting.
- This is a hard and somewhat open-ended question, though I think they had some decent ideas for concrete entry points.
- There are many directions the project could go in, and it'd be easy to get caught in rabbit holes or constantly flit between things and never do any of them properly.
Why this amount?
This was the salary requested, I think somewhat pegged to academic summer researcher salaries, which are a fair bit lower than the market rate for independent researchers, so no complaints from me. The compute may not be needed, since the lab provides some, but it would be silly for the project to be bottlenecked by lacking compute. This overall seems like a fairly small grant, with some chance of going somewhere interesting, and so a pretty obvious accept.
Conflicts of interest
Lovis is one of my MATS alumni, but we haven't been working together for several months, so I don't feel too concerned about the conflict of interest, and it means I have a fair amount of data to evaluate him. I don't personally benefit from this project (except in that all good mech interp research helps my own work!), and don't anticipate being a co-author on any papers produced
Neel Nanda
4 months ago
Advocacy, R&D, and field-building seem like very different things for such a small and new org to be trying to do at once. Why did you make this decision, and how concerned are you about being spread too thin?
You might also want to add to Alexandre's bio that he was second author on the Indirect Object Identification paper, which I think was great work.
Neel Nanda
5 months ago
@AdamGleave Just noting that I was quite impressed by the paper that came out of this ( https://arxiv.org/abs/2403.19647 ) - good grant, and good work by Sam, Can and co!
Neel Nanda
5 months ago
I think that SAEs are a big deal in interpretability, with lots of valuable interp work that can be unlocked with good SAEs. Developing, understanding and using SAEs is the major focus of both Anthropic's mech interp team and my team (Google DeepMind mech interp). I feel like SAE training is currently very janky and pre-paradigmatic and I would love to see progress here.
Why grant to Glen? I was particularly impressed by the ProLU work. Though it was, unfortunately, highly similar to my team's Gated SAE work, making the actual impact lower, I think ProLU was a good and principled idea that correctly identified a flaw in SAE training, and empirically showed that it was a significant improvement. Further, I think Glen broadly did the right things to show that it was an improvement, and did the leg work of training a bunch of SAEs on a range of models, layers and sites (though was bottlenecked on compute I think) and carefully comparing Pareto frontiers - this makes me more optimistic that if Glen finds an important improvement, he'll present enough evidence for me to believe him! I thought the write-up was pretty rough, but it was quite rushed, so that's not a major consideration.
We had a call, and I thought Glen was thinking about things sensibly. In particular, he had a strong emphasis on iterating fast, building the infra to try out many ideas quickly, and doubling down on any idea that meets a moderately high quality bar. I think this is a great way to do this kind of research. Another good sign is that Glen said ProLU felt less interesting to him than some of his other ideas, but had better empirical results, so was higher priority and he doubled down on it - being willing to be pragmatic like this and prioritise results makes this kind of research go much better!
Even with a grant, this kind of research is much easier to do inside a lab, where you have a lot of compute and more engineering expertise. There are people in labs working on this, eg Anthropic has a several-person sub-team on science + scaling of SAEs. But there are many problems to work on, and ultimately not many researchers working on them, and Glen seems to have many interesting ideas, so I'm not too concerned about this. There is a risk of duplicate work, eg ProLU and Gated SAEs, but I don't think that's a strong enough consideration to sink the grant.
I'm generally pretty wary of people doing independent research, especially junior researchers, with concerns specifically around lacking structure, accountability, motivation, feedback/mentorship, and stability. Glen says he hasn't been experiencing any issues with executive function, which is great! I've encouraged him to look for collaborators, and ideally a mentor, which would make me feel much better about the grant. It doesn't sound like independent research is his long-term plan, which makes me feel better about this.
Glen doesn't have much of a research track record, making it hard to be confident in this going well. But he seems promising, and I think it's good to give promising, inexperienced researchers a chance to prove themselves.
I have some concerns that this grant could result in a bunch of half-baked research threads, with no public write-up or clear conclusions. But Glen seems pretty motivated to make that not happen, and I think he also has a strong incentive to produce something legible and cool to eg help with future grant/job apps
I'm honestly pretty confused about how to think about grant amounts here. $9K/month doesn't seem a crazy salary for someone living in SF, but I'd happily follow default rates for independent researchers if anyone has compiled them! $2K/month for compute seems enough to make it not a bottleneck without being too big a fraction of the grant. I'm funding this for up to 5 months to balance between wanting Glen to have runway and a chance to prove himself, and wanting to see results before I recommend a larger/longer grant. If other grantmakers are excited about Glen's work, I'd be happy to see them donating more though.
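As a rough check on the grant size, assuming the full 5 months are paid at both monthly rates above (salary plus compute), the total works out as follows; this agrees with the 55000 listed for "Independent research to improve SAEs (4-6 months)" in the donation table below:

```latex
(\$9\mathrm{K} + \$2\mathrm{K})\ \text{per month} \times 5\ \text{months} = \$55\mathrm{K}
```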
Glen did my MATS training program about 6 months ago. I do a lot of SAE research, and expect to benefit from better knowledge of SAE training, but in the same way that the whole community will!
Neel Nanda
5 months ago
@NeelNanda Note: Tom and I discussed this grant before he applied here, and I encouraged him to apply to Manifund since I thought it was a solid grant to fund.
Neel Nanda
5 months ago
@Austin Yep, I'd be happy to pay salary on this if Tom wants it (not sure what appropriate rates are though). Tom and I discussed it briefly before he applied.
Neel Nanda
5 months ago
I think that determining the best training setup for SAEs seems like a highly valuable thing to do. Lots of new ideas are arising about how to train these things well (eg Gated SAEs, ProLU, Anthropic's April update), with wildly varying amounts of rigour behind them, and often little effort put into replicating them and seeing how they combine. Having a rigorous and careful effort doing this seems of significant value to the mech interp community.
Tom is a strong researcher, though he hasn't worked on SAEs before; I thought the Hydra Effect and Understanding AlphaZero papers were solid. Joseph is also solid and has a lot of experience with SAEs. I expect them to be a good team.
The Google DeepMind mech interp team has been looking somewhat into how to combine the Anthropic April Update methods and Gated SAEs, and also hopes to open source SAEs at some point, which creates some concerns for duplicated work. As a result, I'm less excited about significant investment into open source SAEs, though having some out (especially soon!) would be nice.
This is an engineering heavy project, and I don't know too much about Tom's engineering skills, though I don't have any reason to think they're bad.
As above, I'm less excited about significant investment into open source SAEs, which is the main reason I haven't funded the full amount. $4K is a fairly small grant, so I haven't thought too hard about exactly how much compute this should reasonably take. If the training methods exploration turns out to take much more compute than expected, I'd be happy to increase it.
Conflicts of interest
Tom and I somewhat overlapped at DeepMind, but never directly worked together.
Joseph is one of my MATS alumni, and currently doing my MATS extension program. I consider this more of a conflict of interest, but my understanding is that Tom is predominantly driving this project, with Joseph helping out where he can.
I expect my MATS scholars to benefit from good open source SAEs existing and for both my scholars and the GDM team to benefit from better knowledge on training SAEs, but in the same way that the whole mech interp ecosystem benefits.
Neel Nanda
9 months ago
"resulting in three publications accepted at top-tier academic ML venues (NeurIPS, ACL, ICLR),"
To add context in case people get misled by this line, the NeurIPS and ICLR papers (N2G here) were workshop papers, as far as I can tell, not main conference papers. For people not in ML, a conference like NeurIPS or ICLR has both main conference papers (one of the highest status ways to publish in ML) and workshop papers (lower prestige and less selective; I'd roughly say a workshop paper is 1/3-1/2 as impressive as a conference paper).
To me, the prior is that most hackathon projects are a total flop and don't go anywhere, so helping someone convert it to a workshop paper is still impressive! (But main conference would have been very impressive). And the ACL paper was a main conference paper, which is impressive!
Neel Nanda
10 months ago
This seems pretty worth funding to me - it's a cheap grant, and I think this would be a cool paper to exist! I don't have a background in neuroscience or cognitive science, and I expect there are some techniques there worth my knowing about that would be useful for my work, but that much of it is irrelevant. I'd love for a paper surveying and summarising the most relevant ideas to exist! I've mentored Wes Gurnee and I trust his judgement/ability to represent the mech interp side, and expect Stephen Casper to also give good takes here. I don't know the rest of the organisers, but Wes vouches for their overall competence. I'd fund this myself if I had a regranting budget.
(I think a Nature publication is very ambitious, and would advise against bothering, but I think an arXiv publication is more than sufficient to make this worthwhile)
Neel Nanda
10 months ago
Lawrence is great, very experienced with alignment, and I trust his judgement; this seems like a great thing to fund! I would donate myself if this was tax deductible in the UK (which I don't think it is?)
| For | Date | Type | Amount (USD) |
|---|---|---|---|
| 80,000 Hours | 3 days ago | project donation | 50 |
| Giving What We Can | 11 days ago | project donation | 50 |
| Impact Accelerator Program: Biggest career program for experienced professionals | 18 days ago | project donation | 50 |
| Covid Work By Elizabeth VN/Aceso Under Glass | 20 days ago | project donation | 50 |
| Social Media Strategy for EA Orgs | 20 days ago | project donation | 50 |
| LEAH Coworking Space | 20 days ago | project donation | 50 |
| MATS Program | about 1 month ago | project donation | 50 |
| Lightcone Infrastructure | about 1 month ago | project donation | 50 |
| Teamwork - professional EA co-working space in Berlin | about 1 month ago | project donation | 100 |
| Manifund Bank | about 1 month ago | deposit | +600 |
| Compute for 4 MATS scholars to rapidly scale promising new method pre-ICLR | about 1 month ago | project donation | 16047 |
| Understanding SAE features using Sparse Feature Circuits | 3 months ago | project donation | 11000 |
| Independent research to improve SAEs (4-6 months) | 5 months ago | project donation | 55000 |
| Train great open-source sparse autoencoders | 5 months ago | project donation | 4000 |
| Manifund Bank | 6 months ago | deposit | +250000 |