Gaurav Yadav

@GauravYadav

bristolaisafety.org

Outgoing donations

Comments


Gaurav Yadav

3 months ago

In case you haven't seen it, there's a lot of discussion about this on EleutherAI.


Gaurav Yadav

5 months ago

Hmm - I'm not sure about that assessment. I personally don't think there are many high-quality submissions; compared to most, yours seems quite good IMO. Sure, the amount isn't insignificant, but I'm surprised not to see even a small funding commitment from any of the regrantors so far. It could be that there's a lot of internal discussion happening.


Gaurav Yadav

5 months ago

Hmm, I am quite surprised this hasn't been funded. Maybe I'm missing something, but these ideas seem pretty good at first glance.


Gaurav Yadav

6 months ago

@briantan Hi Brian, I don't have more thoughts or questions at the moment, but thanks for the thoughtful reply - these seem good!


Gaurav Yadav

6 months ago

*This was written very quickly, and I may not agree with what I'm saying later on!

Here are some questions and thoughts. I can't commit to funding at the moment, but I would like to share them.

Having spent roughly 1-1.5 years community building and observing Brian being quite active on the EA Groups Slack and over email, I'm left with the impression that Brian is quite agentic. So I hold a high prior that, if funded, the plans in this proposal will actually be made and carried out.

I also hold some confidence that establishing another hub might be beneficial, although I'm not entirely sure how to reconcile this with the idea that those interested in working on alignment might derive more value from visiting Berkeley than from going to a new hub.

A few concerns do arise, however. The proposal mentions research sprints to solve the COPs, and while this approach seems suitable for less time-intensive tasks, I question its overall efficacy (45% sure this is true). I believe that rushing through problems, or working on them too quickly, might not be conducive to learning.

Regarding the statement 'Due to being highly neglected': I'm under the impression (60% sure) that interpretability is slightly saturated at the moment, contrary to the assertion that it's heavily neglected.

My final concern is about mentorship. It appears that only one person on the team has formal mentorship experience in MI. This is concerning, particularly if you're planning on onboarding 10-15 people, as having one person mentor them all will be challenging. More mentorship (and more experienced mentors) might be necessary to identify and correct problems early and to prevent suboptimal strategies from being implemented.


Gaurav Yadav

6 months ago

I am making a bet (though a very small one) that this ends up having positive EV. I've spent more time thinking about the role advocacy can play in pushing timelines back, and I'd place a 60% chance (with medium error bars at the moment) that Holly's efforts to push for regulatory measures through advocacy will end up buying more time for alignment researchers.

Currently, I am fairly optimistic that this work can get us to a ‘Risk Awareness Moment’ (https://forum.effectivealtruism.org/posts/L8GjzvRYA9g9ox2nP/prospects-for-ai-safety-agreements-between-countries) such that pushing now for regulations ends up working out really well.

"My desired impact on the world is to shift the Overton window in favour of a moratorium, reframing the issue from 'do we have the right to interfere with AI progress?' to 'AI labs have to show us their product will be safe before we allow them to continue.'"
- This seems to me like a good reframing, though I'm unsure why or how the current framing is that we can't interfere with AI progress. Regardless, I think requiring labs to demonstrate a level of interpretability before they can continue seems good!

A few reasons why a moratorium or advocacy efforts might end up being negative EV (this is more a comment on the idea of a moratorium itself than on Holly):

  • Efforts to regulate labs could end up accelerating timelines. I don’t know how feasible this actually is, but in my mind, it goes something like: "Oh, they’re trying to regulate us; better speed up progress to TAI so we can reap the benefits."

  • There might not be enough interest within Congress to change things on AI, or it might end up crafting policies that don't actually tackle the x-risk aspects of AI. Something like this happened with the EU AI Act in its early days, if I remember correctly. I must note that I have very little context on, or understanding of, how the US system works.

I think this proposal lacks specific policies or laws to push for. Are we thinking of compute regulations along the lines of 'What does it take to catch a Chinchilla?' Are we thinking of laws that allow audits or inspections to take place?