Gaurav Yadav

@GauravYadav

bristolaisafety.org

Outgoing donations

Comments


Gaurav Yadav

3 months ago

In case you haven't seen it, there's a lot of discussion about this on EleutherAI.


Gaurav Yadav

5 months ago

Hmm - I'm not sure about that assessment. I personally don't think there are many high-quality submissions; compared to most, yours seems quite good IMO. Sure, the amount isn't insignificant, but I'm surprised not to see even a small funding commitment from any of the regrantors so far. It could be that there's a lot of internal discussion happening.


Gaurav Yadav

5 months ago

Hmm, I am quite surprised this hasn't been funded. Maybe I'm missing something, but these ideas seem pretty good at first glance.


Gaurav Yadav

6 months ago

@briantan Hi Brian, I don't have more thoughts or questions at the moment, but thanks for the thoughtful reply - these seem good!


Gaurav Yadav

6 months ago

*This was written very quickly, and I may not agree with what I'm saying later on!

Here are some questions and thoughts. I can't commit to funding at the moment, but I would like to share them.

Having spent roughly 1-1.5 years community building and observing Brian being quite active on the EA Groups Slack and over email, I'm left with the impression that Brian is quite agentic. So I hold a high prior that, if funded, the plans in this proposal will actually be made and carried out.

I also hold some confidence that establishing another hub might be beneficial, although I'm not entirely sure how to reconcile this with the idea that those interested in working on alignment might derive more value from visiting Berkeley than from going to a new hub.

A few concerns do arise, however. The proposal mentions research sprints to solve the COPs, and while this approach seems suitable for less time-intensive tasks, I question its overall efficacy (45% sure this is true). I believe that rushing through problems, or working on them too quickly, might not be conducive to learning.

Regarding the statement 'Due to being highly neglected': I'm under the impression (60% sure) that interpretability is slightly saturated at the moment, contrary to the assertion that it's heavily neglected.

My final concern is about mentorship. It appears that only one person on the team has formal mentorship experience in MI. This is concerning, particularly if you're planning on onboarding 10-15 people, as having one person mentor them all will be challenging. More mentorship (and more experienced mentors) might be necessary to identify and correct problems early and to prevent suboptimal strategies from being implemented.


Gaurav Yadav

6 months ago

I am making a bet (though a very small one) that this ends up having positive EV. I've spent more time thinking about the role advocacy can play in pushing timelines back, and I'd place a 60% chance (with medium error bars at the moment) that Holly's efforts to push for regulatory measures through advocacy will end up buying more time for alignment researchers.

Currently, I am fairly optimistic that this work can get us to a ‘Risk Awareness Moment’ (https://forum.effectivealtruism.org/posts/L8GjzvRYA9g9ox2nP/prospects-for-ai-safety-agreements-between-countries) such that pushing now for regulations ends up working out really well.

"My desired impact on the world is to shift the Overton window in favour of a moratorium, reframing the issue from 'do we have the right to interfere with AI progress?' to 'AI labs have to show us their product will be safe before we allow them to continue.'"
- This seems to me like a good reframing, though I'm unsure why or how the current framing is that we can't interfere with AI progress. Regardless, I think requiring labs to demonstrate a level of interpretability before they can continue seems good!

A few reasons why a moratorium or advocacy efforts might end up being negative EV (this is more a comment on the idea of a moratorium itself than on Holly):

  • Efforts to regulate labs could end up accelerating timelines. I don’t know how feasible this actually is, but in my mind, it goes something like: "Oh, they’re trying to regulate us; better speed up progress to TAI so we can reap the benefits."

  • There might not be enough interest within Congress to change things on AI, or it might end up crafting policies that don't actually tackle the x-risk aspects of AI. Something like this happened with the EU AI Act in its early days, if I remember correctly. I must note that I have very little context on, or understanding of, how the US system works.

I think this proposal lacks specific policies or laws to push for. Are we thinking of compute regulations along the lines of 'What does it take to catch a Chinchilla?' Are we thinking of laws that allow audits or inspections to take place?