A donor has sent another $10k, which will partly fund the 2025 edition.
Shallow review of AI safety 2024
Project summary
Last year a collaborator and I summarised every live project in AI safety, tried to understand their theories of change, listed outputs, personnel, and funding amounts, and wrote an editorial.
We talked to a couple dozen researchers to check our glosses and get their views. The post was well received (100 karma on AF, which is very rare) and is now, for example, a standard intro resource at 80k. We did it pro bono (or rather, we failed to obtain retroactive funding).
We want to update the review for 2024: progress, shutdowns, trends, and our takes.
What are this project's goals? How will you achieve them?
The original goal was to help new researchers orient and know their options, to help everyone understand where things stand, and to help funders see quickly what has already been funded. Simply putting all links in one place was perhaps half of the value.
This iteration: the same as above, but incorporating last year's feedback and seeking sign-off from more than 50% of those covered. We also plan a professionalised version suitable for policy audiences.
$8K: bare-bones update (80 hours). Skim everything, reuse the taxonomy, and seek corrections in the comments.
$13K: much more effort on verifying details and seeking out consensus; more editorial and synthesis.
$17K: a section on academic and outgroup efforts, plus a glossy formal report optimised for policy people.
How will this funding be used?
Wages.
Who is on your team? What's your track record on similar projects?
Gavin and Stag did last year's version. Stephen produced much of the (limited) descriptive statistics about the field.
We ran this project last year, and it was well received. Habryka: "I think overall this post did a pretty good job of [covering] a lot of different work happening in the field. I don't have a ton more to say, I just think posts like this should come out every few months, and the takes in this one overall seemed pretty good to me."
What are the most likely causes and outcomes if this project fails?
N/A
How much money have you raised in the last 12 months, and from where?
$0 so far.
Austin Chen
5 days ago
Manifund has now received @cfalls's $10k donation for your project and added it to this page!
Austin Chen
15 days ago
Approving this project! As I wrote for the Manifund blog:
Gavin Leech is a forecaster, researcher, and founder of Arb; he’s proposing to re-run a 2023 survey of AI safety. The landscape shifts pretty quickly, so I’d love to see what’s changed since last year.
I'm especially glad to see that others including Ryan, Anton, and Matt of OpenPhil are also excited to fund this.
(I've also updated the funding limit to indicate that Gavin's funding needs have been met)
Matt Putz
15 days ago
I work at Open Philanthropy, and I recently let Gavin know that Open Phil is planning to recommend a grant of $5k to Arb for this project (they had already raised ~$10k by the time we came across it).
Like others here, I believe this overview is a valuable reference for the field, especially for newcomers.
I wanted to flag that this project would have been eligible for our RFP for work that builds capacity to address risks from transformative AI. I worry that not all potential applicants are aware of the RFP or its scope, so I’ll take this opportunity to mention that this RFP’s scope is quite broad, including funding for:
Training and mentorship programs
Events
Groups
Resources, media, and communications
Almost any other type of project that builds capacity to address risks from advanced AI (in the sense of increasing the number of careers devoted to these problems, supporting people doing this work, and sharing knowledge related to this work).
More details at the link above. People might also find this page helpful, which lists all currently open application programs at Open Phil.
Thanks to Austin, whose EA Forum post brought this funding opportunity to our attention.
Nickolai Leschov
17 days ago
I appreciate your work summarizing every live project in AI safety and would love to see it thoroughly updated for 2024.
Gavin Leech
18 days ago
Thanks very much to all donors! A private donor has offered to fill the difference, so please stop sending me money (mods, if there's a way to close projects, I can't see it). We've started work.
Anton Makiievskyi
23 days ago
I appreciated the previous iterations of this review. Trying to encourage more of the same.
Neel Nanda
24 days ago
I think collections like this add significant value for newcomers to the field, mostly by listing all the areas worth maybe thinking about, with key links (rather than, e.g., by providing a lot of takes on which areas are more or less important, unless the author has excellent taste). Gavin has convinced me that the previous post gets enough traffic for it to be valuable to keep up to date.
I'm not super convinced that a ton has changed since 2023, but enough has to be worth at least some updating, so I'm funding the MVP version (I expect this to have more errors than the higher-funded versions, but for those to largely be found in the comments; even higher funding would still leave some errors). I'd be fine to see others fund it higher, though.