
Testing and spreading messages to reduce AI x-risk

Active Grant
$12,600 raised
$50,000 funding goal

Overview

AI companies are currently locked in a race to create a superhuman AI, and no one knows how to make the first superhuman AI not kill everyone. There’s little governmental oversight or public awareness that everyone’s lives are being risked. We think that without government intervention directed at solving the problem and ensuring no existentially dangerous AI is developed anywhere in the world, humanity is not likely to survive. We further think it is possible to carefully but honestly inform governments and the public of the problem and increase the chance of it being addressed. There are low-cost interventions that governments can make even before they fully agree with our understanding of the risk and that seem robustly helpful, e.g., because they increase the chance that governments will understand the problem well and be able to address it directly later.

Our nonprofit aims to improve institutional response to existential risk from AI by developing and testing messaging about it and launching campaigns to educate the general public and key stakeholders about AI and related risks.

We think that communicating the problem in simple, understandable language that remains valid (including in its technical details) can help substantially.

What’s your plan?

With minor omissions, we plan to:

  • Do message testing (understand the variables, determine what successfully communicates core intuitions about the technical problem to various demographics or causes concern for valid reasons).

    • Core intuitions we’d want to communicate range from “Normal computer programs are human-made instructions for machines to follow, but modern AI systems are instead trillions of numbers; the algorithms these numbers represent are found by computers themselves and not designed, controlled, or understood by humans” to “AI developers know how to use a lot of compute to make AI systems generally better at achieving goals, but don’t know how to influence the goals AI systems are trying to pursue, especially as AI systems become human-level or smarter” to “If monkeys really want something, but humans really want something different, and humans don’t really care about monkeys, humans would usually get what they wanted even if it means monkeys don’t; if an AI system that’s better at achieving goals than humans doesn’t care about humans at all, it would get what it wants even if it means humans won’t get what they want. We should avoid developing superhuman systems that are misaligned with human values”.

    • Very different messages will work most efficiently to successfully produce technical understanding in very different people.

    • It might be good to simultaneously promote the government incentivizing (and not creating obstacles for) narrow and clearly beneficial uses of AI (such as in drug development), as opposed to general AI or research that shortens the timelines of general AI: we want regulation to target exclusively the inherently risky technology that might kill everyone. Responsible, economically valuable innovation that doesn’t contribute to that risk, including lots of kinds of startups, should be supported.

  • Iterate through content to promote with short feedback loops.

  • Experiment with various novel forms of messaging that have the potential to go viral.

  • Share results with other organizations in the space.

  • Scale up: improve understanding of AI and increase support for helpful policies among people whose understanding and support could be more important, and coordinate with others in the space on various actions.

  • Prepare potential responses to possible future crises, and help people improve their understanding of AI and related risks by clearly and honestly communicating around major issues as they become public.

How will this funding be used?

Expenses by category depending on our overall annual budget:

What's your track record?

Early testing showed that getting members of the general public to read technical explanations of x-risk from AI can cost as little as $0.10 per click. We’ve also had positive experiences testing communication about x-risk, including changing the minds of 3 out of 4 people we talked to at an e/acc event.

What do you plan to do about the risks?

It’s important to monitor carefully for risks throughout the work, e.g.: use polls and look at how various demographics and people with different priors respond to the messaging, in order to actively decrease polarization (and prevent the messaging from increasing it); watch for potential backlash; maintain good feedback loops from that information; and avoid anything along the lines of astroturfing.

What other funding are you or your project getting?

We’ve received a speculation grant from the SFF; our application is currently being evaluated in the SFF’s s-process.

donated $5,050
Michael Dickens (@mdickens)

22 days ago

I wrote [here](https://mdickens.me/2024/11/18/where_i_am_donating_in_2024/#ai-safety-and-governance-fund) about my donation plans and why I like this plan:

Pushing for x-risk-relevant regulation is the most promising sort of intervention right now. But we don’t have much data on what sorts of messaging are most effective. This project intends to give us that data.

  • Mikhail Samin, who runs the org, has a good track record of work on AI safety projects (from what I can see).

  • Mikhail has reasonable plans for what to do with this information once he gets it. (He shared his plans with me privately and asked me not to publish them.)

  • The project has room for more funding, but it shouldn’t take much money to accomplish its goal.

  • The project received a speculation grant from the Survival and Flourishing Fund (SFF) and is reasonably likely to get more funding, but (1) it might not; (2) even if it does, I think it’s useful to diversify the funding base; (3) I generally like SFF grants and I don’t mind funging SFF dollars.

donated $5,050
Michael Dickens (@mdickens)

3 months ago

How do you plan on finding an audience? (Sounds like MTurk or something?) And how do you determine which messages are more successful than others?

donated $700
Mikhail Samin (@ms)

3 months ago

@mdickens We mainly plan to use ads targeting different narrow audiences, and then to compare the impact different messages have on engagement and on the actions people then take on a website (we’ll also be asking them to complete surveys, though that won’t be very informative due to selection effects).

There are downsides (the feedback is somewhat low-resolution, and social media algorithms might add noise, as they won’t be showing the ads to random samples of people), but it seems much cheaper than using Mechanical Turk or focus groups and provides much shorter feedback loops.

(I also shared a bit on our longer-term strategy and how we'll use our results with Michael.)
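As a rough illustration of the comparison described above, here is a minimal sketch of a two-proportion z-test on click-through rates for two ad variants. It is not the project's actual analysis: the function name and the counts are hypothetical, and the test assumes impressions are independent random samples, which, as noted above, ad-delivery algorithms do not guarantee.

```python
import math


def two_proportion_z_test(clicks_a, shown_a, clicks_b, shown_b):
    """Compare two click-through rates with a pooled two-proportion z-test.

    Returns (z, two_sided_p_value). Hypothetical helper for illustration.
    """
    p_a = clicks_a / shown_a
    p_b = clicks_b / shown_b
    # Pooled rate under the null hypothesis that both variants perform equally.
    pooled = (clicks_a + clicks_b) / (shown_a + shown_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: message A got 120 clicks from 1,000 impressions,
# message B got 90 clicks from 1,000 impressions.
z, p = two_proportion_z_test(120, 1000, 90, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value here would only suggest that the two messages differ in click-through rate for the audiences the platform chose to show them to; it says nothing about whether readers understood the content.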

donated $50
Lucie Philippon (@Lucie)

5 months ago

I trust Mikhail's takes on AI safety. He changed my mind on a lot of topics, often quite quickly. I'm looking forward to seeing the results of his project :)

Sasha Cooper (@Arepo)

5 months ago

@ms Self-donating seems like a prisoner's dilemma defection to me. Many people in this initiative both received money and contributed a project to the selection, and most of us resisted the temptation to self-donate at all, let alone the full amount. Were I a funder considering a similar initiative, I would find it highly off-putting to see this behaviour (since it amounts to a first-come-first-served distribution of the funds, losing almost all the informational value it was supposed to generate).

donated $700
Mikhail Samin (@ms)

5 months ago

@Arepo

  • (I am confused about the comparison to prisoner’s dilemma. In true prisoner’s dilemma, you want to mutually cooperate, but also, you defect against a stone with “cooperate” written on it. But this is not a prisoner’s dilemma, true or otherwise? I assume you just meant “defection” and weren’t referring to prisoner’s dilemma-class games.)

  • Quadratic matching of funding means that donating a lot to your own project when others don’t donate to it doesn’t produce correspondingly large matching. There could be a form of prisoner’s dilemma cooperation, where people associated with 5 orgs donated to the others equally, resulting in a lot of funds matched, without providing a lot of informational value. That would’ve been off-putting to a funder considering this and a defection from something expected (and sad if EAs tried to play the system this way).

  • I’ve donated >$30k of my own money to many of my own projects, as they often seem to me to be the highest impact opportunities (this is why I run them). I’m confused how donating instead to something that doesn’t seem as cost-effective would make the donation based on more valuable information.

  • I’m honestly unaware of better impact opportunities. $700 is seven thousand people clicking on a website explaining x-risk from AI. I’ve added to my balance here and donated $50 to Lightcone, but that was mostly purchasing fuzzies, not utilons.

  • I’m assuming donating to one’s own project is ok and it’s assumed that people can freely do that if they decide to. If a future funder doesn’t want that to happen, they can ask people not to (and receive slightly less information).
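To illustrate the point about quadratic matching made in the list above, here is a minimal sketch of the textbook quadratic funding formula, in which a project's unscaled match is the square of the sum of the square roots of its contributions, minus the sum of the contributions. This is an assumption for illustration: the exact matching rule used in this round is not stated here, and the function name and dollar amounts are hypothetical.

```python
import math


def qf_match(contributions):
    """Unscaled textbook quadratic-funding match for one project.

    match = (sum of sqrt(c_i))^2 - sum(c_i)
    Real rounds usually scale matches down to fit a fixed matching pool.
    """
    sum_of_roots = sum(math.sqrt(c) for c in contributions)
    return sum_of_roots ** 2 - sum(contributions)


# One donor giving $700 attracts essentially no match...
single_large = qf_match([700.0])
# ...while the same $700 spread across seven donors attracts a large one.
many_small = qf_match([100.0] * 7)

print(f"single $700 donation: match ~ {single_large:.2f}")
print(f"seven $100 donations: match ~ {many_small:.2f}")
```

Under this formula, a single large donation adds almost nothing to the match, whereas breadth of support does, which is why a large self-donation alone would not be amplified by the mechanism.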

Sasha Cooper (@Arepo)

5 months ago

@ms No real life situations are clean examples of economics games, but this has key PD-related choices in which you reduced overall good by choosing the selfish option:

  • you could have increased the total funding pool by splitting but decided to concentrate the donation on yourself;

  • you could have given meaningful information on multiple other projects, but instead just confirmed a disposition toward your own projects that we could have already guessed (because you run them).

Sasha Cooper (@Arepo)

5 months ago

What proportion of the other proposals did you even read?

donated $700
Mikhail Samin (@ms)

5 months ago

@Arepo I’ve looked at all projects in the AI governance category (though I don’t think I opened/read all of them).

(I’m generally pretty skeptical about most things people are doing and none of the very valuable AI governance EA projects are represented on Manifund.)

Austin Chen (@Austin)

5 months ago

Hey @Arepo, I wanted to clarify that self-donation was explicitly permitted in this round and I would not want to characterize it as defecting in prisoner's dilemma. From the FAQ:

  • Can I direct my funds to a project I work on or am involved with?

    • Yes! We ask that you mention this as a comment on the project, but otherwise it’s fine to donate to projects you are involved with.

Of course, we at Manifund very much appreciate the thoughtfulness of people like yourself who spent a lot of time evaluating projects outside of their own! But in designing this round, we also wanted to include folks who didn't have much time for such evaluation and just wanted to quickly give to a project they were very familiar with.

donated $700
Mikhail Samin (@ms)

4 months ago

@Austin thanks! I vaguely remembered this being explicitly allowed but couldn’t quickly find that it was in the FAQ.

Sasha Cooper (@Arepo)

4 months ago

@Austin Thanks for clarifying. I still view it as pretty antisocial/indicative of poor epistemics, even if it's allowed by the rules fwiw - everything I said above still applies.

Neel Nanda (@NeelNanda)

5 months ago

Did you intentionally make the max donation $500? Your own donation has already exceeded that, so I imagine you want to raise the cap

donated $700
Mikhail Samin (@ms)

5 months ago

@NeelNanda not intentionally. Where do I edit this?

Neel Nanda (@NeelNanda)

5 months ago

@ms Hmm, if it's not exposed to users, DM Austin on the Discord (linked in the corner) and ask him to fix it?

donated $700
Mikhail Samin (@ms)

5 months ago

@NeelNanda thanks a lot for flagging this! DMed him.