Testing and spreading messages to reduce AI x-risk
Overview
AI companies are currently locked in a race to create superhuman AI, and no one knows how to make the first superhuman AI not kill everyone. There is little governmental oversight, and little public awareness that everyone’s lives are being put at risk. We think that without government intervention aimed at solving the problem and ensuring that no existentially dangerous AI is developed anywhere in the world, humanity is unlikely to survive. We also think it is possible to carefully but honestly inform governments and the public about the problem and increase the chance of it being addressed. There are low-cost interventions that seem robustly helpful and that governments can make even before they fully agree with our understanding of the risk, e.g., because these interventions increase the chance that governments come to understand the problem well and can address it directly later.
Our nonprofit aims to improve institutional response to existential risk from AI by developing and testing messaging about it and launching campaigns to educate the general public and key stakeholders about AI and related risks.
We think that communicating the problem in simple, understandable language that remains valid (including in its technical details) can help substantially.
What’s your plan?
With minor omissions, we plan to:
Do message testing (understand the variables, determine what successfully communicates core intuitions about the technical problem to various demographics or causes concern for valid reasons).
Core intuitions we’d want to communicate range from “Normal computer programs are human-made instructions for machines to follow, but modern AI systems are instead trillions of numbers; the algorithms these numbers represent are found by computers themselves and not designed, controlled, or understood by humans” to “AI developers know how to use a lot of compute to make AI systems generally better at achieving goals, but don’t know how to influence the goals AI systems are trying to pursue, especially as AI systems become human-level or smarter” to “If monkeys really want something, but humans really want something different, and humans don’t really care about monkeys, humans would usually get what they wanted even if it means monkeys don’t; if an AI system that’s better at achieving goals than humans doesn’t care about humans at all, it would get what it wants even if it means humans won’t get what they want. We should avoid developing superhuman systems that are misaligned with human values”.
Different messages will work best for building technical understanding in different audiences.
It might be good to simultaneously promote the government incentivizing (and not creating obstacles for) narrow and clearly beneficial uses of AI (such as in drug development), as opposed to general AI or research that shortens timelines to general AI: we want regulation to target exclusively the inherently risky technology that might kill everyone. Responsible, economically valuable innovation that doesn’t contribute to that risk, including many kinds of startups, should be supported.
Iterate through content to promote with short feedback loops.
Experiment with various novel forms of messaging that have the potential to go viral.
Share results with other organizations in the space.
Scale up: improve understanding of AI and increase support for helpful policies among people whose understanding and support could matter more, and coordinate with others in the space on various actions.
Prepare potential responses to possible future crises, and help people improve their understanding of AI and related risks by clearly and honestly communicating around major issues as they become public.
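As a rough illustration of what the message-testing and iteration steps above could involve, here is a minimal Python sketch that compares the click-through rates of two message variants with a two-proportion z-test and computes cost per click. This is an assumption about one possible analysis, not a description of our actual pipeline: the function names are hypothetical, and the numbers in the usage example are made up for illustration and are not results.

```python
import math


def cost_per_click(spend_usd: float, clicks: int) -> float:
    """Average ad spend per click, in dollars."""
    if clicks <= 0:
        raise ValueError("clicks must be positive")
    return spend_usd / clicks


def two_proportion_z_test(clicks_a: int, views_a: int,
                          clicks_b: int, views_b: int) -> tuple[float, float]:
    """Compare click-through rates of two message variants.

    Returns (z, p_value) for a two-sided two-proportion z-test
    using the pooled click-through rate under the null hypothesis.
    """
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if std_err == 0:
        raise ValueError("no variation in the data; test is undefined")
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical, made-up numbers for two message variants.
    z, p = two_proportion_z_test(clicks_a=120, views_a=1000,
                                 clicks_b=80, views_b=1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
    print(f"cost per click: ${cost_per_click(100.0, 1000):.2f}")
```

A test like this only says whether one variant attracts more clicks; whether a message produces valid understanding would still need to be checked separately, for example with follow-up comprehension questions.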
How will this funding be used?
Expenses by category depending on our overall annual budget:
What's your track record?
Early testing showed that getting members of the general public to read technical explanations of x-risk from AI can cost as little as $0.10 per click. We’ve also had positive experiences testing how we communicate about x-risk, including changing the minds of 3 out of 4 people we talked to at an e/acc event.
What do you plan to do about the risks?
It’s important to monitor for risks carefully throughout the work. For example, we would use polls and look at how various demographics and people with different priors respond to the messaging, so that we actively decrease polarization (and prevent the messaging from increasing it); watch for potential backlash; maintain good feedback loops from that information; and avoid anything along the lines of astroturfing.
What other funding are you or your project getting?
We’ve received a speculation grant from the SFF; our application is currently being evaluated in the SFF’s s-process.