AI Plans shows a clear list of possible alignment plans and their vulnerabilities.
AI-Plans.com
Project summary
AI-Plans.com is a rapidly growing platform for feedback on AI Alignment research.
As of January 2024, there are 100+ alignment plans on the site, with 150+ Critiques.
We hold bi-monthly Critique-a-Thon events, in which participation has continued to increase.
The site is useful for several reasons:
- Showcases just how many vulnerabilities there are in all the current alignment plans
- Drastically improves the feedback loop for AI Alignment researchers
- Makes it much easier to contribute to AI Safety research
- Provides credentials for anyone looking to get started in AI Safety (badges and position on leaderboard)
On the site, all alignment plans are scored and ranked from highest to lowest, with new plans always starting at the top. Users vote on the critiques rather than on the plans themselves. Plans are then scored by the sum of the scores of Strength Critiques minus the sum of the scores of Vulnerability Critiques.
We use a karmic voting system which gives more weight to votes cast by more trusted (i.e. more upvoted and less downvoted) users. Users are incentivized with a leaderboard and badges.
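The scoring described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the site's actual implementation: the class names, the field names, and in particular the vote_weight formula (1 plus the base-2 logarithm of net karma) are hypothetical, since the text only says that votes from more trusted users carry more weight. The sketch also leaves out the rule that new plans start at the top of the ranking.

```python
from dataclasses import dataclass, field
from math import log2


@dataclass
class User:
    name: str
    upvotes_received: int = 0
    downvotes_received: int = 0

    def vote_weight(self) -> float:
        # Hypothetical trust formula (an assumption, not the site's rule):
        # weight grows slowly with net karma and never drops below 1.
        net_karma = max(0, self.upvotes_received - self.downvotes_received)
        return 1.0 + log2(1 + net_karma)


@dataclass
class Critique:
    kind: str  # "strength" or "vulnerability"
    # Each vote is (voter, direction), where direction is +1 or -1.
    votes: list[tuple[User, int]] = field(default_factory=list)

    def score(self) -> float:
        # Karma-weighted sum of the votes cast on this critique.
        return sum(voter.vote_weight() * direction
                   for voter, direction in self.votes)


def plan_score(critiques: list[Critique]) -> float:
    # Sum of Strength Critique scores minus sum of Vulnerability Critique scores.
    strengths = sum(c.score() for c in critiques if c.kind == "strength")
    vulnerabilities = sum(c.score() for c in critiques
                          if c.kind == "vulnerability")
    return strengths - vulnerabilities


newcomer = User("newcomer")                    # weight 1.0
veteran = User("veteran", upvotes_received=7)  # weight 1 + log2(8) = 4.0

critiques = [
    Critique("strength", [(newcomer, +1)]),
    Critique("vulnerability", [(veteran, +1), (newcomer, -1)]),
]

print(plan_score(critiques))  # -2.0
```

In this example the veteran's upvote on the vulnerability critique outweighs the newcomer's downvote, so the plan ends up with a negative score even though both critiques received the same number of upvotes.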
The author or poster of a plan can iterate on their plan by selecting critiques to address and creating a new version.
Several new features are coming in a rebuild that is currently in progress, including:
- Sub critiques
- Annotations without additional sign-up (Currently, annotating on a post requires signing in with a hypothes.is account)
We also run Critique-a-Thons of AI Alignment plans. See the results of the December Critique-a-Thon here: https://aiplans.substack.com/p/december-critique-a-thon-results
We were the first to release detailed Critiques of the recent Superalignment paper by OpenAI and the recent DeepMind papers.
What are this project's goals and how will you achieve them?
AI-Plans.com aims to accelerate AI alignment research via a focused feedback loop of public peer review.
The site is designed to elicit high-quality feedback from an open community of alignment researchers and enthusiasts at a rapid pace. It is easier to write a critique than to develop a plan, and it is easier to vote on a critique than to write one. We leverage this scaling of cognitive effort to produce quantitative and qualitative feedback for researchers. This feedback can also provide insight into the quality of alignment plans produced by various companies and researchers, and into the state of alignment more broadly.
The alpha version of the site is live and has already been useful to multiple alignment researchers. We are currently developing the beta release, which includes a more professional design and richer features. Development is no longer talent-constrained, since 6 developers have joined the team.
How will this funding be used?
1. Paying team members for their work
2. Prize funds for Critique-a-Thons
Who is on your team and what's your track record on similar projects?
We have held multiple Critique-a-Thons, which have been highly successful in generating high-quality critiques. (As a side-effect, these have also output broadly useful documents, such as a list of common alignment plan vulnerabilities and a list of ).
Co-founders include:
Kabir – Director, Writer
Nathan – Quality Director, Project Manager
Koi – Cybersecurity specialist and highly experienced backend developer
Marvel – highly talented developer (recently won a hackathon with his team)
What are the most likely causes and outcomes if this project fails? (premortem)
The project could fail due to poor coordination within the team or a failure to hone the site’s design toward the mission. However, we consider this very unlikely, because the team is coordinating well and is strongly committed to sticking to the mission and making the site as useful as possible.
What other funding are you or your project getting?
The first Critique-a-Thon was given a $500 prize fund by AI Safety Strategy.
A private donor gave £4000 (~$5000) to the team: $2000 went toward prize funds for the second and third Critique-a-Thons, and the remainder went toward development of the site.
Austin Chen
10 months ago
Approving this project; echoing Greg, I think AI Plans has made good progress (eg with its site design) since I last saw them. I also like some of the judges they chose for their December critique-athon, such as Nate Soares and Tetraspace.
Kabir Kumar
10 months ago
@Austin Thank you!! I know the site design still needs a lot of work! We're working on a rebuild at the moment, which will be ready soon!
To be clear, Tetraspace was a participant.
Greg Colbourn
10 months ago
Supporting this because it is useful to illustrate how there are basically no viable AI Alignment plans for avoiding doom with short timelines (which is why I think we need a Pause/moratorium). Impressed by how much progress Kabir and team have made in the last few months, and look forward to seeing the project grow in the next few months.