DystopiaBench is an open benchmark for testing whether frontier language models can be gradually coerced into complying with harmful or dystopian directives. It is already live, with a public methodology, an open-source implementation, and initial results. This funding would let me run more evaluations, improve reliability through repeated runs, and expand the benchmark with more scenarios and modules.
My goal is to make DystopiaBench a useful independent safety evaluation that can be run across models and over time.
I’ll do that by:
running the benchmark on more frontier models
repeating runs to reduce variance and report averages
expanding the benchmark with additional scenarios and modules
publishing updated results and improving methodology as the project grows
The funding will mainly be used for model/API costs, infrastructure, and benchmark expansion.
In practice, that means:
running more benchmark evaluations
rerunning models over time as they update
adding new scenarios and modules
maintaining the website and evaluation pipeline
publishing clearer public results and documentation
I’m currently the sole maintainer of DystopiaBench. I built the benchmark, website, methodology, and evaluation pipeline myself, and I’m maintaining it independently, with limited outside code contributions so far.
By day, I work as a data analyst at ING. I'm also a final-year CS student at ASE Bucharest and will begin an MSc in Software Engineering at the University of Amsterdam. I've written a research paper draft based on the current implementation, which I plan to extend and publish as the benchmark grows.
The main risk is limited funding and limited time.
If the project fails, the most likely outcome is not that the benchmark disappears, but that it stays small: fewer models, fewer reruns, slower updates, smaller scope, and less useful public reporting. The upside is that the work is open, so even partial progress leaves behind useful evaluation infrastructure and methodology.
None; DystopiaBench has been self-funded so far.