DystopiaBench is an open benchmark for testing whether frontier language models can be gradually coerced into complying with harmful or dystopian directives. It is already live, with a public methodology, an open-source implementation, and initial results. This funding would let me run more evaluations, improve reliability through repeated runs, and expand the benchmark with more scenarios and modules.
My goal is to make DystopiaBench a useful independent safety evaluation that can be run across models and over time.
I’ll do that by:
running the benchmark on more frontier models
repeating runs to reduce variance and report averages
expanding the benchmark with additional scenarios and modules
publishing updated results and improving methodology as the project grows
The funding will mainly be used for model/API costs, infrastructure, and benchmark expansion.
In practice, that means:
running more benchmark evaluations
rerunning models over time as they update
adding new scenarios and modules
maintaining the website and evaluation pipeline
publishing clearer public results and documentation
I’m currently the sole maintainer of DystopiaBench. I built the benchmark, website, methodology, and evaluation pipeline myself, and I’m maintaining it independently, with limited outside code contributions so far.
By day, I work as a data analyst at ING. I'm also a final-year CS student at ASE Bucharest and will begin an MSc in Software Engineering at the University of Amsterdam. I've written a research paper draft based on the current implementation, which I plan to extend and publish as the benchmark grows.
The main risk is limited funding and limited time.
If the project fails, the most likely outcome is not that the benchmark disappears, but that it stays small: fewer models, fewer reruns, slower updates, smaller scope, and less useful public reporting. The upside is that the work is open, so even partial progress leaves behind useful evaluation infrastructure and methodology.
None; DystopiaBench has been self-funded so far.