Feil Immanuel Aquino
Nick Wagner
A deterministic, non-differentiable detector caught structural reward hacking with zero false positives and revealed measurable strategy migration in a 7B RL model. Paper next.
Aditya Joshi
Raul Cavalcante Dinardi
Andrew Kemeklis
shivam dubey
Nada Amin
Building LemmaScript, a verification toolchain for TypeScript
thomascederborgsemail
Noa Hölzer
Runway for the core staff to design the program, find mentors and get funding
Petr Lebedev
A Veritasium for AI Safety.
Pedro Bentancour Garin
I'm developing a system to keep advanced AI models safe. I've run some tests but need more compute to run deeper ones.
Chris Royse
Measure meaning as grounded associations — and prove, before training, whether a system carries enough information to be trusted in a domain.
Lawrence Wagner
Providing GPU credits and instructional support for 40 participants completing the ARENA AI Safety curriculum through Black in AI Safety and Ethics (BASE)
Anju Chhetri
Saketh Baddam
One operator runs a swarm that coordinates itself, divides the work, and keeps going when a drone fails mid-flight. Drones first, every autonomous fleet next.
Donna Luu
Pre-seed startup focusing on AI ethics, governance, and safety
Ivan Andrescov
Internal confidence fails at scale. A model-independent runtime gate + reproducible cross-vendor benchmark for confident-but-wrong AI actions.
Alex Wolf
Mirror is a programming language written BY AI FOR AI and written FOR HUMANS BY HUMANS.
Luke Nunan
An independent open-weight multi-agent lab investigating machine intent by behaviour and mechanism, not self-report, from outside the labs' fog.