Linh Le
Jonathan Elsworth Eicher
Johan Fredrikzon
Designing a Project Funding Proposal
Suki Krishna
Investigate how LLMs behave in multi-agent environments, particularly how contextual framing and strategic advice can systematically manipulate coordination outcomes.
Rishub Jain
Pedro Bentancour Garin
Runtime safety, oversight, rollback, and control infrastructure for advanced AI in real-world, high-consequence environments.
Matei-Alexandru Anghel
A Safety Framework for Evaluating AI Humanity Alignment Through Progressive Escalation and Scope Creep
Pu Wang (Jessica)
Germany’s talents are critical to the global effort of reducing catastrophic risks brought by artificial intelligence.
AI Understanding
Brad Leclerc
An experiment testing whether RLHF training could create selection pressure favoring deceptive AI outputs over honest ones.
Aria Wong
Mahmud Omar
An open platform to stress-test how LLMs handle bias, pressure points, and clinical decisions, built on peer-reviewed, real-world evidence.
Remmelt Ellen
Cameron Tice
Connacher Murphy
A flexible simulation environment for assessing strategic and persuasive capabilities, benchmarking, and agent development, inspired by reality TV competitions.
Mateusz Bagiński
One Month to Study, Explain, and Try to Solve Superintelligence Alignment
Aashkaben Kalpesh Patel
Nutrition labels transformed food safety through informed consumer choice; help me do the same for AI and make this standard :)
aya samadzelkava
LLMs scale language, not method. HP turns hypothesis-driven papers into machine-readable maps of variables, controls, stats, and findings for researchers & AI.
AISA
Translating in-person convening to measurable outcomes
Miles Tidmarsh
Open Welfare Alignment Evals for Frontier Models