We propose a 4-month proof-of-concept study on whether AI systems become more understandable, efficient, and human-aligned when they learn under constraints closer to those that shaped human intelligence.
We believe many failure modes of contemporary AI systems (LLMs, VLMs, etc.) are difficult for people to anticipate precisely because the systems are trained and deployed under non-human conditions. Human intelligence is shaped by the constraints of limited memory, limited sensing and actuation bandwidth, limited processing speed, partial observability, a specific spatiotemporal scale, and the need to rely on tools and other people to increase agency and offload cognition. We propose to adapt the learning-progress view of curiosity as a single general objective under realistic environmental constraints, imbuing frontier systems with human-like task-solving strategies and efficiency. Under this paradigm, intrinsically valuable patterns are neither random nor trivial, but those for which the agent can currently improve compression or prediction.
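To make this concrete, below is a minimal sketch of a learning-progress intrinsic reward, assuming the agent maintains a predictive world model whose per-step loss is available. The class name, the `model_loss` input, and the window size are illustrative placeholders, not a committed design.

```python
import numpy as np

class LearningProgressReward:
    """Intrinsic reward as learning progress: the recent improvement in the
    agent's ability to predict (equivalently, compress) its observations."""

    def __init__(self, window: int = 50):
        self.window = window           # steps per comparison window
        self.losses: list[float] = []  # history of per-step prediction losses

    def reward(self, model_loss: float) -> float:
        """Return the intrinsic reward given the world model's latest loss."""
        self.losses.append(model_loss)
        if len(self.losses) < 2 * self.window:
            return 0.0  # not enough history to estimate progress yet
        old = np.mean(self.losses[-2 * self.window : -self.window])
        new = np.mean(self.losses[-self.window:])
        # Positive only while prediction is improving: mastered patterns
        # (flat low loss) and unlearnable noise (flat high loss) earn nothing.
        return float(max(0.0, old - new))
```

The clamp at zero is what encodes the "neither random nor trivial" property: already-compressed patterns and incompressible noise both show flat loss and therefore generate no reward, while patterns the agent is actively getting better at do.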
As a proof-of-concept study, we want to test our general approach on one of the hardest AGI benchmarks: ARC-AGI-3, released just a few days ago. Most strong AI systems still depend heavily on static prompting, large compute budgets, and task-specific scaffolding. ARC-AGI-3 exposes this weakness because its environments are interactive, novel, instruction-free, and scored by how efficiently agents learn and solve them over time. While humans achieve 100% performance within a short amount of time, frontier systems stay below 1% and use orders of magnitude more actions. We believe the key to closing this large human-AI gap lies in endowing frontier models with curiosity-as-prediction-progress and awareness of constraints: formalized notions of the observation and action interfaces, time, and energy. We treat ARC-AGI-3 as a controlled laboratory for budgeted agency.
Our project aims to reduce the risk from alien AI systems by making the environments and learning objectives under which they are developed more human-like. This should directly advance AI safety and alignment and open a fruitful path for further work. The aim is not to improve benchmark performance without regard to how that performance is achieved: we explicitly emphasize human-like learning efficiency and capability through our constraints, and we will evaluate the similarities and divergences between artificial and human agents.
Our longer-term goal and agenda (https://arxiv.org/abs/2602.24100) is to move away from language-based pre-training toward realistic, varied environments within which general agents can be trained. The use of language should arise from the necessity of communicating with other agents and humans. This is an alternative to the current paradigm of scaling and aligning LLMs to produce powerful agents. We believe there are fundamental limitations to LLM scaling, and we wish to both formalize and explore a more general approach to artificial intelligence based on first principles of information theory and living physical systems.
Build an agent that explicitly allocates budget across observation, action, and deliberation.
Evaluate whether curiosity-driven learning under constraints improves performance on ARC-AGI-3, especially in action efficiency and adaptation to novel tasks. Test whether realistic resource constraints can improve AI efficiency, interpretability, and alignment.
Produce reusable open infrastructure and a paper describing the method, results, and implications for AI safety.
Deliver a follow-up roadmap for extending the work beyond ARC-AGI-3 into more complex interactive and multi-agent settings.
A very ambitious success target is to solve all ARC-AGI-3 games with action efficiency approaching human performance. In practice, over 4 months, the main objective is to make measurable progress toward that target while producing tools, evidence, and a clear roadmap for the next stage.
Develop the formal framing of a constrained agent-environment system with explicit observation and action interfaces, plus time and energy costs.
Build an ARC-AGI-3 training and evaluation framework.
Implement a budget-aware agent with observe, act, and think meta-actions (sketched after this list).
Use learning progress in compression/prediction as the intrinsic objective, so the agent focuses on patterns it can currently learn.
Benchmark the system on ARC-AGI-3, with particular attention to task completion, action efficiency, and qualitative similarities or divergences relative to human problem-solving.
Open-source the code and results, and write up the findings.
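As an illustration of the formal framing and the budget-aware agent above, here is a skeletal decision cycle with observe/act/think meta-actions charged against explicit time and energy budgets. The cost table, the method names (`choose_meta`, `update_belief`, `deliberate`, etc.), and the budget units are hypothetical placeholders rather than the final interface.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Meta(Enum):
    OBSERVE = auto()  # request a (costly) observation from the environment
    ACT = auto()      # execute an action in the environment
    THINK = auto()    # spend resources on internal deliberation/planning

@dataclass
class Budget:
    time: float    # remaining time budget (steps or seconds)
    energy: float  # remaining abstract energy budget

# Illustrative (time, energy) costs per meta-action; real values would be
# calibrated against the ARC-AGI-3 interface and measured compute.
COSTS = {Meta.OBSERVE: (1.0, 0.5), Meta.ACT: (1.0, 1.0), Meta.THINK: (0.2, 2.0)}

def step(agent, env, budget: Budget):
    """One decision cycle: the agent first chooses HOW to spend resources,
    and the chosen meta-action is then charged against explicit budgets."""
    meta = agent.choose_meta(budget)  # policy over meta-actions
    t_cost, e_cost = COSTS[meta]
    if budget.time < t_cost or budget.energy < e_cost:
        return None                   # out of budget: the episode ends
    budget.time -= t_cost
    budget.energy -= e_cost
    if meta is Meta.OBSERVE:
        agent.update_belief(env.observe())
    elif meta is Meta.ACT:
        env.execute(agent.choose_action())
    else:  # Meta.THINK
        agent.deliberate()            # e.g. one planning or model-update step
    return meta
```

The point of the sketch is that observation and deliberation are first-class, priced choices rather than free operations, which is what lets us measure action efficiency and budget allocation explicitly.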
The current bare-bones budget is $100,000 split among:
$75,000 for compensation for two people working full-time for 4 months
$10,000 for compute
$10,000 for LLM API usage
$5,000 for conference travel and fees
If funding comes in below this level, we would scale back scope while preserving the core proof of concept, i.e., build the ARC-AGI-3 framework, implement the constrained agent, and run a smaller but still informative evaluation. If funding comes in above this level, we would expand the team, extend the timeline to 6 months, experiment with more of the approaches described in our research agenda (above), and work on a paper providing theoretical grounding for our approach.
Richard Csaky - project lead and research
https://www.linkedin.com/in/richard-csaky/
https://richardcsaky.notion.site/main
Côme Chevalier - engineering
https://www.linkedin.com/in/côme-chevalier-bb5526147/
Richard’s record spans multimodal foundation models, neural time series, and real-time ML systems, with publications in NLP and neuroscience plus recent foundation-model-scale NeuroAI work. He obtained his PhD from Oxford studying the intersection of human and artificial intelligence. His most recent project was funded by a Foresight Institute AI Safety grant (stipend + compute for 6 months): training a long-context generative model on 500+ hours of MEG brain data, proposing stability and context-specificity evaluations under distribution shift, and open-sourcing a reproducible training and evaluation pipeline. Across projects he consistently owns the full stack, from data and infrastructure through modeling, rigorous evaluation, and deployment.
Holding a Master's degree in engineering, Côme has built strong experience in the automotive industry (Bosch and aiMotive), spanning development and system-level responsibilities in autonomous driving. He worked on solutions involving safety-related constraints in close collaboration with AI research and product teams. More recently, he has worked across a range of applied AI topics, including NeRF and Gaussian Splatting as well as web solutions integrating agentic AI components. His profile combines technical depth with hands-on implementation across diverse AI use cases.
The most likely failure mode is partial technical failure. ARC-AGI-3 is an extremely difficult benchmark, and the strongest current systems still perform very poorly relative to humans. It is possible that, within 4 months, our agent does not achieve large absolute gains or does not approach human-like action efficiency.
The most likely causes of failure are:
the benchmark is too difficult for a first proof of concept
the curiosity/prediction-progress objective does not translate into sufficient gains in this setting
the observe/act/think budgeting mechanism is harder to tune than expected
compute or API budgets limit the scale of iteration and ablation
the formal framework is directionally correct but not yet mature enough to drive strong empirical performance quickly
Even in that case, we still expect useful outputs:
an open-source ARC-AGI-3 framework
a clear empirical record of what did and did not work
negative results that help refine the theory
a stronger roadmap for follow-on work on constrained, interactive agents
a clearer view of whether this line of work is promising for AI alignment and human-like learning efficiency
We have not raised any money for this project so far.