This project concerns the intersection between human learning and machine learning. Using the Free Energy Principle framework developed by Karl Friston, we want to examine what connections deep-learning-based AI systems have with human brains and human learning processes. This matters for alignment work because humans have many properties, such as trust, honesty, self-maintenance, and corrigibility, that we want future AI systems to possess as well. We are also concerned with the AI safety properties of non-LLM, brain-like AI models that various parties have proposed, and want to proactively consider what it would take to develop an 'alignment moonshot' based on a coherent theory of learning that applies to both humans and AI systems.
The current branch of the project (running to the end of April 2026) has received funding from the ARIA Mathematics for Safeguarded AI opportunity space, under its opportunity seeds program.
Project overview - https://docs.google.com/document/d/1fl7LE8AN7mLJ6uFcPuFCzatp0zCIYvjRIjQRgHPAkSE
This Manifund project has been set up to receive the ACX grant for 2025. Any further funding will go towards recouping living expenses accrued in the period before the project started, or towards extension research after the current project branch ends in May 2026.
N/A
I am worried about near-term non-LLM AI developments (LW post explaining the rationale behind the project).