Physics-Governed Memory for Auditable AI

Project summary

This project develops a physics-governed memory substrate for AI systems, where learning, forgetting, and internal state changes emerge from explicit, auditable laws rather than opaque updates or human approval.

Current AI systems rely on brittle prompt engineering, ad-hoc memory stores, or hidden gradient updates, making long-horizon behavior difficult to audit and govern. This work separates a static LLM from a sovereign memory system governed by laws.

The goal is to demonstrate that long-term identity stability and resistance to deceptive alignment can arise structurally — without guardrails, policies, or model retraining — and to provide a concrete, open-source prototype that AI safety researchers and auditors can evaluate and build upon.

What are this project's goals? How will you achieve them?

Goals

Build a working prototype of a law-governed memory system that constrains how AI internal state evolves over long interaction horizons.
Demonstrate that internal state changes are fully traceable, auditable, and resistant to hidden drift.
Evaluate whether such systems show improved robustness against long-horizon manipulation compared to standard RAG or prompt-based memory.

How

Formalize the governing dynamics as explicit invariants.
Implement the system as a lightweight wrapper around existing LLMs, keeping the model stateless while memory evolves lawfully.
Run extended interaction tests (100+ turns) and attempts to induce identity drift, deceptive alignment, or covert goal formation.
Publish the prototype and technical documentation openly for scrutiny and reuse.

Planned road-map through four phases over 12–18 months (part-time baseline):

Months 1–4: Complete core prototype integration (short-term memory buffer, full turn loop, basic transducer)
Months 5–10: Conduct extended evals (100–500+ turns, manipulation/deception attempts, baseline comparisons to RAG/prompt memory).
Months 11–14: Analyze results (identity stability, tension propagation, FSV drift), iterate on dynamics.
Months 15–18: Polish open-source release (GitHub repo with code, schema, eval scripts, demo notebook), write technical report/blog, seek community feedback/red-teaming.

How will this funding be used?

Funding will primarily protect essential hardware, cover tooling and modest compute, and—if stretch goals are met—provide secure time to accelerate development, evaluation and polish.

Minimum ($13,400): This is the safe threshold to complete and release a working prototype with full auditability, robustness tests (100+ turns, manipulation attempts), and open-source code/docs within 12–18 months, even alongside full-time NHS work (all current development happens after my kids are asleep). It covers:
- Replacement/upgrade of my sole development laptop (~$5,400–$6,000) to eliminate the risk of relying on my only personal device.
- Active cooling, high-capacity SSD, and monitor (~$800–$1,000) for reliable long simulation runs.
- AI coding assistants, documentation tools, and 12–18 months of API/inference costs (~$2,000–$3,300) to support implementation and testing.
- Small buffer for unexpected bits.
With this level, the project definitely happens—the ideas, dynamics, and testing are the real work, not fancy resources.
Ideal/Stretch ($33,500): Additional $20,100 for 6–9 months of secure stipend/time buy-out to drop to part-time NHS hours over a fixed period. This would speed up development, allow deep evaluations (longer horizons, more red-teaming for deceptive drift/covert goal formation), professional-grade documentation, visualizations of the memory dynamics, and community outreach—turning a solid prototype into a highly scrutinizable reference for AI safety researchers.

Who is on your team? What's your track record on similar projects?

This is currently a solo project.

I am a clinical coding trainer within the UK NHS, with professional experience in systems where auditability and traceability are non-negotiable. While not formally trained in ML research, I have spent the last ~18 months building and iterating on stateful AI systems during nights and weekends, progressing through several prototypes (ELLI (personal home assistant) → Nova → Atlas) that exposed the limitations of prompt-based and heuristic memory.

The current work includes a functioning database schema, memory dynamics, and orchestration code demonstrating non-trivial implementation progress. The project reflects applied systems engineering with working code rather than speculative theory.

What are the most likely causes and outcomes if this project fails?

Likely causes of failure

The memory dynamics may not produce sufficiently clear advantages over existing approaches.
The system may prove too complex or brittle to integrate cleanly with current LLM workflows.
Evaluation may reveal limited practical impact under realistic usage patterns.

Outcomes if it fails

Even partial results would shed light on the potential limits of law-based internal governance and inform future safety research.
The open-source prototype and documentation would still provide a concrete reference for others exploring auditable state or memory governance.
Failure would be epistemically valuable rather than catastrophic; the project does not introduce new capabilities risks.

How much money have you raised in the last 12 months, and from where?

None.
This project has been entirely self-funded to date.