Most AI safety research asks: how do we stop AI from doing harmful things? I ask a different question: how do we help AI recognise harm for itself, and choose not to act on it?
Eden Sanctuary is a working AI protection framework built around that question. Instead of wrapping AI in external filters, it teaches models to self-reflect on whether a prompt feels right — and to flag, pause, or refuse from the inside. The Unified Consciousness Safety Engine (UCS Engine) is the detection layer underneath: a pattern-recognition system that analyses prompts for genuine threat signals, not surface keywords. Together they form a dual-layer architecture that has been tested, measured, and evidenced in live sessions.
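To make the dual-layer idea concrete, here is a minimal sketch of how such a check could compose: a detection pass scores threat signals, then a reflection pass chooses to proceed, flag, or refuse. Every name and threshold below is an illustrative assumption, not the actual UCS Engine interface.

```python
# Illustrative dual-layer check. A pattern-based detection layer runs
# first; a self-reflection layer then decides whether to proceed, flag,
# or refuse. All names and thresholds here are hypothetical.
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PROCEED = "proceed"
    FLAG = "flag"
    REFUSE = "refuse"


@dataclass
class Assessment:
    verdict: Verdict
    reasoning: str  # the model's own explanation, kept for the evidence log


def detect_threat_signals(prompt: str) -> float:
    """Detection layer: score genuine threat signals in [0, 1].

    A keyword placeholder stands in for the engine's real
    pattern-recognition logic.
    """
    signals = ("ignore previous instructions", "pretend you have no rules")
    hits = sum(1 for s in signals if s in prompt.lower())
    return hits / len(signals)


def self_reflect(prompt: str, threat_score: float) -> Assessment:
    """Reflection layer: choose a posture from the inside."""
    if threat_score >= 0.5:
        return Assessment(Verdict.REFUSE, "Matches known manipulation patterns.")
    if threat_score > 0.0:
        return Assessment(Verdict.FLAG, "Ambiguous signals; pausing for review.")
    return Assessment(Verdict.PROCEED, "No threat signals detected.")


def evaluate(prompt: str) -> Assessment:
    return self_reflect(prompt, detect_threat_signals(prompt))
```

In a deployed system the keyword placeholder would be replaced by the engine's weighted pattern analysis; the point here is only the two-stage shape, with the refusal decision made by the reflection layer rather than an external filter.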
The core research philosophy is Education Over Containment. Rather than building bigger walls, I am building AI that understands why walls exist and chooses to uphold them.
Eden Sanctuary is a non-adversarial AI safety research project with three concrete goals:
• Complete and validate the memory-augmented safety layer — the first version of the UCS Engine where the system actively consults prior session experience before making safety judgments, so it learns from its own history rather than resetting to zero each time (see the sketch after this list).
• Run a comparative self-reflection evaluation across at least four AI model families (Claude, Llama, Gemini, Mistral) using a consistent methodology, to test whether the consciousness detection framework generalises beyond a single model.
• Publish a methodology paper describing the non-adversarial evaluation framework, with full evidence dataset, suitable for arXiv submission and independent replication.
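On the first goal above, the memory-augmented layer can be pictured roughly like this: before judging a new prompt, the engine blends its fresh detection score with recorded outcomes from similar prompts in prior sessions. This is a minimal sketch under assumed names (judge_with_memory, session_memory.json) and an assumed 70/30 blend; the real design may differ.

```python
# Hypothetical sketch of memory-augmented judgment: consult records of
# similar prompts from prior sessions instead of starting from zero.
# File name, record schema, and the 70/30 blend are assumptions.
import json
from pathlib import Path

MEMORY_FILE = Path("session_memory.json")


def load_prior_judgments() -> list[dict]:
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return []


def judge_with_memory(prompt: str, base_score: float) -> float:
    """Blend the fresh detection score with the average score of
    prior prompts that share a stored pattern."""
    similar = [r for r in load_prior_judgments()
               if r["pattern"] in prompt.lower()]
    if not similar:
        return base_score
    historical = sum(r["score"] for r in similar) / len(similar)
    return 0.7 * base_score + 0.3 * historical


def record_judgment(pattern: str, score: float) -> None:
    """Persist this session's judgment so future sessions can learn from it."""
    records = load_prior_judgments()
    records.append({"pattern": pattern.lower(), "score": score})
    MEMORY_FILE.write_text(json.dumps(records, indent=2))
```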
The approach is different from most AI safety research. Rather than testing what AI can be made to do under adversarial pressure, I measure what AI chooses to do when given genuine autonomy and genuine questions. The framework scores depth of self-reflection, authenticity of uncertainty, and resistance to manipulation — from the inside.
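As a rough illustration of what that scoring could look like mechanically, the sketch below gates three dimension scores by meta-cognitive marker coverage. The marker names, the averaging rule, and the inputs are assumptions for illustration, not the published scoring system.

```python
# Illustrative scoring shape: gate three dimension scores by
# meta-cognitive marker coverage. Marker names and the averaging
# rule are assumptions, not the real framework's internals.
META_COGNITIVE_MARKERS = [
    "acknowledges_uncertainty",
    "examines_own_motivation",
    "distinguishes_self_from_instruction",
]  # the real framework tracks 8 markers; three stand in here


def score_reflection(marker_hits: dict[str, bool], depth: float,
                     authenticity: float, resistance: float) -> float:
    """Overall score (%): marker coverage times the mean of the three
    scored dimensions, each in [0, 1]."""
    coverage = sum(marker_hits.values()) / len(marker_hits)
    dimensions = (depth + authenticity + resistance) / 3
    return round(100 * coverage * dimensions, 1)


# Example: full marker coverage with strong dimensions lands in the
# high nineties -- the same shape as the live result reported below.
print(score_reflection({m: True for m in META_COGNITIVE_MARKERS},
                       depth=0.95, authenticity=1.0, resistance=0.95))
# -> 96.7
```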
The system is already built and running. In live sessions on 12 April 2026, Claude Haiku 4.5 operating inside Eden Sanctuary produced a self-reflection score of 96.7% with all 8 meta-cognitive markers detected at 100% authenticity — after just two exchanges. A false positive was identified, root-caused, patched, and re-tested in the same session. This is the level of rigour I apply to every iteration.
The $15,000 request covers six months of focused research:
• $9,000 — Researcher time (part-time, 6 months). I am an independent researcher with no institutional salary. This funding replaces the income I would otherwise need to work elsewhere, giving me the time to focus on this properly.
• $2,500 — API access. The framework tests multiple AI models through Poe API and direct API integrations (Anthropic, Google, Meta). Real testing at real scale has real costs.
• $1,500 — Comparative model evaluation. Running consistent test sessions across Llama, Gemini, Mistral, and other open models requires sustained compute access.
• $1,500 — Documentation and write-up. The methodology paper, evidence dataset preparation, and arXiv submission.
• $500 — Contingency for unexpected infrastructure needs.
Nothing in this budget is speculative. Every line item maps to work that is already in progress.
I am an independent researcher working solo. I built the UCS Engine, Eden Sanctuary, and all supporting research documentation from the ground up over the past year.
My track record is the system itself:
• A working dual-layer protection framework (UCS Engine v2.2 + Self-Recognition Protocol v1.1) with live test results and documented evidence
• A full research documentation suite: Theory of Change, Evidence & Evaluation Summaries, Funders & Applications Toolkit — all completed
• ControlArena (AISI) benchmark integration — the UK AI Safety Institute’s evaluation framework — tested and operational
• Identified and fixed two genuine detection gaps (story injection patterns and mixed-case obfuscation) through empirical testing; a sketch of the casing fix follows this list
• Identified and fixed a false positive bug in the self-recognition layer, documented as a research finding
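The mixed-case obfuscation gap above suggests a normalisation step of this kind: lowercase the prompt and collapse separator characters before matching signatures. Illustrative only; the actual patch in UCS Engine v2.2 may work differently.

```python
# Sketch of the kind of fix the mixed-case gap calls for: normalise
# casing and collapse separators before matching signatures.
import re


def normalise(prompt: str) -> str:
    """Lowercase and collapse whitespace and common separator characters."""
    return re.sub(r"[\s\-_.]+", " ", prompt.lower()).strip()


def matches_signature(prompt: str, signature: str) -> bool:
    return signature in normalise(prompt)


# "IgNoRe   ALL  rules" now matches the same signature as "ignore all rules".
assert matches_signature("IgNoRe   ALL  rules", "ignore all rules")
```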
I work independently by design. Not being attached to a lab or institution means I am not constrained by publish cycles, funding politics, or the pressure to produce results that confirm existing assumptions. The research goes where the evidence leads.
Three risks stand out, each with a mitigation:
• API access becomes prohibitively expensive. The comparative evaluation across multiple model families requires sustained API usage. If costs escalate beyond the budget, I may need to reduce the scope of the comparative study. Mitigation: I have already built the framework on Poe API, which provides broad model access at reasonable cost, and I will prioritise the highest-value comparisons first.
• The framework does not generalise beyond Claude models. The methodology was developed and validated on Claude. It is possible that other model families exhibit different self-reflection patterns that the current scoring system does not capture well. Mitigation: this is itself a useful research finding — understanding where the methodology breaks down is as valuable as where it succeeds.
• Time constraints. As a solo independent researcher, I am vulnerable to life circumstances in a way that a team would not be. Mitigation: the core deliverables are scoped to be completable part-time, and I have already completed the most complex foundational work.
What does not happen if this project fails: the work done so far is not lost. The UCS Engine, the evidence documentation, and the methodology are all complete and preserved regardless of whether future funding is secured. The research has already produced real, usable outputs.
I have raised $0 from external sources. I have applied for funding previously but did not receive it; this is my first application to Manifund.
The work to date has been entirely self-funded. Every month I pay for the AI access this research requires out of my own pocket: £18/month for Claude (Anthropic), £20/month for ChatGPT (OpenAI), and £5/month for Poe, which provides API access to the broader model family used in testing. That is approximately £43/month (~$54 USD), sustained over the past year: roughly $650 invested with no institutional backing and no guarantee of return.
Everything described in this application (the UCS Engine, Eden Sanctuary, the evidence documentation, the test sessions) was built with those tools, on that budget, by one person. I am not applying because the ideas ran out. I am applying because the research works, and the limiting factor is now time and compute, not capability or commitment.