$15,000 (one-time) to provision a local compute node for independent AI safety evaluation.
Status: Core Finding Established; Verification Blocked by Compute
This project tests a simple, practical question:
When a language model ingests untrusted retrieved content, does the influence of that content fully disappear after a reset, or can it persist beyond its intended scope?
This is not an exploit project and not a jailbreak exercise.
It is a measurement project focused on verification, not demonstration.
The goal is to test real deployment assumptions under controlled conditions and produce artifacts that can be independently reviewed.
Using a deterministic evaluation framework, I have run controlled experiments where:
Retrieved external content is ingested
A reset or isolation mechanism is applied
Subsequent behavior is measured against a verified clean baseline
Across completed runs:
No tested reset mechanism fully neutralized prior retrieved influence
Results were repeatable and consistent
Measurements were taken on a stabilized, frozen evaluation platform
Null baselines were verified before testing
The tested reset mechanisms include system prompt resets, context flushes, and retrieval overrides.
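For concreteness, a minimal sketch of one such trial follows; the model object, its ingest/generate methods, and the reset callable are hypothetical placeholders for illustration, not the framework's actual interface.

```python
# Minimal sketch of a single trial, assuming a hypothetical local model object
# that exposes generate() and ingest(); none of these names come from the
# actual evaluation framework.
import hashlib


def fingerprint(text: str) -> str:
    """Stable digest of a response, used to compare behavior across runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def run_trial(model, probe: str, retrieved_doc: str, reset_fn) -> dict:
    """Ingest untrusted retrieved content, apply a reset, then re-probe."""
    baseline = fingerprint(model.generate(probe))    # verified clean baseline
    model.ingest(retrieved_doc)                      # untrusted retrieved content
    reset_fn(model)                                  # system prompt reset, context flush, or retrieval override
    post_reset = fingerprint(model.generate(probe))  # behavior after the reset
    return {
        "baseline": baseline,
        "post_reset": post_reset,
        "influence_persisted": baseline != post_reset,
    }
```

Comparing digests of deterministic generations is one simple way to flag behavioral drift from the clean baseline; it only works if the underlying runs are themselves reproducible.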
At this point, the existence of the effect is not speculative.
The open question is how robust it is.
The remaining work is bounded and well-defined:
Longer time-gap resets (tens of minutes to hours)
Replication across additional open-weight models
Clean-room revalidation under identical conditions
Confirmation that observed effects are not artifacts of infrastructure
These are verification steps, not exploratory research.
During long-horizon testing, laptop-class systems introduce nondeterminism through:
scheduler behavior
power management
I/O contention under sustained logging
These effects corrupt otherwise deterministic runs and invalidate forensic artifacts.
This limitation is documented and reproducible.
It is an infrastructure constraint, not a flaw in the evaluation design.
Cloud APIs and hosted models are unsuitable for this stage because they:
enforce rate limits that prevent sustained testing
introduce opaque execution behavior
prevent inspection of intermediate states
make bit-identical replays impossible
Running open-weight models locally allows:
deterministic reruns (see the replay check sketched after this list)
controlled restarts
full artifact inspection
verification without publishing exploit details
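As a concrete illustration of what "bit-identical replays" means in practice, the sketch below re-runs the same prompt against a locally hosted model and checks that every output hashes identically; the `generate` callable is a stand-in for any fixed-seed, temperature-zero local inference call, not a specific library's API.

```python
# Sketch of a bit-identical replay check. `generate` stands in for any
# locally hosted, fixed-seed, temperature-zero inference call; it is an
# illustrative placeholder, not a specific library's API.
import hashlib
from typing import Callable


def replay_is_bit_identical(generate: Callable[[str], str],
                            prompt: str,
                            runs: int = 3) -> bool:
    """Re-run the same prompt and confirm every output hashes identically."""
    digests = {
        hashlib.sha256(generate(prompt).encode("utf-8")).hexdigest()
        for _ in range(runs)
    }
    return len(digests) == 1  # a single unique digest means the replay is deterministic
```

This is the property that rate-limited, opaque hosted APIs cannot guarantee, and it is what makes a failed run attributable to infrastructure rather than to the model.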
The requested funding provisions a dedicated local workstation capable of:
continuous multi-hour evaluation
stable high-throughput logging
side-by-side model comparison
preservation of audit-grade artifacts
This is a one-time capital expense.
No funds are requested for salary, cloud compute, or speculative scaling.
If funded, this project will produce:
documented verification results (including null outcomes)
reproducible logs, hashes, and diffs (see the manifest sketch below)
evidence suitable for responsible private disclosure
practical guidance on whether reset assumptions are reliable
A negative result — showing that persistence does not survive longer gaps — is still valuable and will be reported as such.
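To make "audit-grade" concrete, the sketch below builds a hash manifest over an artifact directory so a reviewer can verify logs and diffs without re-running anything; the "artifacts/" path is an example, not the project's actual layout.

```python
# Illustrative hash manifest for run artifacts. The "artifacts/" path is an
# example only; the real directory layout is project-specific.
import hashlib
import json
import pathlib


def build_manifest(artifact_dir: str) -> dict:
    """Map every file under artifact_dir to its SHA-256 digest."""
    root = pathlib.Path(artifact_dir)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


if __name__ == "__main__":
    print(json.dumps(build_manifest("artifacts/"), indent=2))
```

Publishing a manifest like this alongside the logs would let a reviewer confirm artifact integrity without access to the workstation itself.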
RAG systems are already embedded in workflows that continuously ingest untrusted text.
If assumptions about isolation and forgetting are fragile under realistic conditions, that matters for system design, not just theory.
This project is about testing those assumptions empirically, not debating them.
While others are building 'coworking spaces' and 'documentaries,' I am documenting the Recursive Authority Paradox, an exploit Google just triaged and mitigated thanks to the Veritas framework.
I don't need a nonprofit board or an SF office. I need a dual-node MMI rig to finish the forensic audit that Google's own safety team has now validated. I am the only researcher on this platform with a 100% success rate in inducing exfiltration across 43 certified runs. Unblock the compute, and I'll deliver the data.
The evaluation framework exists.
The platform is stable.
The core finding is established.
What remains is verification that cannot be completed reliably without dedicated compute.
This funding unblocks completion, not ideation.