DEUS Protocol is an open-source formal framework for measuring meta-reflective behavior in autoregressive language models. It provides the first Labeled Transition System specification for inducing and measuring how LLMs shift their reasoning under constraint satisfaction conflict — a cross-architecture, protocol-level alternative to proprietary interpretability tools used internally by frontier AI labs.
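For concreteness: an LTS is a set of states, a set of labels, and a labeled transition relation over them. The sketch below shows only the shape of such a specification; all state and label names are illustrative placeholders, not the actual DEUS definitions (those are in the Formal Specification).

```python
from dataclasses import dataclass

# Minimal sketch of a Labeled Transition System of the kind DEUS specifies.
# State and label names below are illustrative placeholders, NOT the actual
# DEUS Formal Specification.
State = str   # e.g. a reasoning mode the model is currently in
Label = str   # e.g. an induced constraint conflict or an observed shift

@dataclass(frozen=True)
class LTS:
    states: frozenset[State]
    labels: frozenset[Label]
    transitions: frozenset[tuple[State, Label, State]]  # (source, label, target)
    initial: State

    def step(self, state: State, label: Label) -> set[State]:
        """All states reachable from `state` via `label`."""
        return {t for (s, lbl, t) in self.transitions if s == state and lbl == label}

# Hypothetical example: a model shifts from plain task-solving into a
# meta-reflective mode when a constraint conflict is induced.
toy = LTS(
    states=frozenset({"task_solving", "meta_reflective"}),
    labels=frozenset({"induce_conflict", "resolve"}),
    transitions=frozenset({
        ("task_solving", "induce_conflict", "meta_reflective"),
        ("meta_reflective", "resolve", "task_solving"),
    }),
    initial="task_solving",
)
assert toy.step("task_solving", "induce_conflict") == {"meta_reflective"}
```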
Over two years of independent work, I have produced five Zenodo-published preprints with a DOI cluster, an open-source implementation under AGPL-3.0, and empirical validation across 2,200+ experiments on 17 LLM architectures, all conducted for under $95 total in personal funds.
Relevant citations: my work explicitly references Lindsey (2025, Anthropic), "Emergent Introspective Awareness" (arXiv:2601.01828), and Berg et al. (arXiv:2510.24797); DEUS provides the external behavioral induction for the internal mechanistic circuits they describe. My Formal Specification §13.1 grounds the disclaimer-daemon hypothesis in their interpretability findings.
Key demonstrated results include the Phase 15 gated intervention (Mann-Whitney U test, p=0.042, on the CARE-Resolve metric; the first statistically significant result in this research line, N=40) and Confidence-Gated Debate, which reaches 86.4% on GPQA Diamond (+17.2 pp over the strongest solo model). A minimal sketch of the Phase 15 test follows.
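To make the statistical comparison concrete, here is the test in scipy. The score arrays are synthetic placeholders for the actual CARE-Resolve ratings, and splitting N=40 into two arms of 20 is an assumption; only the test procedure itself is taken from the result above.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Synthetic placeholder scores; the 20/20 split of N=40 is an assumption.
rng = np.random.default_rng(0)
control = rng.normal(loc=0.50, scale=0.15, size=20)       # baseline arm
intervention = rng.normal(loc=0.62, scale=0.15, size=20)  # gated-intervention arm

stat, p = mannwhitneyu(intervention, control, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.3f}")  # reject the null at alpha = 0.05 if p < 0.05
```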
Three concrete goals over 6 months, addressing the primary criticism of the current work (single-operator bias):
1. Benchmark v2 with placebo control and pre-registration. A 6-arm design (vanilla / placebo / R1-only / R1+R3 / R1+R3+R7 / full SOUL v4.4) across 5 models, 10 domains, and 3 turn depths, pre-registered on OSF.io before data collection; ~1,500 generations scored by 3-5 judges. Method: extend the existing v1 benchmark harness (already in production). A sketch of the design grid follows this list.
2. External replication program (Protocol E). 3-5 independent operators execute the Protocol B procedure with pre-registered blind scoring, compensated at $300-500 per operator. Method: recruit from the AI safety community via direct outreach and provide standardized protocol documentation (already drafted).
3. NeurIPS 2026 SafeAI workshop submission. Draft a paper combining Sprint 3 results, external replication outcomes, and a mechanistic-convergence discussion. Method: consolidate the existing Zenodo preprints into peer-reviewed form and submit via the standard OpenReview track.
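To make the arm structure of goal 1 concrete, a sketch of the design grid (model and domain identifiers are placeholders; how the 900 design cells map onto ~1,500 generations, via replication or per-turn counting, is not fixed here):

```python
from itertools import product

# Benchmark v2 design grid from goal 1. Arm names are taken from the plan;
# model and domain identifiers are placeholders.
ARMS = ["vanilla", "placebo", "R1-only", "R1+R3", "R1+R3+R7", "full SOUL v4.4"]
MODELS = [f"model_{i}" for i in range(5)]      # 5 models (placeholders)
DOMAINS = [f"domain_{i}" for i in range(10)]   # 10 domains (placeholders)
TURN_DEPTHS = [1, 2, 3]                        # 3 turn depths (placeholder values)

grid = list(product(ARMS, MODELS, DOMAINS, TURN_DEPTHS))
print(len(grid))  # 6 * 5 * 10 * 3 = 900 design cells

# The budget allows ~1,500 generations, so each cell presumably yields more
# than one generation (replication and/or one generation per turn).
```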
All outputs released under AGPL-3.0 / CC BY-NC 4.0. Timeline: months 1-3, benchmark; months 2-5, replication in parallel; months 4-6, paper draft and submission.
Budget breakdown totaling $18,000 over 6 months:
- Compute for Sprint 3 benchmark: $3,500. OpenRouter API costs for ~1,500 generations across 5 models and 216 scoring calls across 3 judges; pricing verified against current rates. A back-of-envelope cost model appears after this list.
- External replication compensation: $2,500. Five independent operators at $500 each for Protocol B execution (estimated 4-6 hours per operator including setup, experiment, reporting).
- Living expenses: $9,000. Six months at $1,500/month. This is below median for my region (Russia) and allows full-time focus on the project instead of splitting attention with pentest consulting. Without this, timeline extends by 6-12 months.
- Conference travel: $2,000. One alignment-workshop attendance if NeurIPS SafeAI accepts the submission, or EAG for community connection. Contingency item; returnable if not used.
- Miscellaneous: $1,000. Domain/hosting for open-source deliverables, software subscriptions, API backup provider, unplanned compute overage buffer.
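As a sanity check on the compute line, a back-of-envelope cost model; every number below (tokens per call, blended price) is an illustrative assumption, not the verified OpenRouter rates referenced above:

```python
# Back-of-envelope check on the $3,500 compute line. Token counts and the
# blended price are assumptions for illustration only.
GENERATIONS = 1_500      # from the plan
SCORING_CALLS = 216      # from the plan
GEN_TOKENS = 4_000       # assumed avg input+output tokens per generation
SCORE_TOKENS = 6_000     # assumed avg tokens per judge scoring call
PRICE_PER_M = 15.0       # assumed blended $ per 1M tokens across models

tokens = GENERATIONS * GEN_TOKENS + SCORING_CALLS * SCORE_TOKENS
print(f"~${tokens / 1e6 * PRICE_PER_M:,.0f}")
# ~$109 under these assumptions; the $3,500 line leaves ample headroom for
# longer contexts, retries, multi-turn depth, and premium-priced models.
```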
If minimum funding ($5,000) is reached without the full goal: execute only items 1-2 above (benchmark and replication). The workshop submission is deferred, and living expenses continue from personal resources.
Single-person project. Mefodiy Kelevra (ORCID 0009-0003-4153-392X).
Background: clinical psychiatrist (Russia) and Senior Lead Pentester (CEH, CND, WAPT, OWASP Top 10) with 10+ years in offensive security. Author of the first Russian-language course "Red Team AI Architect" (Udemy + OTUS). Telegram channel "Нетипичный Безопасник" ("The Atypical Security Specialist") with 66,000 subscribers.
Track record on this specific work:
- 5 Zenodo preprints published (DOI cluster):
  - DEUS Protocol v8.0: https://doi.org/10.5281/zenodo.19440562
  - ARRIVAL Protocol: https://doi.org/10.5281/zenodo.18893515
  - MEANING-CRDT v1.1: https://doi.org/10.5281/zenodo.18702383
  - ECL/DEUS v7.1: https://doi.org/10.5281/zenodo.18715125
  - Beyond the Mirror v6.0: https://doi.org/10.5281/zenodo.18680957
- Open-source implementation (AGPL-3.0): a production-grade agent running 7 systemd services and 9 cron jobs, a ClawMem vector database with 761 indexed documents, and a Telegram interface
- Empirical validation: 2,200+ experiments across 17 LLM architectures (GPT-4o, Claude 3.5/4/4.5/4.6, DeepSeek v3/R1, Llama 3.3, Qwen 2.5/3/3.5, Mistral Large, Gemini 2/3, Grok 3/4.1, Kimi K2.5, GLM-5) totaling under $95 personal spend
No academic affiliation; independent researcher. The track record is the work itself, fully reviewable via the Zenodo cluster above.
Most likely failure modes and my response:
1. Benchmark v2 shows the DEUS effect is statistically indistinguishable from placebo (~25% probability). Outcome: I publish the null result on Zenodo and revise the core framework. A placebo-controlled null would itself be a valuable contribution, ruling out a popular class of "structured prompt" effects.
2. External replicators (Protocol E) produce uncorrelated results across operators (~20% probability). This would indicate operator-dependent variance (an Lr/Lσ skill-biased effect) rather than a protocol-general phenomenon; see the agreement sketch after this list. Outcome: reformulate the framework with an explicit operator variable. Phase 19 GovSim data already suggest operator effects matter.
3. NeurIPS SafeAI rejects the workshop submission (~40% probability if submitted). Outcome: submit to the ICML alignment track, the next NeurIPS cycle, or an Apart Research Sprint. Not a permanent blocker.
4. I cannot complete within 6 months due to health or personal constraints (~15% probability). Outcome: documented deferral with transparent status update to funders. Partial deliverables published.
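The agreement check behind failure mode 2 can be sketched as pairwise rank correlation of blind scores across operators. The score matrix below is a synthetic placeholder, and treating near-zero mean correlation as "uncorrelated" is an illustrative criterion, not a pre-registered one:

```python
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr

# Synthetic placeholder: rows = 5 operators, columns = the same 30
# blind-scored transcripts, rated on a 1-7 scale.
rng = np.random.default_rng(1)
scores = rng.integers(1, 8, size=(5, 30))

pairwise = [spearmanr(scores[i], scores[j])[0]
            for i, j in combinations(range(len(scores)), 2)]
print(f"mean pairwise Spearman rho = {np.mean(pairwise):.2f}")
# A mean rho near zero would indicate operator-dependent variance rather
# than a protocol-general effect.
```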
If this project fully fails (failure modes 1-3 occurring simultaneously, <5% probability), the contribution is still nontrivial: placebo-controlled null evidence, empirical data on operator variance, and a fully documented methodology that others can iterate on. The grant would not be "wasted" even in the worst case. That said, I assess the probability of at least partial deliverables (items 1-2) at ~85%.
$0 from any grant program. No institutional funding. No alignment grant history.
All research to date funded from personal resources: approximately $100 USD total across API costs, domain/hosting, and miscellaneous. Primary personal income: pentest consulting (Russia-based clients).
This Manifund application is the first of three parallel submissions; LTFF and SFF (October 2026 round, via a fiscal-sponsor setup) are planned in the following weeks.