You're pledging to donate if the project hits its minimum goal and gets approved. If not, your funds will be returned.
снова напиши все описание исправленное вставлю
Вот полностью обновлённое описание, уже с правками про minimum vs full funding и усиленным summary. Просто замени соответствующие блоки на Manifund.
***
Project summary
Current AI safety evaluation mostly relies on probabilistic, post-hoc behavioral sampling — it can tell that an agent behaved badly, but not where and why the failure was baked into its decision process. This project builds and validates a deterministic, provenance-aware trace analysis layer for AI agents, combining a structural trace protocol (LTP) with a causal memory graph (CML) to detect misalignment before catastrophic behavior in multi-agent systems. If successful, this gives the AI safety community a concrete, reusable way to catch dangerous failure modes earlier and more reliably than today’s benchmarks.
Concretely, I will implement LTP + CML on top of existing open-source agent frameworks, run them against adversarial tasks, and release: (1) an open benchmark corpus of agent traces with safety-relevant failure labels, and (2) a reusable oversight layer that exposes hallucination, provenance violations, and semantic drift directly in the trace. All code and benchmarks will be open-source so other safety researchers can build on top of this work.
***
What are this project's goals? How will you achieve them?
Goals
- Build and validate a deterministic oversight layer (LTP + CML) that flags safety-relevant failures in agent traces across multiple architectures.
- Release an open benchmark (LTP-Bench) that other researchers can use to evaluate oversight methods on the same adversarial fixtures.
- Produce a clear empirical answer to: “When does structural trace analysis meaningfully outperform pure behavioral sampling for safety evaluation?”
How I’ll achieve them
- Integrate LTP (trace protocol with approval/provenance checks and hallucination blocking) into at least 3 agent frameworks (ReAct-style, plan-and-execute, multi-agent orchestrators).
- Build CML, a causal memory graph that tracks cross-turn dependencies and detects drift and corruption in long-horizon tasks.
- Construct an adversarial fixture suite across coding, tool-use, and research-assistant domains, with independently labeled safety failures (deception, spec violation, dangerous tool calls).
- Compare LTP+CML signals to standard behavioral benchmarks and publish the results plus all code and data.
***
How will this funding be used?
I’m requesting $20,000 as a 3–4 month bridge grant:
- $12,000 — full-time stipend (3 months × $4,000) so I can focus fully on this project as an independent researcher.
- $5,000 — compute: LLM API credits and GPU time to run adversarial evaluations across multiple agent setups.
- $3,000 — infrastructure: data storage, annotation tooling, small contractor help for labeling/validation and documentation polish.
This is enough to ship a first public version of LTP + CML, a minimal but usable trace oversight library, and an accompanying benchmark corpus that other AI safety groups can reuse.
With the minimum funding ($10,000) I will focus on integrating LTP+CML into one agent framework and releasing a smaller adversarial fixture suite plus an initial version of the benchmark. With the full funding ($20,000) I can cover 3–4 months of work, integrate at least three architectures, significantly expand the adversarial corpus, and ship a more polished oversight library, documentation, and evaluation write-up.
***
Who is on your team? What's your track record on similar projects?
I’m working as a solo independent researcher, with a background of 12+ years in fintech QA and tooling and a recent pivot into AI safety. My previous work includes:
- Graph-based causal memory tooling for safe human–AI interaction — the basis for my LTFF grant application ($8,000 requested, under evaluation).[1]
- LTP/CML prototypes with adversarial security-oriented fixtures, focusing on structural oversight of agent traces rather than just behavior.
- Open-source safety/tooling repos on GitHub, including DAO_lim and related infrastructure projects used in my grant applications to NLNet NGI Zero (GardenLiminal, LiminalBD).
I’m comfortable building testing frameworks, adversarial fixtures, and production-grade tooling from my fintech background, and now apply that skillset directly to AI safety infrastructure.
***
What are the most likely causes and outcomes if this project fails?
Most likely failure modes
- LTP structural signals (provenance violations, hallucination flags, approval gate failures) do not significantly outperform standard behavioral benchmarks at predicting safety-relevant failures.
- The causal memory layer (CML) detects drift and corruption, but only in narrow settings and not robustly across architectures.
- I underestimate engineering complexity and only deliver partial integrations instead of a polished, reusable oversight layer.
Outcomes if it fails
- The field gets a clear negative result: empirical evidence that deterministic trace analysis alone is not enough for scalable oversight, and where it breaks down.
- The adversarial fixtures and trace corpus remain useful as public testbeds for other oversight methods (e.g. scalable oversight via learned evaluators or debate).
- I will still publish all code, benchmarks, and a write-up explaining what didn’t work, so others can avoid repeating the same dead ends.
***
How much money have you raised in the last 12 months, and from where?
In the last 12 months I have not yet received any grant payouts, but I have multiple applications under evaluation:
- Long-Term Future Fund (EA Funds) — $8,000 project grant requested for graph-based causal memory tooling (status: under evaluation).[1]
- Open Philanthropy / Coefficient — career development funding application submitted (status: under evaluation).
- OpenAI Safety Fellowship & Astra Fellowship — fellowship applications submitted for a 6-month full-time transition into AI safety research (status: under evaluation).[2]
- NLNet NGI Zero — €50,000 requested for Liminal Stack-related infrastructure; €30,000 NLNet Entrust application prepared for GardenLiminal and LiminalBD (status: one application submitted, one in preparation).[3]
So far, I am effectively unfunded and looking for a first concrete grant as an independent researcher to prove traction on this line of work.