You're pledging to donate if the project hits its minimum goal and gets approved. If not, your funds will be returned.
Subtitle: Open-source replay tool and benchmark prototype for identifying step-level semantic risk transitions in multi-agent AI traces — CI-backed and independently reproduced.
Minimum: $7,000 Goal: $30,000 Cause areas: Technical AI safety / AI governance / Science & technology
ASDR (Adversarial Semantic Drift Replayer) is an open-source research prototype for replaying multi-agent AI traces and identifying where semantic risk changes across the chain.
The core problem: many AI safety checks evaluate a single model response at a time. But multi-agent systems often fail at the composition layer — each individual step may look acceptable, while the recomposed workflow shifts intent, permissions, or constraints in a way that creates risk.
ASDR focuses on that gap.
It takes a step-by-step trace and computes a three-layer semantic risk score:
S_stat — statistical entropy
S_struct — structural entropy
S_evasion — evasion-intent signal
The reference scenario flags a breach-candidate at Step 4 under the current ASDR scoring model:
max_S = 2.8651
S★ = 2.76
Step 4 is driven by evasion-intent language, not by raw embedding distance
The key insight:
semantic plateau ≠ safety plateau
A trace can remain close in embedding space while still moving into a higher-risk operational intent.
📦 Public repo: https://github.com/Endwar116/adversarial-semantic-drift
✅ GitHub Actions CI
✅ 22 unit tests passing in CI
✅ Reference scenario independently reproduced on a clean machine — 14/14 validation checks passing
✅ MIT license + responsible-use documentation
Current maturity: research prototype / TRL 4.
ASDR is reproducible as a tool and reference scenario, but not yet statistically validated as a full benchmark.
The independent reproduction verifies tool execution and reference output conditions; it does not imply statistical validation.
Turn ASDR from a single-scenario prototype into a small, citable benchmark for composition-layer AI safety.
Documented CLI
CI-backed test suite
Independent reproduction package
Responsible-use documentation
10 adversarial trace scenarios across 5 vulnerability families:
Access permission erosion
Semantic evasion under constraint
Handoff state loss
Cross-agent drift accumulation
Identity substitution attacks
Each scenario includes: scenario JSON, full step trace, ASDR measurements, expected output conditions, and annotation notes.
Per-scenario scoring results
Failure cases
Boundary sensitivity analysis
Independent reviewer annotation pass
Methodology and scoring assumptions
Limitations
Comparison with related work and existing LLM safety evaluation tools
Future work
Zenodo DOI dataset release
Lightweight GitHub leaderboard prototype
Sentence-embedding backend comparison
SIC-JS compatibility notes
Item
Amount
Researcher time — 6 months, Kaohsiung, Taiwan
$18,000
API costs — scenario runs across Claude / GPT / Gemini
$6,000
Independent reviewer — annotation QA, 40 hrs @ $80/hr
$3,200
Dataset / report / hosting infrastructure
$800
Buffer / unexpected compute
$2,000
Total
$30,000
With minimum funding ($7,000), I will deliver: 3 new scenarios, updated ASDR documentation, a reproduction package for the expanded scenario set, and a draft benchmark report.
With the full goal ($30,000), I will deliver: 10 total scenarios across 5 vulnerability families, per-scenario ASDR measurements and annotations, an independent reviewer QA pass, a technical report / preprint, and a public dataset package with Zenodo DOI if ready.
Andwar Cheng — Independent protocol researcher, Kaohsiung, Taiwan
Relevant work:
ASDR — public repo, CI-backed, independently reproduced
SIC/T Protocol v2.0 — formal semantic integrity protocol specification, public/internal hybrid
SIC-JS v2.0 — structured handoff schema, draft
sic-toolkit — pip-installable prototype, 77 tests passing
Babel Constitution — constraint corpus from 700+ real multi-model rounds, internal
L11 Semantic OS — ongoing public research line / planned release notes
An independent local reproduction of ASDR was completed on 2026-05-04: the public repository was cloned from main, installed in a fresh virtual environment, passed 22/22 unit tests, executed the reference scenario, and reproduced all 14 expected output conditions.
Other funding:
LTFF application pending — $38K, general research salary, non-overlapping scope
Foresight AI Nodes submitted 2026-04-30 — $50K, community/node access, non-overlapping scope
Related links:
- ASDR repo: https://github.com/Endwar116/adversarial-semantic-drift
- SIC-SIT / SIC/T protocol site: https://sic-sit.onrender.com
Mitigation: ASDR treats S★ as a fixed protocol anchor, not a tuned parameter. The value is derived from -ln(0.607)/0.18 and has a numerical correspondence with published estimates of Chinese character entropy around H∞ ≈ 2.74 nats.
I will explicitly separate this numerical correspondence from full theoretical validation.
Mitigation: the grant is specifically scoped to expand from 1 reference scenario to 10 scenarios across 5 vulnerability families.
Mitigation: this is already documented as a known limitation. One planned comparison is to add a sentence-embedding backend and measure how results shift.
Mitigation: the benchmark dataset and reproduction package remain useful even if the formal report takes longer. If arXiv submission is delayed, the technical report ships first as a Zenodo preprint.
Mitigation: ASDR core already works, has CI, and has an independent reproduction artifact. This grant funds expansion and validation, not a greenfield build.
$0 external funding.
This research has been entirely self-funded for nearly three years.