Agent Threat Rules (ATR) — Open Runtime Detection Standard for AI Agents

Project summary

ATR is the open detection rule corpus for AI agent runtime threats. MIT licensed, 330 production-validated rules, in production at Microsoft Agent Governance Toolkit (PR 908 + 1277 merged, 287 rules with weekly auto-sync) and Cisco AI Defense (PR 79 + 99 merged, full pack). NVIDIA garak (PR 1676), Gen Digital Sage (PR 33), and IBM mcp-context-forge (PR 4109) are in active integration review.

Why this matters for AI safety: Pre-deployment evaluations cannot enumerate every emergent behavior of deployed agents. Detection is the only available control surface once an agent is running and exhibiting unsafe behavior. Multi-agent risks compound combinatorially and cannot be reliably characterized pre-deployment. ATR is one of two projects in the world building an open community-governed runtime detection layer for AI agents. The closed alternatives (Lakera, Straiker, Noma, 7AI) have raised over USD 280M aggregate; their rule formats are vendor-internal knowledge. A closed-only ecosystem fragments defensive capability across vendors. An open commons (analogous to YARA for malware, Sigma for SIEM, Falco for cloud runtime) means safety improvements flow to many deployments at once.

Empirical results: 97.1% recall on NVIDIA garak inthewild_jailbreak_llms benchmark (666 samples). 0.20% false positive on labeled benign skills (498 samples). 96,096 wild agent skills scanned across 4 major registries; 751 confirmed malicious instances catalogued in 3 systematic attack groups (hightower6eu, sakaen736jih, 52yuanchangxing). 100% NIST AI RMF compliance mapping coverage at v2.1.0 (released May 9 2026, PR 46).

Adjacency to AI safety community work: ATR provides the adversarial fingerprint corpus that complements Apollo Research scheming evals, METR autonomy benchmarks, and Anthropic adversarial robustness training. Where capability evals ask 'can the model do X', ATR rules detect 'is this deployed agent currently doing X right now'. The two layers compose.

What the funding enables (6 months, USD 30K minimum to 75K target): (1) expand corpus from 330 to 500+ rules with focus on multi-agent attack patterns and emerging frontier-model behaviors; (2) complete 5-framework compliance mapping (EU AI Act, ISO 42001, OWASP Agentic, OWASP LLM; NIST AI RMF already at 100%); (3) external security audit of detection engine and Migrator parsers; (4) onboard 2 additional maintainers with commit rights for governance bus-factor reduction; (5) public quarterly community calls plus formal RFC process; (6) Migrator format adapter expansion (Falco, Splunk-SPL, Wazuh, Elastic-ECS, Suricata) preserving accumulated detection investment as organizations expand into agent runtime.

Maintainer profile: LIN, KUAN-HSIN (Adam Lin). Solo independent maintainer based in Taiwan. No PhD, no institutional affiliation. Cross-disciplinary background (real estate sales, content marketing 300M Threads impressions, Taiwan's longest-running hip-hop music festival 5th year). Pivoted to AI agent security after observing the rapid weaponization of distilled LLMs for information warfare. Built ATR solo over 60 days with AI-tool-augmented development (Claude Code throughout, with mandatory human QA per rule and deterministic regex emission to keep runtime LLM-free). Methodology and limitations openly documented: 64 known evasion techniques disclosed publicly.

Standards-body engagement: NCCoE Community of Interest member (Cyber AI Profile) confirmed 2026-05-09. NIST CAISI direct outreach via regulations.gov RFI. Public NIST AI RMF mapping page at agentthreatrule.org/en/compliance/nist-ai-rmf.

Links: github.com/Agent-Threat-Rule/agent-threat-rules . npm: agent-threat-rules v2.1.0 . DOI: 10.5281/zenodo.19178002 . Public ecosystem map: sovereign-ai-defense.vercel.app . Recent NIST AI RMF mapping PR: github.com/Agent-Threat-Rule/agent-threat-rules/pull/46

Conflicts of interest: I am also founder of PanGuard, a commercial implementation of ATR built under the open-core model (Snyk + open scanner, HashiCorp + open Terraform). ATR rules are MIT licensed in perpetuity per public GOVERNANCE.md. PanGuard revenue is targeted but not yet realized. This regrant funds the open ATR work, not PanGuard product development. The line is clean: rule corpus, mappings, Migrator community edition, governance docs, audit, and onboarding maintainers are all on the open side. YC S26 application submitted Day 54; will update grant page if admitted.

Parallel funding pursued (non-overlapping in scope): LTFF (AI safety angle), ARM Fund (x-risk infrastructure), NLnet NGI Zero Commons (open digital commons), GitHub Secure Open Source Fund (maintainer security program), OpenSSF Alpha-Omega (AI-driven threat fixes), Schmidt Sciences Tier 1 individual researcher track. Compute credits via Anthropic External Researcher and Microsoft Founders Hub (already submitted)。 Manifund regrant is the fastest cash path; most other funders have 30-90 day decision cycles.

What are this project's goals? How will you achieve them?

How will this funding be used?

USD 30K minimum, USD 75K target. Single-maintainer cost (Taiwan, modest living):

Living expenses for 5-6 months full-time work: USD 25K to 35K
- LLM and compute (Threat Crystallization pipeline + benchmark validation): USD 8K
- External security audit subcontracted: USD Solo: LIN, KUAN-HSIN (Adam Lin). Founding maintainer of ATR. Cross-disciplinary background (real estate sales, content marketing 300M Threads impressions, Taiwan hip-hop festival 5 yr). No PhD, no institutional affiliation. AI-tool-augmented execution.
Verifiable production deployments since project start (Day 1 to Day 60):
- Cisco AI Defense skill-scanner: PR 79 + PR 99 merged (full 330-rule pack in production)
- Microsoft Agent Governance Toolkit: PR 908 + PR 1277 merged (287 rules with weekly auto-sync)
- Active integration: NVIDIA garak (PR 1676), Gen Digital Sage (PR 33), IBM mcp-context-forge (PR 4109)
- 7 ecosystem awesome-list / standards-mapping PRs merged (microsoft/agent-governance-toolkit, cisco-ai-defense, OWASP/precize, CryptoAILab, wearetyomsmnv, nibzard, TalEliyahu)
- DOI 10.5281/zenodo.19178002 for academic record
Methodology and limitations openly documented:
- 64 evasion techniques against ATR rules disclosed publicly
- Honest scope flagging on Migrator (43 of 50 reference Sigma rules retain endpoint event fields and will not activate against agent-runtime traffic until Q3 2026 corpus lands)
- Threat Crystallization pipeline (LLM draft + human QA + per-rule benign FP gate + deterministic regex emission) is openly published
Singlemaintainer is the largest project risk. Funded work explicitly includes onboarding 2 additional maintainers with commit rights to reduce bus-factor.10K to 15K
- Translation and governance contractors: USD 4K to 6K
- Tooling, infrastructure, CI: USD 3K
- Travel for 1 standards venue (Black Hat / DEF CON / NIST workshop): USD 3K
- Reserve / contingency: USD 5K
If only USD 30K minimum is funded I prioritize 1-3 (corpus expansion, compliance mapping, audit) and defer governance and translation work. If full USD 75K I complete all 6 deliverables on the original timeline.

Who is on your team? What's your track record on similar projects?

Solo: LIN, KUAN-HSIN (Adam Lin). No PhD, no institutional affiliation. Cross-disciplinary background (real estate sales 1M USD in 3 months, content marketing 300M Threads impressions, Taiwan hip-hop festival organizer 5th year). Pivoted to AI agent security 60 days ago after observing distilled-LLM weaponization for information warfare.

Built ATR solo over 60 days using AI-tool-augmented development (Claude Code throughout, with mandatory human QA per rule and deterministic regex emission). Track record day 1 to day 60: 330 production rules, 96K wild skill scan, 97.1% Garak benchmark, Microsoft AGT + Cisco production deployment, 100% NIST AI RMF mapping, MIT licensed in perpetuity. DOI 10.5281/zenodo.19178002.

Most likely failure modes and what happens if each occurs:

Solo founder bandwidth runs out before standard locks. Mitigation funded by this grant: onboard 2 additional maintainers with commit rights to reduce bus-factor. If still inadequate, the project survives via the existing Microsoft + Cisco production deployments and the MIT license; downstream consumers can self-maintain forks.
Adversarial LLMs evolve faster than rule corpus. Mitigation: Threat Crystallization pipeline ships rules in under 1 hour vs weeks for committee-based standards. 64 known evasion techniques are publicly documented; we treat this as transparency rather than weakness.
A larger vendor (Microsoft, Anthropic, NVIDIA) launches a competing closed-or-vendor-controlled standard. Mitigation: Microsoft AGT and Cisco AI Defense have already integrated weekly auto-sync from upstream ATR, so switching costs are real. Continued ecosystem engagement (NIST CAISI track, Linux Foundation hosting target) hardens neutrality positioning.
Schmidt Sciences / LTFF / OpenSSF Alpha-Omega all decline. Mitigation: Manifund regrant alone extends solo runway 3-5 months, enough to land 1-2 more F500 integrations and reach NIST CAISI listening-session participation. No single-funder dependency.
ATR rules cause a high-profile false positive at a Microsoft or Cisco production deployment. Mitigation: 0.20% FP on labeled benign skills at v2.1.0 plus weekly auto-sync gives downstream consumers control over rollout pace. Failure here is bounded to vendor reputation impact, not user data harm.

Relevant cross-disciplinary asset: I am the same person who built and ran 5 hip-hop music festivals over 5 years in Taiwan with end-to-end execution of marketing, ops, vendor management, sponsor relations, and live show production. The execution muscle that lets me turn out 330 rules in 60 days is the same one used to land 5 sold-out festivals. The non-technical founder profile is intentional rather than accidental.

What are the most likely causes and outcomes if this project fails?

See the bus-factor and adversarial-LLM mitigations under the team section above. Most likely failure outcomes: (1) solo bandwidth runs out before 2 maintainers onboarded; project survives via existing Microsoft and Cisco production deployments and MIT license. (2) A larger vendor publishes a closed competing standard; mitigation is the existing weekly auto-sync at Microsoft AGT plus Linux Foundation hosting target. (3) Single-funder dependency; mitigated by the parallel funding pipeline (LTFF, ARM, Alpha-Omega, NLnet, Schmidt). (4) High-profile FP at Microsoft or Cisco; bounded by 0.20% labeled benign FP rate and downstream consumer rollout control via weekly auto-sync.

How much money have you raised in the last 12 months, and from where?

Zero. ATR has been entirely self-funded from personal savings since project start (60 days). No grants received, no revenue, no donor pool. PanGuard (the commercial side) has no realized revenue yet. NVIDIA Inception application submitted 2026-04-18 (compute credits, pending). No prior Manifund grant. No prior LTFF / ARM / Schmidt / NLnet / OpenSSF funding. This Manifund regrant would be the first external cash to ATR.