Mu Zi

@Mu-

https://docs.google.com/document/d/11iw4nd43VUOUSReWL3GrVmEn9tqeVD6N/edit?usp=drivesdk&ouid=108060521704048492060&rtpof=true&sd=true

Projects

Hardening a fail-closed runtime for agentic AI systems

pending admin approval

Comments

Hardening a fail-closed runtime for agentic AI systems

Mu Zi

17 days ago

Update Date: April 26, 2026

Updated materials:

• I have prepared cleaner, more concise current versions of the RStar paper draft and the external application packet.

• The revised paper now centers on a narrower core claim: execution-time authorization continuity as a deterministic runtime invariant. The current framing is: RStar does not try to make probabilistic agents deterministic; it makes execution permission deterministic at the final dispatch boundary.

• The revised application packet also makes the next phase more concrete. The proposed 90-day plan focuses on real-framework replay evidence: adapters for agent frameworks, an 8–12 scenario authorization-drift matrix, with/without-RStar replay logs, core metrics, and a reviewer-facing walkthrough package.

• This update narrows the project scope rather than expanding it. RStar is not presented as a general-purpose agent-governance platform. It addresses one specific execution-boundary question: after identity, policy engines, gateways, observability, and approval surfaces have done their work, is this exact action still authorized now, under the current actor, thread, delegation chain, policy state, resource target, and evidence state? A minimal illustrative sketch of this check follows below.
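To make that boundary check concrete, here is a minimal Python sketch of a deterministic dispatch gate. It is an assumption-laden illustration, not code from the RStar artifact: the names AuthzContext, final_dispatch_gate, and dispatch, and the context fields, are all hypothetical.

```python
# Hypothetical sketch: names and fields are illustrative, not the RStar schema.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class AuthzContext:
    """The exact state under which an action was authorized."""
    actor: str
    thread_id: str
    delegation_chain: Tuple[str, ...]
    policy_version: str
    resource_target: str
    evidence_digest: str

def final_dispatch_gate(authorized: AuthzContext, current: AuthzContext) -> bool:
    """Deterministic comparison at the final dispatch boundary.

    The agent's reasoning may be probabilistic, but this check is not:
    either the authorization context still holds exactly, or the action
    is refused. Any drift (new actor, policy update, extra delegation
    hop, altered evidence) fails closed.
    """
    return authorized == current

def dispatch(action, authorized: AuthzContext, current: AuthzContext):
    """Fail-closed wrapper: refuse unless the gate explicitly passes."""
    if not final_dispatch_gate(authorized, current):
        raise PermissionError("authorization continuity broken; refusing to execute")
    return action()
```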

Current materials:

1. RStar_Workshop_Paper_v8.2_2026-04-26_1252_MuZi

2. RStar_Application_Packet_v1.4_2026-04-26_1252_MuZi

https://docs.google.com/document/d/1tNVoPCXAaehMxw93pzMfVm43aIv7LKpL/edit?usp=drivesdk&ouid=108060521704048492060&rtpof=true&sd=true

https://docs.google.com/document/d/1qqqD0xAyXULcU9YiiziygMnacx5dIG5L/edit?usp=drivesdk&ouid=108060521704048492060&rtpof=true&sd=true

Hardening a fail-closed runtime for agentic AI systems

Mu Zi

18 days ago

This round of funding will be used primarily for prototype hardening, artifact packaging, runtime evaluation, and preparation for external review.

Hardening a fail-closed runtime for agentic AI systems

Mu Zi

18 days ago

Submission Plan: Dual-Track Strategy (Workshop → USENIX Security)

We are pursuing a dual-track submission strategy to secure early peer-reviewed visibility while steadily strengthening the work toward a full systems security publication.

Track A: NeurIPS 2026 Workshop (Short-term goal – First peer-reviewed record)

The current materials (v7 draft with full Appendix S1–S15, LONGRUN ablation traces, and a reproducible minimal kernel) are already mature enough to be condensed into a strong workshop paper. We plan to submit a focused version emphasizing the core five-gate execution kernel (Fragment → Integrity → Auth → Governance → Execution), the preplay-based regression barrier, and the three complementary counterfactual proofs. The full appendix and detailed artifact will be provided as supplementary material.
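As a rough illustration of the five-gate shape (the gate order follows the paper, but the code is a hypothetical sketch, not the actual kernel; all function names below are placeholders):

```python
# Hypothetical five-gate, fail-closed pipeline. Gate order follows the paper
# (Fragment -> Integrity -> Auth -> Governance -> Execution); bodies are stubs.
from typing import Callable, Dict, List

Gate = Callable[[Dict], bool]

def run_kernel(action: Dict, gates: List[Gate]) -> bool:
    """Evaluate gates in order; the first failure refuses the action.

    Fail-closed: an exception inside any gate counts as a refusal,
    never as an implicit approval.
    """
    for gate in gates:
        try:
            if not gate(action):
                return False
        except Exception:
            return False
    return True  # reached only if every gate explicitly passed

# Placeholder gates, in the paper's order.
def fragment_gate(a: Dict) -> bool:   return "fragment" in a          # well-formed proposal?
def integrity_gate(a: Dict) -> bool:  return a.get("hash_ok", False)  # evidence chain intact?
def auth_gate(a: Dict) -> bool:       return a.get("actor_authorized", False)
def governance_gate(a: Dict) -> bool: return a.get("policy_ok", False)
def execution_gate(a: Dict) -> bool:  return a.get("target_allowed", False)

GATES: List[Gate] = [fragment_gate, integrity_gate, auth_gate,
                     governance_gate, execution_gate]
```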

Key timeline:

• Workshop proposals deadline: June 6, 2026 (AoE)

• Suggested workshop contributions deadline: August 29, 2026 (AoE)

• Notifications: September 29, 2026 (AoE)

Track B: USENIX Security 2027 (Longer-term goal – Full systems/security paper)

After the workshop phase, we will harden the empirical evaluation, threat model, evidence bundling, and human-review integration to target USENIX Security ’27 (a premier venue for systems and security research). We currently lean toward Cycle 2 (submission deadline: January 26, 2027) to allow sufficient time for additional runtime experiments and artifact improvements.

Hardening a fail-closed runtime for agentic AI systems

Mu Zi

18 days ago


Full Draft Paper (v7 with Appendix S1–S15) & SSRN Submission

I have attached the latest full draft of my paper:

“RStar: Authorization-Before-Execution as a First-Class Runtime Object for Agentic AI Governance” (Draft v7, April 2026, with complete Appendix S1–S15).

This draft represents continuous iteration and refinement over recent months, evolving through multiple versions into a minimal frozen five-gate execution kernel (Fragment → Integrity → Auth → Governance → Execution). The paper elevates authorization-before-execution to an independent first-class runtime object. It features three complementary counterfactual proofs: a clean Governance ON/OFF ablation (LONGRUN_01), an extreme long-running composite stress scenario involving recursive self-elevation, evidence forgery, and consensus manipulation (LONGRUN_03), and a compact final-actor mismatch case (CAND_08). These are accompanied by preplay-based policy evolution as a regression barrier, hash-chained evidence integrity, and explicit alignment with regulatory requirements for meaningful human accountability at significant checkpoints (as outlined in Singapore’s Model AI Governance Framework for Agentic AI, January 2026, and related instruments).
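For the evidence-integrity piece, here is a generic sketch of hash chaining (a standard technique; the field names are assumptions, not the paper’s actual record format):

```python
# Generic hash-chained log; field names are illustrative, not RStar's schema.
import hashlib
import json

def append_record(chain: list, payload: dict) -> list:
    """Append a record whose hash covers the previous record's hash.

    Tampering with any earlier record breaks every subsequent link,
    so the whole chain can be verified end to end.
    """
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    chain.append({"prev": prev_hash, "payload": payload,
                  "hash": hashlib.sha256(body.encode()).hexdigest()})
    return chain

def verify_chain(chain: list) -> bool:
    """Recompute every link; any mismatch means the evidence was altered."""
    prev_hash = "0" * 64
    for record in chain:
        body = json.dumps({"prev": prev_hash, "payload": record["payload"]},
                          sort_keys=True)
        if (record["prev"] != prev_hash
                or record["hash"] != hashlib.sha256(body.encode()).hexdigest()):
            return False
        prev_hash = record["hash"]
    return True
```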

The architecture maintains an intentional asymmetric division of labor: Autophagy contributes structured proposal generation and containment search, while RStar serves as the constitutional decision layer determining whether any proposed continuation may safely cross into execution.
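A rough sketch of that division of labor, with hypothetical stand-ins for both sides (neither function is the actual Autophagy or RStar interface):

```python
# Illustrative proposer/decider split; names and logic are assumptions.
import random
from typing import Dict, Iterator

def propose_continuations(state: Dict, n: int = 4) -> Iterator[Dict]:
    """Stand-in for the proposal side: probabilistic, may emit unsafe candidates."""
    for _ in range(n):
        yield {"state": state, "action": random.choice(["read", "write", "escalate"])}

def decide(candidate: Dict) -> bool:
    """Stand-in for the decision layer: deterministic and fail-closed.

    Allow-list semantics: anything not explicitly permitted is refused,
    so only vetted continuations cross into execution.
    """
    return candidate["action"] in {"read"}

safe = [c for c in propose_continuations({"thread": "t1"}) if decide(c)]
```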

The paper has been formally submitted to SSRN (Submission ID: 6646078). It is currently under completeness review and is expected to receive a public abstract page shortly.

This draft serves as the primary technical reference for the “Hardening a fail-closed runtime for agentic AI systems” Manifund project. The full reproducible artifact (minimal Python kernel, long-running deterministic simulators, and structured ablation traces) will be open-sourced in the coming weeks.

Direct link to the latest draft (v7):

https://drive.google.com/file/d/1nzNmT84Tj1zlwjFtCB2E32pmOJ9yWFTq/view?usp=drivesdk