Project Summary
This is an early-stage, single-person research project exploring whether a single-scalar “hazard” signal can track internal instability in large language models.
The framework is called ZTGI-Pro v3.3 (Tek-Throne / Single-FPS model).
The core idea is that inside any short causal-closed region (CCR) of reasoning, a model should behave as if there is one stable executive trajectory (Single-FPS).
When the model is pulled into mutually incompatible directions—contradiction, “multiple voices”, incoherent reasoning—the Single-FPS constraint begins to break, and we can treat the system as internally unstable.
ZTGI-Pro models this pressure with a hazard scalar:
H = I = −ln Q
fed by four internal signals that jointly determine H.
As inconsistency grows, H increases, and a small state machine switches between SAFE, WARN, and BREAK modes.
When E ≈ Q drops to near zero and the collapse flag Ω = 1 is set, the CCR is interpreted as no longer behaving like a single stable executive stream.
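To make the mechanism concrete, here is a minimal Python sketch of the hazard scalar and the mode switching. The thresholds, hysteresis width, and function names are illustrative placeholders, not the prototype's actual API:

```python
import math

# Illustrative thresholds and hysteresis width; the real values live in
# ztgi-core and may differ.
H_WARN, H_BREAK, HYST = 1.0, 3.0, 0.2

def hazard(q: float, eps: float = 1e-9) -> float:
    """Hazard scalar H = -ln Q; as Q approaches zero, H grows without bound."""
    return -math.log(max(q, eps))

def next_mode(h: float, prev: str) -> str:
    """SAFE -> WARN -> BREAK switching with hysteresis, so the mode does not
    flap when H hovers right at a threshold."""
    if h >= H_BREAK or (prev == "BREAK" and h > H_BREAK - HYST):
        return "BREAK"
    if h >= H_WARN or (prev in ("WARN", "BREAK") and h > H_WARN - HYST):
        return "WARN"
    return "SAFE"

# A near-zero Q produces a high H and trips the collapse flag:
q = 0.02
h = hazard(q)                                   # -ln(0.02) ≈ 3.9 > H_BREAK
omega = 1 if next_mode(h, "WARN") == "BREAK" else 0
```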
So far, I have built a working prototype on top of a local LLaMA model (“ZTGI-AC v3.3”).
It exposes live metrics (H, dual EMA, risk r, p_break, gates) in a web UI and has already produced one full BREAK event with Ω = 1.
This is not a full safety solution—just an exploratory attempt to see whether such signals are useful at all.
Additionally, I recently published the ZTGI-V5 Book (Zenodo DOI: 10.5281/zenodo.17670650), which expands the conceptual model, formalizes CCR/SFPS dynamics, and clarifies the theoretical motivation behind the hazard signal.
What are this project’s goals? How will you achieve them?
Goals (exploratory)
Finalize and “freeze” the mathematical core of ZTGI-Pro v3.3
(hazard equations, hysteresis, EMA structure, CCR / Single-FPS interpretation).
Turn the prototype into a small reproducible library others can test.
Design simple evaluation scenarios where the shield either helps or clearly fails.
Write a short, honest technical report summarizing results and limitations.
How I plan to achieve this
Split the current prototype into:
ztgi-core (math, transforms, state machine)
ztgi-shield (integration with LLM backends)
Build 3–4 stress-test scenarios (e.g., contradiction, “multi-voice”, and emotionally difficult prompts).
Log hazard traces with and without the shield and compare the patterns (a sketch of such a harness follows this list).
Document all limitations clearly (false positives, flat hazard, runaway hazard).
Produce a small technical note or arXiv preprint as the final deliverable.
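As a concrete illustration of the intended split and the trace logging above, here is a minimal harness sketch. The `HazardLLM` interface, its `respond` method, and the metric keys are assumptions about the future ztgi-shield API, not its current form:

```python
import json
from typing import Protocol

class HazardLLM(Protocol):
    """Interface a shielded (or bare) backend is assumed to expose.
    Hypothetical; the real ztgi-shield API may differ."""
    def respond(self, prompt: str) -> tuple[str, dict]: ...

def run_scenario(llm: HazardLLM, prompts: list[str]) -> list[dict]:
    """Run one stress scenario and collect per-turn hazard metrics."""
    trace = []
    for prompt in prompts:
        reply, metrics = llm.respond(prompt)  # metrics: H, H_s, H_l, r, p_break, mode
        trace.append({"prompt": prompt, "reply": reply, **metrics})
    return trace

def save_comparison(name: str, shielded: HazardLLM, bare: HazardLLM,
                    prompts: list[str]) -> None:
    """Log the same scenario with and without the shield for later comparison."""
    traces = {"with_shield": run_scenario(shielded, prompts),
              "without_shield": run_scenario(bare, prompts)}
    with open(f"{name}_trace.json", "w") as f:
        json.dump(traces, f, indent=2)
```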
This is intentionally narrow in scope: the goal is to test viability, not to claim guarantees.
What has been built so far?
The prototype currently supports:
A LLaMA-based assistant wrapped in ZTGI-Shield
Real-time computation of:
hazard H
dual EMA H_s, H_l, Ĥ
risk r = Ĥ − H*
collapse probability p_break
mode labels (SAFE / WARN / BREAK)
INT/EXT gates
A live UI that updates these metrics as the conversation progresses (a sketch of one metric-update step follows)
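For readers curious how the dashboard numbers could be produced, here is a minimal sketch of one metric-update step. The smoothing constants, the max rule for combining the EMAs into Ĥ, the reference H*, and the logistic mapping to p_break are all assumptions for illustration, not the prototype's actual formulas:

```python
import math

def ema(prev: float, x: float, alpha: float) -> float:
    """Standard exponential moving average update."""
    return alpha * x + (1 - alpha) * prev

# Illustrative smoothing constants; the prototype's real values may differ.
ALPHA_SHORT, ALPHA_LONG = 0.5, 0.05

def update_metrics(state: dict, h: float, h_star: float = 1.5, k: float = 4.0) -> dict:
    """One update step for the dashboard metrics. h_star (the reference H*)
    and k (the logistic slope) are assumed values, as is the max rule for Ĥ."""
    state["H_s"] = ema(state.get("H_s", h), h, ALPHA_SHORT)  # fast EMA of H
    state["H_l"] = ema(state.get("H_l", h), h, ALPHA_LONG)   # slow EMA of H
    h_hat = max(state["H_s"], state["H_l"])                  # combined estimate Ĥ
    state["r"] = h_hat - h_star                              # risk r = Ĥ − H*
    state["p_break"] = 1 / (1 + math.exp(-k * state["r"]))   # squash risk into [0, 1]
    return state
```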
Stress-test outcomes
For emotionally difficult messages (“I hate myself”), the shield remained in SAFE, producing supportive responses without panicking.
For contradiction and “multi-voice” prompts, hazard increased as expected.
In one extreme contradiction test, the system entered a full BREAK state with:
high H
near-zero Q / E
p_break ≈ 1
INT gate
collapse flag Ω = 1 set
These are early single-user tests, but they show interpretable signal behavior.
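For a sense of scale, the hazard definition above ties these readings together: Q = 0.01 gives H = −ln(0.01) ≈ 4.6, so “near-zero Q” and “high H” are the same collapse signature seen through two lenses.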
How will this funding be used?
Request: $20,000–$30,000 for 3–6 months.
Breakdown
$10,000 — Researcher time
To work full-time without immediate financial pressure.
$6,000 — Engineering & refactor
Packaging, examples, evaluation scripts, dashboard polish.
$2,000–$3,000 — Compute & infra
GPU/CPU time, storage, logs, testing.
$2,000 — Documentation & design
Technical note, diagrams, reproducible examples.
Deliverables include the packaged ztgi-core / ztgi-shield library, the evaluation dashboard, logged hazard traces, and the final technical note (see the roadmap below).
Roadmap (high-level)
Month 1–2 — Core cleanup
Freeze the hazard equations, hysteresis, and EMA structure.
Split the prototype into ztgi-core and ztgi-shield.
Month 2–3 — Evaluations
Define 3–4 stress scenarios.
Collect hazard traces.
Compare with/without shield.
Summarize failures + successes.
Month 3–6 — Packaging & report
Release code + dashboard.
Publish a short technical note (or arXiv preprint).
Document limitations + open problems.
How does this contribute to AI safety?
This project asks a narrow but important question:
“Can a single scalar hazard signal + a small state machine
give useful information about when an LLM’s local CCR
stops behaving like a single stable executive stream?”
If no, the negative result is useful.
If yes, ZTGI-Pro may become a small building block for larger monitoring and safety tooling.
All code, metrics, and results will be publicly available for critique.
Links
Primary Materials
Live Demo (Experimental — Desktop Only)
https://indianapolis-statements-transparency-golden.trycloudflare.com
This Cloudflare Tunnel demo loads reliably on desktop browsers (Chrome/Edge).
Mobile access may not work. If the demo is offline, please refer to the Zenodo reports.
Update:
The full ZTGI-Pro v3.3 prototype is now open-source under an MIT License.
GitHub repository (hazard layer, shield, CCR state machine, server, demo code):
👉 https://github.com/capterr/ZTGI-Pro-v3.3
If anyone wants a minimal working example or guidance on how the shield integrates with LLaMA (GGUF), I’m happy to provide it.
Model path + installation instructions are included in the README.
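As one example of the wiring, here is a minimal (hypothetical) sketch of the shield around a GGUF model via llama-cpp-python. `ZTGIShieldStub`, its `check` method, the model path, and the withhold-on-BREAK policy are placeholders; the real entry points are documented in the repository README:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

class ZTGIShieldStub:
    """Stand-in for the real shield from the repository; returns dummy metrics."""
    def check(self, text: str) -> dict:
        return {"H": 0.0, "mode": "SAFE", "p_break": 0.0}

llm = Llama(model_path="models/your-model.gguf", n_ctx=4096)  # path is illustrative
shield = ZTGIShieldStub()

def shielded_reply(prompt: str) -> tuple[str, dict]:
    """Generate a reply, then score it with the shield before returning it."""
    out = llm.create_chat_completion(messages=[{"role": "user", "content": prompt}])
    reply = out["choices"][0]["message"]["content"]
    metrics = shield.check(reply)
    if metrics["mode"] == "BREAK":
        reply = "[response withheld: hazard gate triggered]"  # illustrative policy
    return reply, metrics
```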
— Furkan
Screenshots
https://drive.google.com/file/d/1v5-71UgjWvSco1I7x_Vl2fbx7vbJ_O9n/view?usp=sharing
https://drive.google.com/file/d/1P0XcGK_V-WoJ_zyt4xIeSukXTLjOst7b/view?usp=sharing