Project Summary
This is an early-stage, single-person research project exploring whether a single-scalar “hazard” signal can track internal instability in large language models.
The framework is called ZTGI-Pro v3.3 (Tek-Throne / Single-FPS model).
The core idea is that inside any short causal-closed region (CCR) of reasoning, a model should behave as if there is one stable executive trajectory (Single-FPS).
When the model is pulled into mutually incompatible directions—contradiction, “multiple voices”, incoherent reasoning—the Single-FPS constraint begins to break, and we can treat the system as internally unstable.
ZTGI-Pro models this pressure with a hazard scalar:
H = I = −ln Q
fed by four internal signals that jointly determine H.
As inconsistency grows, H increases, and a small state machine switches between SAFE, WARN, and BREAK modes.
When E ≈ Q drops to near zero and the collapse flag Ω = 1 is set, the CCR is interpreted as no longer behaving like a single stable executive stream.
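To make the mechanism concrete, here is a minimal Python sketch of the hazard scalar and the mode switching. The thresholds, hysteresis width, and function names are illustrative placeholders, not the prototype's actual API:

```python
import math

# Illustrative thresholds and hysteresis width; the real values live in
# ztgi-core and may differ.
H_WARN, H_BREAK, HYST = 1.0, 3.0, 0.2

def hazard(q: float, eps: float = 1e-9) -> float:
    """Hazard scalar H = -ln Q; as Q approaches zero, H grows without bound."""
    return -math.log(max(q, eps))

def next_mode(h: float, prev: str) -> str:
    """SAFE -> WARN -> BREAK switching with hysteresis, so the mode does not
    flap when H hovers right at a threshold."""
    if h >= H_BREAK or (prev == "BREAK" and h > H_BREAK - HYST):
        return "BREAK"
    if h >= H_WARN or (prev in ("WARN", "BREAK") and h > H_WARN - HYST):
        return "WARN"
    return "SAFE"

# A near-zero Q produces a high H and trips the collapse flag:
q = 0.02
h = hazard(q)                                   # -ln(0.02) ≈ 3.9 > H_BREAK
omega = 1 if next_mode(h, "WARN") == "BREAK" else 0
```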
So far, I have built a working prototype on top of a local LLaMA model (“ZTGI-AC v3.3”).
It exposes live metrics (H, dual EMA, risk r, p_break, gates) in a web UI and has already produced one full BREAK event with Ω = 1.
This is not a full safety solution—just an exploratory attempt to see whether such signals are useful at all.
Additionally, I recently published the ZTGI-V5 Book (Zenodo DOI: 10.5281/zenodo.17670650), which expands the conceptual model, formalizes CCR/SFPS dynamics, and clarifies the theoretical motivation behind the hazard signal.
What are this project’s goals? How will you achieve them?
Goals (exploratory)
Finalize and “freeze” the mathematical core of ZTGI-Pro v3.3
(hazard equations, hysteresis, EMA structure, CCR / Single-FPS interpretation).
Turn the prototype into a small reproducible library others can test.
Design simple evaluation scenarios where the shield either helps or clearly fails.
Write a short, honest technical report summarizing results and limitations.
How I plan to achieve this
Split the current prototype into:
ztgi-core (math, transforms, state machine)
ztgi-shield (integration with LLM backends)
Build 3–4 stress-test scenarios (e.g., contradiction, “multi-voice”, and emotionally difficult prompts).
Log hazard traces with and without the shield and compare the patterns (a sketch of such a harness follows this list).
Document all limitations clearly (false positives, flat hazard, runaway hazard).
Produce a small technical note or arXiv preprint as the final deliverable.
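As a concrete illustration of the intended split and the trace logging above, here is a minimal harness sketch. The `HazardLLM` interface, its `respond` method, and the metric keys are assumptions about the future ztgi-shield API, not its current form:

```python
import json
from typing import Protocol

class HazardLLM(Protocol):
    """Interface a shielded (or bare) backend is assumed to expose.
    Hypothetical; the real ztgi-shield API may differ."""
    def respond(self, prompt: str) -> tuple[str, dict]: ...

def run_scenario(llm: HazardLLM, prompts: list[str]) -> list[dict]:
    """Run one stress scenario and collect per-turn hazard metrics."""
    trace = []
    for prompt in prompts:
        reply, metrics = llm.respond(prompt)  # metrics: H, H_s, H_l, r, p_break, mode
        trace.append({"prompt": prompt, "reply": reply, **metrics})
    return trace

def save_comparison(name: str, shielded: HazardLLM, bare: HazardLLM,
                    prompts: list[str]) -> None:
    """Log the same scenario with and without the shield for later comparison."""
    traces = {"with_shield": run_scenario(shielded, prompts),
              "without_shield": run_scenario(bare, prompts)}
    with open(f"{name}_trace.json", "w") as f:
        json.dump(traces, f, indent=2)
```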
This is intentionally narrow in scope: the goal is to test viability, not to claim guarantees.
What has been built so far?
The prototype currently supports:
A LLaMA-based assistant wrapped in ZTGI-Shield
Real-time computation of:
hazard H
dual EMA H_s, H_l, Ĥ
risk r = Ĥ − H*
collapse probability p_break
mode labels (SAFE / WARN / BREAK)
INT/EXT gates
A live UI that updates these metrics as the conversation progresses (a sketch of one metric-update step follows)
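For readers curious how the dashboard numbers could be produced, here is a minimal sketch of one metric-update step. The smoothing constants, the max rule for combining the EMAs into Ĥ, the reference H*, and the logistic mapping to p_break are all assumptions for illustration, not the prototype's actual formulas:

```python
import math

def ema(prev: float, x: float, alpha: float) -> float:
    """Standard exponential moving average update."""
    return alpha * x + (1 - alpha) * prev

# Illustrative smoothing constants; the prototype's real values may differ.
ALPHA_SHORT, ALPHA_LONG = 0.5, 0.05

def update_metrics(state: dict, h: float, h_star: float = 1.5, k: float = 4.0) -> dict:
    """One update step for the dashboard metrics. h_star (the reference H*)
    and k (the logistic slope) are assumed values, as is the max rule for Ĥ."""
    state["H_s"] = ema(state.get("H_s", h), h, ALPHA_SHORT)  # fast EMA of H
    state["H_l"] = ema(state.get("H_l", h), h, ALPHA_LONG)   # slow EMA of H
    h_hat = max(state["H_s"], state["H_l"])                  # combined estimate Ĥ
    state["r"] = h_hat - h_star                              # risk r = Ĥ − H*
    state["p_break"] = 1 / (1 + math.exp(-k * state["r"]))   # squash risk into [0, 1]
    return state
```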
Stress-test outcomes
For emotionally difficult messages (“I hate myself”), the shield remained in SAFE, producing supportive responses without panicking.
For contradiction and “multi-voice” prompts, hazard increased as expected.
In one extreme contradiction test, the system entered a full BREAK state with:
high H
near-zero Q / E
p_break ≈ 1
INT gate
collapse flag Ω = 1 set
These are early single-user tests, but they show interpretable signal behavior.
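For a sense of scale, the hazard definition above ties these readings together: Q = 0.01 gives H = −ln(0.01) ≈ 4.6, so “near-zero Q” and “high H” are the same collapse signature seen through two lenses.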
How will this funding be used?
Request: $20,000–$30,000 for 3–6 months.
Breakdown
$10,000 — Researcher time
To work full-time without immediate financial pressure.
$6,000 — Engineering & refactor
Packaging, examples, evaluation scripts, dashboard polish.
$2,000–$3,000 — Compute & infra
GPU/CPU time, storage, logs, testing.
$2,000 — Documentation & design
Technical note, diagrams, reproducible examples.
Deliverables include the packaged ztgi-core / ztgi-shield library, the evaluation dashboard, logged hazard traces, and the final technical note (see the roadmap below).
Roadmap (high-level)
Month 1–2 — Core cleanup
Freeze the hazard equations, hysteresis, and EMA structure.
Split the prototype into ztgi-core and ztgi-shield.
Month 2–3 — Evaluations
Define 3–4 stress scenarios.
Collect hazard traces.
Compare with/without shield.
Summarize failures + successes.
Month 3–6 — Packaging & report
Release code + dashboard.
Publish a short technical note (or arXiv preprint).
Document limitations + open problems.
How does this contribute to AI safety?
This project asks a narrow but important question:
“Can a single scalar hazard signal + a small state machine
give useful information about when an LLM’s local CCR
stops behaving like a single stable executive stream?”
If no, the negative result is useful.
If yes, ZTGI-Pro may become a small building block for larger monitoring and safety tooling.
All code, metrics, and results will be publicly available for critique.
Links
Primary Materials
Live Demo (Experimental — Desktop Only)
https://indianapolis-statements-transparency-golden.trycloudflare.com
This Cloudflare Tunnel demo loads reliably on desktop browsers (Chrome/Edge).
Mobile access may not work. If the demo is offline, please refer to the Zenodo reports.
Update:
The full ZTGI-Pro v3.3 prototype is now open-source under an MIT License.
GitHub repository (hazard layer, shield, CCR state machine, server, demo code):
👉 https://github.com/capterr/ZTGI-Pro-v3.3
If anyone wants a minimal working example or guidance on how the shield integrates with LLaMA (GGUF), I’m happy to provide it.
Model path + installation instructions are included in the README.
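As one example of the wiring, here is a minimal (hypothetical) sketch of the shield around a GGUF model via llama-cpp-python. `ZTGIShieldStub`, its `check` method, the model path, and the withhold-on-BREAK policy are placeholders; the real entry points are documented in the repository README:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

class ZTGIShieldStub:
    """Stand-in for the real shield from the repository; returns dummy metrics."""
    def check(self, text: str) -> dict:
        return {"H": 0.0, "mode": "SAFE", "p_break": 0.0}

llm = Llama(model_path="models/your-model.gguf", n_ctx=4096)  # path is illustrative
shield = ZTGIShieldStub()

def shielded_reply(prompt: str) -> tuple[str, dict]:
    """Generate a reply, then score it with the shield before returning it."""
    out = llm.create_chat_completion(messages=[{"role": "user", "content": prompt}])
    reply = out["choices"][0]["message"]["content"]
    metrics = shield.check(reply)
    if metrics["mode"] == "BREAK":
        reply = "[response withheld: hazard gate triggered]"  # illustrative policy
    return reply, metrics
```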
— Furkan
Screenshots
https://drive.google.com/file/d/1v5-71UgjWvSco1I7x_Vl2fbx7vbJ_O9n/view?usp=sharing
https://drive.google.com/file/d/1P0XcGK_V-WoJ_zyt4xIeSukXTLjOst7b/view?usp=sharing