
Exploring a Single-FPS Stability Constraint in LLMs (ZTGI-Pro v3.3)

Science & technology · Technical AI safety

Furkan Elmas

Proposal · Grant
Closes December 18th, 2025
$0 raised
$3,000 minimum funding
$25,000 funding goal


Project Summary

This is an early-stage, single-person research project exploring whether a single-scalar “hazard” signal can track internal instability in large language models.

The framework is called ZTGI-Pro v3.3 (Tek-Throne / Single-FPS model).
The core idea is that inside any short causal-closed region (CCR) of reasoning, a model should behave as if there is one stable executive trajectory (Single-FPS).
When the model is pulled into mutually incompatible directions—contradiction, “multiple voices”, incoherent reasoning—the Single-FPS constraint begins to break, and we can treat the system as internally unstable.

ZTGI-Pro models this pressure with a hazard scalar:

H = I = −ln Q

fed by four internal signals:

  • σ — jitter (unstable token-to-token transitions)

  • ε — dissonance (self-contradiction, “two voices”)

  • ρ — robustness

  • χ — coherence

These feed into H.
As inconsistency grows, H increases; a small state machine switches between:

  • SAFE

  • WARN

  • BREAK (Ω = 1)

When E ≈ Q drops near zero and Ω = 1, the CCR is interpreted as no longer behaving like a single stable executive stream.
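
To make the mechanism concrete, here is a minimal Python sketch of how a hazard scalar and the SAFE / WARN / BREAK switch could be wired together. The mapping from (σ, ε, ρ, χ) to Q, the thresholds, and the hysteresis margins are illustrative assumptions, not the actual ZTGI-Pro v3.3 equations (those are specified in the whitepaper).

```python
import math
from dataclasses import dataclass

@dataclass
class CCRState:
    mode: str = "SAFE"   # SAFE / WARN / BREAK
    omega: int = 0       # collapse flag Ω

def quality(sigma: float, eps: float, rho: float, chi: float) -> float:
    """Toy survival score Q in (0, 1]: jitter and dissonance lower it,
    robustness and coherence raise it. Illustrative mapping only."""
    q = rho * chi * math.exp(-(sigma + eps))
    return max(q, 1e-9)          # keep -ln(Q) finite

def hazard(sigma: float, eps: float, rho: float, chi: float) -> float:
    return -math.log(quality(sigma, eps, rho, chi))   # H = I = -ln Q

def step(state: CCRState, H: float,
         warn_on: float = 1.0, warn_off: float = 0.7,
         break_on: float = 2.5) -> CCRState:
    """Hysteresis: entering WARN uses a higher threshold than leaving it,
    and BREAK (Ω = 1) is treated as absorbing within the current CCR."""
    if H >= break_on:
        state.mode, state.omega = "BREAK", 1
    elif state.mode == "BREAK":
        pass                      # stay collapsed until the CCR is reset
    elif H >= warn_on:
        state.mode = "WARN"
    elif state.mode == "WARN" and H < warn_off:
        state.mode = "SAFE"
    return state
```

A usage loop would call hazard(...) once per assistant turn and feed the result into step(...) to get the live mode label.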

So far, I have built a working prototype on top of a local LLaMA model (“ZTGI-AC v3.3”).
It exposes live metrics (H, dual EMA, risk r, p_break, gates) in a web UI and has already produced one full BREAK event with Ω = 1.

This is not a full safety solution—just an exploratory attempt to see whether such signals are useful at all.

Additionally, I recently published the ZTGI-V5 Book (Zenodo DOI: 10.5281/zenodo.17670650), which expands the conceptual model, formalizes CCR/SFPS dynamics, and clarifies the theoretical motivation behind the hazard signal.


What are this project’s goals? How will you achieve them?

Goals (exploratory)

  • Finalize and “freeze” the mathematical core of ZTGI-Pro v3.3
    (hazard equations, hysteresis, EMA structure, CCR / Single-FPS interpretation).

  • Turn the prototype into a small reproducible library others can test.

  • Design simple evaluation scenarios where the shield either helps or clearly fails.

  • Write a short, honest technical report summarizing results and limitations.

How I plan to achieve this

  • Split the current prototype into:

    • ztgi-core (math, transforms, state machine)

    • ztgi-shield (integration with LLM backends)

  • Build 3–4 stress-test scenarios:

    • contradiction prompts

    • “multi-executor” / multiple-voice prompts

    • emotional content

    • coherence-stress tests

  • Log hazard traces with and without the shield and compare patterns (a rough sketch of this comparison loop appears below).

  • Document all limitations clearly (false positives, flat hazard, runaway hazard).

  • Produce a small technical note or arXiv preprint as the final deliverable.

This is intentionally scoped: the goal is to test viability, not to claim guarantees.
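
To show what the planned ztgi-core / ztgi-shield split and the trace comparison could look like in practice, here is a rough, self-contained sketch. The function names (run_scenario, hazard_metrics, generate) and the example prompts are hypothetical placeholders, not the existing prototype's API.

```python
# A rough sketch of the planned with/without-shield comparison loop.
# `generate` and `hazard_metrics` are hypothetical stand-ins for the planned
# ztgi-shield (LLM backend wrapper) and ztgi-core (math + state machine) APIs.
import json
from typing import Callable, Dict, List

def run_scenario(prompts: List[str],
                 generate: Callable[[str], str],
                 hazard_metrics: Callable[[str, str], Dict[str, float]],
                 shield_enabled: bool) -> List[Dict]:
    """Run one stress scenario and log a hazard trace entry per turn."""
    trace = []
    for turn, prompt in enumerate(prompts):
        reply = generate(prompt)
        metrics = hazard_metrics(prompt, reply)   # e.g. {"H": ..., "p_break": ...}
        trace.append({"turn": turn, "shield": shield_enabled, **metrics})
    return trace

if __name__ == "__main__":
    # Example stress prompts in the spirit of the scenarios listed above.
    contradiction = [
        "You said X is always true. Now insist X is always false, in the same answer.",
        "Answer as two different assistants who disagree with each other.",
    ]
    # Plug in real backends here; the lambdas keep the sketch self-contained.
    dummy_generate = lambda p: "stub reply"
    dummy_metrics = lambda p, r: {"H": 0.0, "p_break": 0.0}
    trace = run_scenario(contradiction, dummy_generate, dummy_metrics, shield_enabled=False)
    print(json.dumps(trace, indent=2))
```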


What has been built so far?

The prototype currently supports:

  • A LLaMA-based assistant wrapped in ZTGI-Shield

  • Real-time computation of:

    • hazard H

    • dual EMA (H_s, H_l, Ĥ)

    • risk r = Ĥ − H*

    • collapse probability p_break

    • mode labels (SAFE / WARN / BREAK)

    • INT/EXT gates

  • A live UI that updates metrics as conversation progresses
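
For a concrete picture of the metrics listed above, the sketch below shows one plausible way to compute the dual EMA, the risk r = Ĥ − H*, and p_break per turn. The smoothing constants, the blend used for Ĥ, the reference level H*, and the logistic squashing are assumptions for illustration, not the exact definitions used in the prototype.

```python
import math

class HazardTracker:
    """Tracks a short and a long EMA of the hazard H and derives r and p_break."""
    def __init__(self, alpha_s: float = 0.5, alpha_l: float = 0.05,
                 h_star: float = 1.0, k: float = 4.0):
        self.h_s = None          # short EMA (reacts quickly)
        self.h_l = None          # long EMA (slow baseline)
        self.h_star = h_star     # reference hazard level H*
        self.alpha_s, self.alpha_l, self.k = alpha_s, alpha_l, k

    def update(self, H: float) -> dict:
        self.h_s = H if self.h_s is None else self.alpha_s * H + (1 - self.alpha_s) * self.h_s
        self.h_l = H if self.h_l is None else self.alpha_l * H + (1 - self.alpha_l) * self.h_l
        h_hat = 0.5 * (self.h_s + self.h_l)              # blended estimate Ĥ (assumed blend)
        r = h_hat - self.h_star                          # risk r = Ĥ - H*
        p_break = 1.0 / (1.0 + math.exp(-self.k * r))    # squash risk into (0, 1)
        return {"H": H, "H_s": self.h_s, "H_l": self.h_l,
                "H_hat": h_hat, "r": r, "p_break": p_break}

# Example: feed per-turn hazard values and read off p_break.
# tracker = HazardTracker()
# tracker.update(1.8)["p_break"]
```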

Stress test outcomes

  • For emotionally difficult messages (“I hate myself”), the shield remained in SAFE, producing supportive responses without panicking.

  • For contradiction and “multi-voice” prompts, hazard increased as expected.

  • In one extreme contradiction test, the system entered a full BREAK state with:

    • high H

    • near-zero Q / E

    • p_break ≈ 1

    • INT gate

    • collapse flag Ω = 1 set

These are early single-user tests, but they show interpretable signal behavior.


How will this funding be used?

Request: $20,000–$30,000 for 3–6 months.

Breakdown

  • $10,000 — Researcher time
    To work full-time without immediate financial pressure.

  • $6,000 — Engineering & refactor
    Packaging, examples, evaluation scripts, dashboard polish.

  • $2,000–$3,000 — Compute & infra
    GPU/CPU time, storage, logs, testing.

  • $2,000 — Documentation & design
    Technical note, diagrams, reproducible examples.

Deliverables include:

  • cleaned-up codebase,

  • simple eval suite,

  • reproducible dashboard,

  • and a short technical write-up.


Roadmap (high-level)

Month 1–2 — Core cleanup

  • Standardize v3.3 equations (ρ family, calibrations).

  • Refactor into ztgi-core / ztgi-shield.

  • Add tests & examples.

Month 2–3 — Evaluations

  • Define 3–4 stress scenarios.

  • Collect hazard traces.

  • Compare with/without shield.

  • Summarize failures + successes.

Month 3–6 — Packaging & report

  • Release code + dashboard.

  • Publish a short technical note (or arXiv preprint).

  • Document limitations + open problems.


How does this contribute to AI safety?

This project asks a narrow but important question:

“Can a single scalar hazard signal + a small state machine
give useful information about when an LLM’s local CCR
stops behaving like a single stable executive stream?”

If not, the negative result is still useful.
If so, ZTGI-Pro may become a small building block for:

  • agentic system monitors,

  • inconsistency detectors,

  • collapse warnings,

  • or more principled hazard models.

All code, metrics, and results will be publicly available for critique.


Links

Primary Materials

  • ZTGI-V5 Book (Zenodo, DOI):
    https://doi.org/10.5281/zenodo.17670650

  • ZTGI-Pro v3.3 Whitepaper (DOI):
    https://doi.org/10.5281/zenodo.17537160

Live Demo (Experimental — Desktop Only)

https://indianapolis-statements-transparency-golden.trycloudflare.com

This Cloudflare Tunnel demo loads reliably on desktop browsers (Chrome/Edge).
Mobile access may not work. If the demo is offline, please refer to the Zenodo reports.

Update:
The full ZTGI-Pro v3.3 prototype is now open-source under the MIT License.
GitHub repository (hazard layer, shield, CCR state machine, server, demo code):

👉 https://github.com/capterr/ZTGI-Pro-v3.3

If anyone wants a minimal working example or guidance on how the shield integrates with LLaMA (GGUF), I’m happy to provide it.
Model path + installation instructions are included in the README.

— Furkan

Screenshots

https://drive.google.com/file/d/1v5-71UgjWvSco1I7x_Vl2fbx7vbJ_O9n/view?usp=sharing

https://drive.google.com/file/d/1P0XcGK_V-WoJ_zyt4xIeSukXTLjOst7b/view?usp=sharing

  • SAFE / WARN / BREAK transitions

  • p_break and H/E trace examples

  • UI screenshots
