I'm building an open-source, real-time hallucination suppression system for locally-run LLMs. The system monitors token-level entropy of the model's output distribution during generation and dynamically adjusts sampling parameters to suppress hallucinations using closed-loop feedback control rather than open-loop rule switching.
The approach is grounded in established work: entropy-based uncertainty estimation detects confabulations in LLMs (Farquhar et al., 2024, Nature), and token-level entropy correlates with hallucination probability (Huang et al., 2025, ACM TOIS). Entropix (xjdr, 2024) demonstrated that entropy and varentropy are actionable signals during generation, using them to switch between discrete sampling strategies. This project closes the loop: continuous feedback control replaces open-loop decisions, and a dynamic setpoint replaces fixed thresholds. Entropix never published large-scale benchmark evals; this project is building that validation from the ground up.
The controller uses a 4th-order state-space formulation tracking entropy error, its integral, velocity, and acceleration. The acceleration term is the key contribution — it catches the characteristic upward curvature that precedes a hallucination spike, enabling intervention before it peaks. Velocity-form actuation means the controller accumulates corrections over time rather than recomputing from scratch at each token, giving it persistent memory of the generation trajectory.
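The controller described above can be sketched in a few lines. This is a minimal illustration of the 4th-order, velocity-form idea; the gains and setpoint are placeholder defaults, not the project's tuned values:

```python
from collections import deque

class EntropyController:
    """Sketch of a velocity-form entropy controller. Illustrative only."""

    def __init__(self, setpoint=0.25, k_p=0.5, k_i=0.05, k_a=0.8):
        self.setpoint = setpoint
        self.k_p, self.k_i, self.k_a = k_p, k_i, k_a
        self.errors = deque([0.0, 0.0, 0.0], maxlen=3)  # last three errors
        self.u = 0.0  # accumulated correction: the controller's memory

    def step(self, entropy):
        """Feed the current token's entropy; return the correction to apply
        to the sampling parameters."""
        self.errors.append(entropy - self.setpoint)
        e2, e1, e0 = self.errors           # oldest -> newest
        velocity = e0 - e1                 # first difference of the error
        acceleration = e0 - 2 * e1 + e2    # second difference: the upward
                                           # curvature preceding a spike
        # Velocity form: accumulate an increment each token. Summing the
        # k_i * e0 term over time yields integral action, so the tracked
        # quantities are error, its integral, velocity, and acceleration.
        self.u += self.k_p * velocity + self.k_i * e0 + self.k_a * acceleration
        return self.u
```

Because the output is accumulated rather than recomputed, a burst of rising entropy leaves a persistent correction even after the instantaneous error returns to zero.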
The project has completed a full 5000-problem validation sweep on the MATH benchmark (Qwen 3.5 2B Q4_K_M, llama.cpp with CUDA), with a clear mechanistic picture of what is working and what the next bottleneck is.
Current results: 60.9% → 63.0% accuracy (+2.1pp), ~7% token reduction. The failure mode analysis is the more important finding:
| Metric | Baseline | Hybrid controller | Δ |
|---|---|---|---|
| Cap hit rate | 28.2% | 25.3% | −2.9pp |
| Under-cap accuracy | 82.7% | 82.3% | −0.4pp |
| Cap-hit accuracy | 5.4% | 6.2% | +0.8pp |
| Overall accuracy | 60.9% | 63.0% | +2.1pp |
Wrong answers are dominated by a single failure mode: the model exhausts the token budget without converging (spinning). Cap-hit accuracy is ~5%, essentially zero, so the controller's entire gain comes from reducing cap hits. Under-cap accuracy is essentially unchanged, confirming the controller is not interfering with problems it has no signal for.
Mean entropy cleanly predicts outcome across all groups (capped wrong: H=0.344, uncapped correct: H=0.225), validating the sensor. The bottleneck is the actuators: sampling parameters (min_p, top_p, repeat penalty) lack the authority to break a spin once it begins.
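For concreteness, the sensor itself is just the Shannon entropy of the next-token distribution, computable directly from raw logits. A minimal sketch (the project reads logits from llama.cpp; this standalone version is for illustration):

```python
import math

def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax of a logit vector.
    Uses the max-subtraction trick for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)
```

A uniform distribution over k tokens gives the maximum value log(k); a sharply peaked distribution gives a value near zero, which is why mean entropy separates the capped-wrong and uncapped-correct groups.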
The next phase addresses this directly. The experiment queue, in order:
1. Single-actuator ablation: identify which actuators help, which hurt, and which are neutral.
2. Token position as a sensor: gives the controller awareness of the remaining budget.
3. TECA (Token Entropy Cumulative Average; Bin et al., 2025): cumulative entropy deviation from an expected decay curve, detecting failure to transition from exploration to determination.
4. Decoupled temperature: measure entropy at a fixed temperature for a clean sensor signal while sampling at a dynamic temperature as an actuator, breaking the feedback corruption that made temperature harmful in earlier experiments.
5. Logit bias on termination tokens: directly increase exit probability as budget pressure rises.
6. KV cache spectral reshaping: SVD-based manipulation of the Value cache, a stronger actuator that operates on internal representations rather than the output distribution.
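The termination-token logit bias experiment admits a very simple form. The sketch below uses a hypothetical linear ramp (the `onset` and `max_bias` parameters and function names are illustrative, not a settled design):

```python
def eos_bias(pos, budget, max_bias=4.0, onset=0.7):
    """Zero bias until `onset` of the token budget is used, then a
    linear ramp up to `max_bias` at the cap. Hypothetical schedule."""
    frac = pos / budget
    if frac <= onset:
        return 0.0
    return max_bias * (frac - onset) / (1.0 - onset)

def apply_termination_bias(logits, eos_token_ids, pos, budget):
    """Add the current bias to every termination token's logit."""
    b = eos_bias(pos, budget)
    for t in eos_token_ids:
        logits[t] += b
    return logits
```

The intent is that a generation converging normally never feels the bias, while a spinning generation sees its exit probability rise smoothly as the cap approaches.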
Steps 1–5 are feasible on current hardware. Step 6 requires the 3090.
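One plausible form of the KV cache experiment is an energy-truncated SVD of a head's Value cache, keeping only the dominant spectral directions. This is a sketch of the idea, not the project's implementation; the `keep` threshold is an assumed knob:

```python
import numpy as np

def reshape_value_cache(v, keep=0.9):
    """Spectral filter on a (seq_len, head_dim) Value-cache slice:
    keep the top singular directions carrying `keep` of the spectral
    energy and reconstruct. Illustrative sketch only."""
    u, s, vt = np.linalg.svd(v, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(energy, keep)) + 1  # smallest rank reaching `keep`
    return (u[:, :r] * s[:r]) @ vt[:r]
```

Because attention outputs are weighted sums of Value rows, damping the low-energy directions changes what the model can retrieve from its own context, which is what makes this a stronger actuator than sampling parameters.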
The GPU upgrade also unlocks the dual-model architecture: a smaller reference model (Qwen 3.5 0.8B) running alongside the 9B, providing an adaptive entropy setpoint and enabling KL divergence between the two models' distributions as an additional sensor channel. The 9B is the model worth validating against, but on 8GB it requires aggressive quantization, which degrades output quality and leaves no headroom for the reference model or SVD buffers.
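The KL sensor channel is standard: the divergence between the main model's and the reference model's next-token distributions, computed per token. A minimal sketch:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats between two next-token distributions,
    e.g. main model p vs. reference model q. The eps floor guards
    against zeros in the reference distribution."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0.0)
```

The value is zero when the two models agree exactly and grows as the main model's distribution drifts away from the reference, which is what makes it usable as a drift sensor.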
A secondary observation channel is already implemented: QEWS (Quantum Early Warning Signal), a rolling density matrix constructed from L2-normalized logit vectors, whose von Neumann entropy tracks structural shifts in the distribution-of-distributions over time. The QEWS hybrid (equal weighting with Shannon entropy) is the current best configuration on small-scale experiments (+8pp over baseline at 100 problems).
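The QEWS construction reduces to linear algebra: build a density matrix from the window of L2-normalized logit vectors and take the von Neumann entropy of its eigenvalues. A sketch of that core (illustrative; the actual implementation lives in the repository):

```python
import numpy as np

def von_neumann_entropy(vectors):
    """Density matrix rho = (1/N) sum v v^T over a window of
    L2-normalized vectors; returns -sum(lambda * log(lambda)) over
    rho's eigenvalues. Rows are unit-norm, so trace(rho) = 1."""
    vs = np.array([v / np.linalg.norm(v) for v in vectors])
    rho = vs.T @ vs / len(vs)
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]  # drop numerical zeros before the log
    return float(-np.sum(lam * np.log(lam)))
```

When the window's distributions all point the same way the entropy is zero; when they spread across independent directions it rises toward log of the window size, tracking structural shifts the per-token Shannon entropy cannot see.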
All code and results are released open-source under MIT license.
$1,000 toward a used NVIDIA RTX 3090 (24GB VRAM), including shipping and tax, and any incidental hardware costs (PSU upgrade, cables).
Currently running on an RTX 3070 (8GB VRAM). The 2B model fits comfortably but is not the validation target. The 9B at reasonable quantization requires ~12GB VRAM; at 24GB it fits cleanly alongside a reference model, with headroom for KV cache buffers and per-layer SVD computation. The 3090 also roughly doubles memory bandwidth (936 GB/s vs. 448 GB/s), directly increasing tokens per second. Faster inference means more experiments per night, so the bandwidth gain multiplies research iteration speed rather than just raising its ceiling.
Hardware only. No stipend. Everything is open-source.
Solo independent researcher, full-time on this project. I hold a Master's in Mathematics from Montana State University, where I was a graduate teaching assistant for differential equations and numerical linear algebra — the direct mathematical foundations of this work (state-space models and SVD, respectively).
I have been building and operating autonomous AI agents on fully local infrastructure for the past year. Current stack: Qwen 3.5 2B on llama.cpp with CUDA, Ubuntu server, working agentic pipeline with tool use and autonomous code generation. The controller design emerged from observing this system's real failure modes and recognizing that classical control theory applies directly to steering LLM generation.
The full 5000-problem validation sweep and failure mode analysis have been published openly on GitHub. No prior publications. This would be my first formal research output.
Most likely failure: the actuators are the fundamental bottleneck. The failure mode analysis already shows this clearly — the sensor works, the actuators don't have enough authority. If stronger actuators (logit bias, KV cache reshaping, activation steering) also fail to break spinning, that is a meaningful negative result. It would establish that inference-time sampling control cannot solve the spinning failure mode and that the solution requires either training-time intervention or architectural changes. That finding gets published.
Second failure mode: sensor-actuator coupling. Temperature was eliminated as an actuator because it corrupts the entropy signal. Other actuators may have similar interactions. If decoupled temperature doesn't resolve this, it means the sensor and actuator spaces are too entangled for clean control, which is also a publishable finding.
Third: the dual-model architecture doesn't add value. The reference model's distribution may not provide a meaningfully better setpoint than a fixed or rolling-average target. In that case, the single-model system stands as the contribution.
In all cases, validation data and analysis are published openly. The failure mode analysis from the current sweep is already a contribution independent of whether the controller ultimately works.
$0. This is my first funding application. The project has been entirely self-funded to date, including all hardware and infrastructure.