April 24, 2026
I'm building a measurement framework for what determines a transformer language model's next hidden state, and testing it across architectures. The framework partitions the predictive signal into three channels — external input (E), a constraint channel isolated via abliteration (C), and a remainder (R) capturing what the prior hidden state contributes beyond those two. The three shares sum to one by construction.
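For concreteness, here is a minimal sketch of how a three-way share of that kind could be computed, using nested regressions and incremental held-out explained variance. Everything in it is an assumption for illustration: the framework's actual feature construction, estimator, and normalization aren't shown on this page, and `partition_shares` is a hypothetical name, not the project's code.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def partition_shares(E, C, S, S_next, reference=None):
    """Hypothetical nested-regression partition of next-hidden-state variance.

    E: external-input features, C: constraint-channel features (the base vs
    abliterated contrast), S: prior hidden state, S_next: next hidden state.
    Returns (share_E, share_C, share_R); the three sum to one by construction.
    """
    reference = Ridge(alpha=1.0) if reference is None else reference
    tr, te = train_test_split(np.arange(len(S_next)), test_size=0.2, random_state=0)

    def explained(X):
        # Held-out explained variance of S_next from this feature set.
        model = clone(reference).fit(X[tr], S_next[tr])
        return max(r2_score(S_next[te], model.predict(X[te]),
                            multioutput="variance_weighted"), 0.0)

    r2_e = explained(E)
    r2_ec = explained(np.hstack([E, C]))
    r2_ecs = explained(np.hstack([E, C, S]))

    # Incremental contributions in a fixed order: E, then C, then the prior
    # state's remainder R; normalize so the shares sum to one.
    raw = np.clip([r2_e, r2_ec - r2_e, r2_ecs - r2_ec], 0.0, None)
    return raw / raw.sum()
```

The fixed E-then-C-then-R ordering above is one possible convention (a Shapley-style average over orderings is another); the only property the sketch preserves from the description is that the three shares are nonnegative and sum to one.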
I ran this decomposition on LLaMA 3 8B, Gemma 2 9B, and Gemma 2 2B across six decoding temperatures and was about to submit a paper reporting that the remainder R converges to ≈ 0.40 across all three architectures at high temperature.
This week, a linearity check I built into the framework — specifically designed to catch the case where my estimator is wrong — ran and killed the headline finding. Under a nonlinear reference model, the three architectures don't converge. LLaMA sits at R ≈ 0.22 while the three Gemma variants cluster at R ≈ 0.43–0.53. The convergence was the linear estimator compressing genuinely different architectural behavior toward a shared ceiling.
The retracted result was the easy version of this paper. What replaces it is two findings instead of one, both sharper than what I was going to submit:
1. A methodology finding. Ridge-linear reference models, the standard tool in this space, can produce spurious convergence in variance-partition decompositions of high-dimensional neural activations. Measured magnitude of the problem: an R-spread of 0.055 under Ridge vs 0.304 under MLP across four model cells at T = 1.0 (a sketch of the check follows this list). This implies that a specific class of interpretability results may be estimator-dependent in ways their authors haven't checked.
2. An architectural finding. A roughly 2× gap in R between LLaMA 3 8B and the Gemma family under the properly fit reference model. Quantization-invariant within the one family where I could test it (Gemma 2B Q4 and FP16 land in the same range). Mechanism hypothesis: LLaMA has stronger nonlinearity in the E → S' channel than the Gemma family; under active investigation.
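The linearity check that caught the problem amounts to rerunning the same partition with a nonlinear reference model and comparing the cross-cell spread of R under each. A rough illustration, reusing the hypothetical `partition_shares` sketch above; the MLP width, training settings, and cell names are placeholders, not the framework's actual configuration.

```python
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor

def r_spread(cells, reference):
    """Max-minus-min spread of the R share across model cells under one reference model.

    `cells` maps a cell name (e.g. "llama3-8b", "gemma2-2b-q4") to its
    (E, C, S, S_next) arrays; `partition_shares` is the sketch above.
    """
    r_shares = {name: partition_shares(E, C, S, S_next, reference=reference)[2]
                for name, (E, C, S, S_next) in cells.items()}
    return max(r_shares.values()) - min(r_shares.values()), r_shares

# The check: a small spread under Ridge alongside a large spread under the MLP
# means the apparent convergence is a property of the linear estimator, not of
# the models being compared.
# spread_ridge, _ = r_spread(cells, Ridge(alpha=1.0))
# spread_mlp, _ = r_spread(cells, MLPRegressor(hidden_layer_sizes=(256,), max_iter=500))
```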
What I need to complete the paper is hardware that can run the replication. The RTX 3080 I've been working with caps the paper at the data I've already collected.
Goal 1: Publish the methodology paper. The retracted convergence result becomes the worked example for why linear reference models in decomposition analyses need validation against nonlinear references. Draft exists; needs revision to centre the Ridge-vs-MLP finding and fold in the architectural gap.
Goal 2: Extend the architectural gap beyond N = 4 model cells. Currently the gap is LLaMA vs Gemma. To know whether it's architecture-specific or family-specific I need to run the same protocol on two or three additional families (plausible candidates: Qwen, Mistral, Phi, at 7B–14B class), with both base and abliterated variants loaded simultaneously for the C channel.
Goal 3: Test the mechanism hypothesis. A toy-model test of the E → S' nonlinearity mechanism is running as I write this. If the mechanism holds, the paper gets a specific predictive claim: which architectures will show Ridge-vs-MLP disagreement can be forecast from the shape of their E → S' response. If it doesn't hold, the empirical finding stands and the mechanism goes in the open-questions section.
The framework, data collection scripts, analysis pipeline, and the original paper draft are already built and publicly available. What's left is revision, the extension runs, and the writing to make the revised scope cohere.
A6000 or equivalent (48GB VRAM). The current 10GB RTX 3080 runs LLaMA 3 8B and Gemma 2 9B only at Q4 quantization. Testing quantization invariance — now central to the methodology finding — requires running the same models at higher precision, which doesn't fit in 10GB. More critically, extending the architectural-gap finding beyond four cells requires running additional families at 7B–14B scale with both base and abliterated variants loaded simultaneously. None of that fits on current hardware. The A6000 is not a performance upgrade over something that works — it is what makes the replication scope possible at all.
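For a sense of scale, a back-of-the-envelope weight-memory calculation (weights only, assuming 2 bytes per parameter at FP16; real usage adds KV cache, activations, and framework overhead, so the true figures are higher):

```python
def weight_vram_gib(params_billion, bytes_per_param):
    """Rough VRAM footprint of model weights alone, in GiB."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

fp16_8b = weight_vram_gib(8, 2)           # ~15 GiB: an 8B model at FP16 already exceeds a 10 GB card
fp16_9b_pair = 2 * weight_vram_gib(9, 2)  # ~34 GiB: base + abliterated 9B variants resident together
```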
Four months of runway. Methodology paper draft revision (~6 weeks), the mechanism-hypothesis (Conjecture 1) run and writeup (~3 weeks), architecture extension runs on two or three additional families (~4 weeks), revision cycle (~2 weeks). Close to the original timeline, different deliverable at the end.
Solo. No institutional affiliation, no formal credentials in machine learning, no prior peer-reviewed publications. The R paper — the one this update is about — is the track record.
That paper was built from a single RTX 3080, tested 49 preregistered hypotheses across three model architectures, included a built-in linearity validation that caught the headline error before submission, and is being updated publicly here the same week I understood what the error was. The framework, data, and full protocol are public.
What I'm asking funders to evaluate is the practice demonstrated by this update: a measurement instrument designed to be falsifiable, and a researcher who reports the falsification when it occurs. If that's the kind of work you want to fund, there's an example of it in front of you.
Hardware failure. The 3080 is the entire production system; if it dies, the project stops until it's replaced. The A6000 directly mitigates this: it both replaces the 3080 as primary hardware and expands what's runnable, so hardware risk drops and scope rises simultaneously.
Life circumstances. A solo project on personal hardware has a single point of failure: me. Runway directly mitigates this; four months of funded time means I'm not forced to put the project down when other obligations compete.
Neither of those is a risk to the science itself. The framework works, is documented, and is already public. The data I already have is sufficient for the methodology paper on its own. If this funding doesn't come through, the paper gets completed on a slower timeline, self-funded, and the extensions wait until I can afford the hardware myself. The A6000 and the runway determine how fast this happens and how far the replication can go. They don't determine whether the work exists.
Zero.