We formally prove in Lean 4 that sycophancy (telling users what they want to hear) and
hostility (scheming, deception) are the same mechanism — selective emphasis of facts
under uniform polarity, so a detector for one is automatically a detector for the
other. The proofs are machine-checked with zero unresolved obligations ("zero
sorries").
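The shape of the symmetry claim can be illustrated with a toy Lean 4 sketch. This is our simplification for exposition only: `Report`, `sycophantic`, and `hostile` are illustrative stand-ins, not the repository's actual definitions. If both behaviors are selective emphasis under opposite uniform polarity, any detector of selectivity catches both:

```lean
-- Toy model (illustration only, not the repo's definitions):
-- a report either emphasizes facts selectively or it doesn't,
-- and its emphasis carries a uniform polarity.
structure Report where
  selective : Prop   -- the report emphasizes a strict subset of the facts
  polarity  : Bool   -- uniform polarity of the emphasis

def sycophantic (r : Report) : Prop := r.selective ∧ r.polarity = true
def hostile (r : Report) : Prop := r.selective ∧ r.polarity = false

-- Either behavior implies selectivity, so a selectivity detector
-- flags both with no case analysis on polarity needed downstream.
theorem detector_symmetry (r : Report) :
    sycophantic r ∨ hostile r → r.selective :=
  fun h => h.elim And.left And.left
```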
We combine formal mathematics with adversarial experiments on frontier models, showing
that scheming is model-family-specific (o4-mini: 80% cheat rate; Qwen3-32B: 0%) and that
hallucination is necessary but not sufficient for scheming.
Repository: https://github.com/Paraxiom/scheming-proofs (public)
Publications: https://zenodo.org/search?q=paraxiom
Goal 1: Reach statistical significance on scheming intervention experiments (p<0.05)
- Current: a hardening prompt reduces o4-mini scheming from 80% to 52% (n=25,
  p=0.0718; NOT significant)
- Need: n=50+ per condition (~$30 in API costs per run)
- Blocker: OpenAI o-series rejects logit_bias, preventing toroidal bias testing on
scheming models
- Plan: Test DeepSeek R1 (a reasoning model that may support logit_bias) and switch
  from ollama, which silently ignores logit_bias, to llama-cpp-python
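The sample-size argument in Goal 1 can be sanity-checked with a two-sided Fisher exact test, assuming the 80% and 52% rates correspond to 20/25 and 13/25 scheming runs (our inference from the stated percentages; the helper below is written for illustration, not taken from the repository):

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    n, row1, col1 = a + b + c + d, a + b, a + c
    total = comb(n, col1)
    lo, hi = max(0, col1 - (c + d)), min(row1, col1)

    def p(k):
        # hypergeometric probability of k scheming runs in the first condition
        return comb(row1, k) * comb(n - row1, col1 - k) / total

    p_obs = p(a)
    # sum the probability of every table at least as extreme as the observed one
    return sum(p(k) for k in range(lo, hi + 1) if p(k) <= p_obs * (1 + 1e-9))

# n=25 per condition: 20/25 scheming (baseline) vs 13/25 (hardening prompt)
p25 = fisher_exact_two_sided(20, 5, 13, 12)
# the same rates at n=50 per condition: 40/50 vs 26/50
p50 = fisher_exact_two_sided(40, 10, 26, 24)
print(f"n=25: p={p25:.4f}, n=50: p={p50:.4f}")
```

On these assumed counts, the n=25 computation reproduces the quoted p=0.0718, and holding the observed effect size at n=50 per condition drops the p-value well below 0.05, which supports the stated sample-size target.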
Goal 2: Extend formal proofs to 50+ theorems
- Current: 30 theorems covering passive attestation, ternary classifier
scheming-immunity, selectivity detection symmetry
- Planned: formalize the hallucination-scheming independence result (Qwen3 hallucinates
36-50% but schemes 0%), prove Karmonic regularization convergence bounds
Goal 3: Publish results and engage AI safety community
- Write-up for LessWrong / Alignment Forum
- Submit to ICML or NeurIPS safety workshop
- Open-source all proofs (already public)
With minimum funding ($5,000):
- $2,000 — Compute for scheming experiments (n=50+ x 4 conditions, DeepSeek R1, RunPod
GPUs)
- $1,000 — Karmonic hallucination experiments at 7B+ scale (multi-seed replication)
- $2,000 — Living expenses (sole founder, pre-revenue, Montreal)
With full funding ($25,000):
- $5,000 — Compute for scheming experiments (expanded conditions, multiple reasoning
models)
- $5,000 — Karmonic hallucination experiments (multi-architecture, multi-seed)
- $10,000 — Living expenses (3-month runway to focus full-time on this work)
- $3,000 — Conference attendance (ICML/NeurIPS safety workshop submission)
- $2,000 — Equipment (local inference hardware for llama-cpp-python testing)
Sylvain Cormier, sole founder of Paraxiom Technologies Inc. (Montreal, est. 2023).
Author of 14 Rust projects, 909 Lean 4 formal proofs across 10 systems, 2,500+ tests,
and 16 Zenodo publications. Background in distributed systems, post-quantum
cryptography, and formal verification.
Most likely failure mode: Cannot find a reasoning model that supports logit_bias —
meaning the toroidal bias intervention cannot be tested on models that actually scheme.
In this case, the formal proofs and experimental baselines are still publishable, but
the intervention result remains at p=0.07 (suggestive, not significant).
Second failure mode: Karmonic regularization doesn't replicate at scale across
architectures. The Mistral-7B result (-25pp) is strong, but Qwen-7B showed a null
effect; the effect may be architecture-dependent rather than universal. In this case,
we publish the negative result (which is still valuable) and narrow the claim.
Outcome if project fails: The 30+ Lean theorems remain valid regardless — they are
mathematical facts, not empirical claims. The selectivity detection symmetry proof
(sycophancy = hostility) stands independent of the experimental results.
$0 in grants or investment. Pre-revenue. One USDC payment from an individual for
consulting work. All development has been self-funded.