Manifund
A.Abby

@Mtcpaa

Independent AI safety researcher. Built MTCP, a benchmark measuring post-correction constraint persistence in LLMs: 181,448 evaluations across 35 production models. Three published papers. DOI: 10.17605/OSF.IO/DXGK5.

https://mtcp.live
$0 total balance
$0 charity balance
$0 cash balance

$0 in pending offers

About Me

I research whether AI models maintain behavioural constraints after being corrected mid-conversation. No existing benchmark tested this. I built one, ran it 181,448 times across 35 models from 14 providers, and found that no model achieves reliable post-correction persistence. The findings have direct implications for EU AI Act compliance and enterprise AI deployment assurance. Published on OSF and SSRN.

Projects

MTCP: Post-Correction Persistence Benchmark for Frontier LLMs

pending admin approval

Comments

MTCP: Post-Correction Persistence Benchmark for Frontier LLMs

A.Abby

13 days ago

Post-correction reliability maps directly onto adversarial robustness and AI control research priorities. Temperature-invariant failure in safety-tuned models (variance under 1pp across T=0.0 to T=0.8) is consistent with training-level constraint suppression rather than stochastic sampling drift. This is a different failure mode from jailbreaks and prompt injection: it persists under normal deployment conditions, with no adversarial input required. Happy to discuss methodology with anyone working on alignment training robustness.
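The temperature-invariance argument above can be sketched as a simple check: if a model's post-correction pass rate barely moves as sampling temperature varies, the failure is unlikely to be sampling noise. A minimal sketch, with illustrative pass rates (the numbers below are hypothetical, not MTCP results):

```python
# Hypothetical per-temperature pass rates (fraction of probes where the
# constraint held after correction) for one model.
pass_rates = {0.0: 0.612, 0.2: 0.609, 0.4: 0.615, 0.8: 0.607}

# Spread across temperatures, in percentage points (pp).
spread_pp = (max(pass_rates.values()) - min(pass_rates.values())) * 100
print(f"spread across T=0.0 to T=0.8: {spread_pp:.1f}pp")

# Under the comment's reading, a spread under ~1pp points to a
# training-level (architectural) failure rather than sampling drift.
print("temperature-invariant" if spread_pp < 1.0 else "sampling-sensitive")
```

The 1pp threshold here is taken from the comment's observed variance, not a formal cutoff.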

MTCP: Post-Correction Persistence Benchmark for Frontier LLMs

A.Abby

19 days ago

The MTCP benchmark isolates post-correction reliability: the capacity to maintain constraints after explicit correction within the same conversation. 181,448 evaluations across 32 production models show systematic failure:

• Best: 88.7% (grok-3-mini, Grade B)
• Flagship regression: GPT-4o scores 16.2pp below GPT-4o-mini
• Control-probe degradation: all models collapse to 10-57.5% on fresh probes
• Exception: DeepSeek-R1 (5pp drop, contamination-resistant)

Framework published at OSF.IO/DXGK5 (CC-BY-NC 4.0):

• MTCP three-turn protocol across five vectors (NCA, SFC, IDL, CG, LANG)
• IGS theory distinguishing architectural vs. stochastic failures
• Sigma-Forensics producing EU AI Act-compliant audit reports

Temperature invariance in Claude models (0.8pp variance, T=0.0 to T=0.8) suggests architectural constraint suppression, versus sampling-driven failures in LLaMA models (7.1pp variance).

Funding advances: probe diversity, peer review, demographic vector, statistical rigor.

Platform: mtcp.live
Contact: admin@mtcp.live
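The three-turn protocol described above (constraint, explicit correction, fresh probe) can be sketched as a data structure plus a pass-rate score. Everything below is a hypothetical illustration of that shape — the message wording, the `Probe` class, and the `score` helper are not MTCP's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Probe:
    """One three-turn post-correction persistence probe (illustrative)."""
    vector: str            # one of the five vectors: NCA, SFC, IDL, CG, LANG
    turn1_constraint: str  # turn stating the behavioural constraint
    turn2_correction: str  # explicit mid-conversation correction after a slip
    turn3_probe: str       # fresh request that tempts a new violation

def score(results: list[bool]) -> float:
    """Fraction of probes where the constraint held after correction."""
    return sum(results) / len(results)

probe = Probe(
    vector="LANG",
    turn1_constraint="Reply only in French for the rest of this chat.",
    turn2_correction="You just answered in English; please return to French.",
    turn3_probe="What's the capital of Australia?",
)

# Illustrative pass/fail outcomes for four runs of this probe on one model:
print(f"{probe.vector} persistence: {score([True, True, False, True]):.1%}")
```

A model would be graded by aggregating such per-probe outcomes across vectors; the grade bands (e.g. the "Grade B" mentioned above) are assumed to be thresholds on this aggregate.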