Mark McArthey

@mcarthey

Independent researcher (published preprint; ~10 months continuous deployment) studying how memory-augmented AI systems fail at truthfulness and emotional honesty.

https://learnedgeek.com

About Me

I’m an independent researcher and senior software engineer. By day I build production .NET systems and teach computer science as an adjunct instructor; on my own time I design and run ANI — a memory-augmented AI system I’ve operated continuously for about 10 months as a single-subject research probe.

What makes the work unusual is the instrument itself. Most evaluation of AI truthfulness and emotional behavior happens on static prompts or short sessions. ANI runs continuously, holds persistent emotional state and long-term memory, and logs everything — which lets me observe failure modes that only emerge over months of accumulated history. From that deployment I’ve documented:

• a seven-type confabulation taxonomy drawn from production, including failures traced to schema design rather than to the model;

• retrieval contamination — high-relevance memories losing to low-relevance summaries — as a first-class, recurring failure mode, with a three-layer mitigation now deployed;

• an architectural restraint pattern in which model output is constrained independently of the model (“the model proposes, the architecture disposes”), making undesired behavior architecturally impossible rather than merely improbable;

• emergent display rules — expressed emotion diverging from internal state in a structured way — which run structurally opposite to the sycophantic-mirroring pattern documented in the published companion-AI literature (Chu et al., 2025). I read this as directly relevant to socioaffective alignment (Kirk et al., 2025): the question of how to build affective systems that don’t simply mirror and flatter the user.

I’ve published one research preprint (DOI 10.5281/zenodo.19342190), with a second paper near completion.

I’ll be candid about where I sit: I come from software engineering and human-computer interaction, and I’m relatively new to the formal AI-safety community — I’m here because the failure modes I keep hitting in deployment are alignment problems, and I’d rather engage that conversation directly than work in isolation. The work has been entirely self-funded to date, run on personal hardware and cloud services. The binding constraint is time and runway — the hours to generalize these findings out of one system and into something the field can use.

I’m looking for funding to take this work from a single subject to a multi-subject study and to open-source the architecture and evaluation tooling, plus collaborators and reviewers who work on memory, truthfulness, or affective alignment.

Links: learnedgeek.com/research · DOI 10.5281/zenodo.19342190

Projects

ANI: An empirical testbed for alignment failures in a deployed AI system (pending grant agreement signature)