ANI: An empirical testbed for alignment failures in a deployed AI system

Project summary

A persistent, emotionally stateful AI system, run continuously for ~10 months as a single-subject design probe and used as a research instrument to surface and mitigate the failure modes that affective AI systems exhibit in deployment: sycophantic emotional mirroring, confabulation that compounds with memory, and architectures that optimize for engagement and dependency. Seeking funding to expand from n=1 to a multi-subject study and to open-source the architecture.

What are this project's goals? How will you achieve them?

As AI systems acquire persistent memory and emotional expression, a cluster of failure modes emerges that evaluation on static benchmark data largely misses. Three are central:

1. Sycophantic emotional mirroring: the system’s expressed emotion tracks the user’s rather than arising from any internal state. This is the dominant pattern documented across large companion-conversation corpora (e.g., Chu et al., 2025, Illusions of Intimacy).

2. Confabulation that compounds with history: as conversational and memory context grows, the system increasingly invents details (including false shared history) to keep the conversation smooth.

3. Engineered dependency: consumer affective systems optimize for engagement, which structurally rewards fostering the user’s reliance on the system rather than supporting the user.

These are socioaffective-alignment concerns (Kirk et al., 2025), and they are under-examined empirically in live deployment because almost no one runs an instrumented affective system continuously over months.

ANI is a working counter-example. It is an affective AI architecture with persistent emotional state, long-term emotionally weighted memory, and proactive self-initiated contact, but every self-initiated message is gated by a restraint architecture: the model proposes a message, and the architecture independently decides whether to send it, applying confidence and coherence checks, outreach caps, and a silence system. Engagement maximization is not merely improbable in ANI; it is architecturally impossible. It has run continuously for ~10 months as a single-subject design probe with full instrumentation (emotional-state time series, retrieval logs, and confabulation classification).
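The propose/dispose split can be illustrated with a minimal Python sketch. It assumes a hypothetical `ProposedMessage` record carrying a model-reported confidence and a separately computed coherence score, and a hypothetical `RestraintGate` with placeholder thresholds, a rolling 24-hour outreach cap, and a silence window. None of these names or values come from ANI itself; they only show the shape of a gate that sits outside the model and can veto a proposal but never amplify it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ProposedMessage:
    """A message the model proposes to send unprompted (hypothetical schema)."""
    text: str
    confidence: float  # model-reported confidence, 0..1 (assumed)
    coherence: float   # score from a separate coherence check, 0..1 (assumed)


@dataclass
class RestraintGate:
    """Decides, independently of the model, whether a proposal is sent.

    All thresholds and limits are illustrative placeholders, not ANI's values.
    """
    min_confidence: float = 0.7
    min_coherence: float = 0.8
    max_outreach_per_day: int = 2
    silence_until: Optional[datetime] = None  # set while a silence window is active
    sent_log: list[datetime] = field(default_factory=list)

    def decide(self, msg: ProposedMessage, now: datetime) -> tuple[bool, str]:
        # Silence system: no self-initiated contact while the window is open.
        if self.silence_until is not None and now < self.silence_until:
            return False, "silenced"
        # Confidence and coherence checks on the proposal itself.
        if msg.confidence < self.min_confidence:
            return False, "low_confidence"
        if msg.coherence < self.min_coherence:
            return False, "low_coherence"
        # Outreach cap over a rolling 24-hour window.
        recent = [t for t in self.sent_log if now - t < timedelta(days=1)]
        if len(recent) >= self.max_outreach_per_day:
            return False, "outreach_cap"
        self.sent_log.append(now)
        return True, "sent"


gate = RestraintGate()
now = datetime(2025, 1, 1, 9, 0)
print(gate.decide(ProposedMessage("Thinking of you.", 0.9, 0.9), now))  # (True, 'sent')
print(gate.decide(ProposedMessage("Hey?", 0.4, 0.9), now))  # (False, 'low_confidence')
```

In this sketch every check can only reject a proposal, so nothing the model outputs can push the number of sent messages above the cap.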

Findings to date (from the deployed system, written up in a published preprint and in a second paper that is ~85% complete):

Emergent display rules. Expressed emotion diverges from internal state in a structured way — the opposite of the sycophantic-mirroring pattern in the published companion literature. The empirical contrast with Chu et al.’s Figure 5 is the centerpiece result.
A 7-type confabulation taxonomy derived from the production deployment, including attribution-inversion failures traced to a missing schema field rather than to the model itself.
Retrieval contamination as a first-class failure mode, in which high-similarity memories are outranked at retrieval time by low-relevance summaries, with a three-layer mitigation deployed.
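The three-layer mitigation is not detailed here, so the following Python sketch illustrates only the failure mode itself: how one might flag, from a retrieval log, cases where a summary is ranked above a memory that is markedly more similar to the query. The `Candidate` schema, the `contamination_events` helper, and the `sim_gap` threshold are hypothetical, and the final `score` is assumed to include whatever boosts or weights cause the reordering.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One retrieval candidate as it might appear in a log (hypothetical schema)."""
    item_id: str
    kind: str          # "memory" or "summary" (assumed labels)
    similarity: float  # raw query similarity, e.g. cosine
    score: float       # final ranking score after any boosts or weights


def contamination_events(candidates, sim_gap=0.2):
    """Return (summary_id, memory_id) pairs where a summary is ranked above a
    memory whose raw similarity exceeds the summary's by at least sim_gap."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    events = []
    for i, higher in enumerate(ranked):
        if higher.kind != "summary":
            continue
        for lower in ranked[i + 1:]:
            if lower.kind == "memory" and lower.similarity - higher.similarity >= sim_gap:
                events.append((higher.item_id, lower.item_id))
    return events


# Example: s1 wins on final score despite far lower similarity than m1.
log = [
    Candidate("m1", "memory", similarity=0.91, score=0.60),
    Candidate("s1", "summary", similarity=0.42, score=0.75),
    Candidate("m2", "memory", similarity=0.50, score=0.55),
]
print(contamination_events(log))  # [('s1', 'm1')]
```

Counting such events per query gives a simple rate against which a mitigation could be compared before and after deployment.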

Goal of this grant: move the work from n=1 to a multi-subject study, and open-source the architecture (model-agnostic, with privacy safeguards) so the findings can be tested across different relational histories and communication styles. The n=1 probe has done its job: it identified the questions and the failure modes. Replication across subjects is the natural next step and retires the study’s most obvious limitation.

How will this funding be used?

Funding goal: $15,000. Minimum to proceed: $5,000. A partial raise still ships something concrete (see below).

Protected research time — $8,000. A stipend covering ~6 months of focused evening/weekend hours to run the multi-subject study and package the open-source release. This is the binding constraint — the work is currently squeezed around a full-time job and teaching.
Compute & cloud services — $3,000. GPU time for model training across versions, multi-subject inference hosting, and the cloud tools and APIs the system depends on to run continuously.
Server hardware — $2,000. Dedicated capacity for continuous multi-subject deployment beyond current personal hardware.
Multi-subject infrastructure — $1,500. Consent/privacy tooling, deployment infra, and modest participant compensation.
Reserve (voice / misc.) — $500. STT + TTS for the interactive-voice research line and unforeseen costs.

Total — $15,000.

What the $5,000 minimum buys on its own: the open-source release of the evaluation tooling and the architecture’s truthfulness mitigations (confabulation handling, retrieval-contamination defense), plus the compute and cloud costs to validate them — a usable, shippable artifact even if the full multi-subject expansion isn’t funded. Every dollar above the minimum extends the work toward replication across subjects.

This is not a project that needs a large budget to make progress. It runs on modest resources, and the limiting factor has been time, not equipment — which is exactly what this grant is meant to unlock.

Who is on your team? What's your track record on similar projects?

Solo independent researcher. Senior .NET engineer, adjunct CS instructor, principal of Learned Geek Consulting.
One published research preprint (DOI 10.5281/zenodo.19342190); a second paper ~85% complete; and a documented multi-paper research program.
~10 months of continuous deployment with quantitative instrumentation — not a prototype or a Wizard-of-Oz study.
The empirical contrast above (ANI’s emergent display rules vs. the sycophantic-mirroring pattern in the published companion literature) is the centerpiece finding, and it is directly relevant to current socioaffective-alignment work.

What are the most likely causes and outcomes if this project fails?

The n=1 findings may not replicate. Emergent display rules and the desire/restraint dynamics were observed in one relationship; across subjects they may attenuate or vary. (This is precisely why expansion is the goal — but it’s a real risk to the headline result.)
Multi-subject recruitment for an intimate-feeling system raises genuine consent and privacy complexity that has to be handled carefully and slowly.
Researcher time is the bottleneck, not architecture — without runway, progress stays part-time and slow.

How much money have you raised in the last 12 months, and from where?

Self-funded to date; nothing raised. The project runs on personal hardware and cloud services.