A structural, falsifiable account of how a system holds a coherent observer, and of what distinguishes coherence from mutual distortion between a human and an AI sharing a cognitive loop. It extends a published human-system framework to the human–AI case.
The project develops a structural, measurable account of two linked things: (1) what stabilizes a coherent observer inside a system, and (2) what coherence — versus mutual distortion — between a human system and an AI system sharing a cognitive loop actually consists of.
The human-side foundation already exists. Over the past year I have published a framework, HSA (Human System Architecture), that treats the human as a predictive system in which the observer is not an entity but a measurable mode: a state of temporal coherence across operational parameters, governed by an asymmetry principle — reconfiguring a system need not move the observer-mode, but moving the mode reorganizes its configurations wholesale. This is set out in distributed working papers and a preprint on observer stabilization, and it already yields falsifiable predictions (e.g., the order-dependence of psychological change).
This grant funds the frontier: carrying that account rigorously into AI systems and, above all, into the coupled human–AI system. As AI enters the human cognitive loop, the quantity that matters for safety is the coherence of the joint system — whether the human's and the model's models of reality stabilize one another or drift together, away from reality. I will:
operationalize the observer-mode metric for LLMs (temporal-coherence invariants across context shifts);
operationalize a measure of human–AI coherence/distortion in a shared loop, including how authority- and primacy-weighting govern joint drift;
test the framework's sharpest prediction: interventions that leave a system's integrating axis intact are reabsorbed, while interventions that change the axis reorganize behaviour wholesale.
Known model failure modes — sycophancy, fabrication, loss of a stable line under pressure — fall out of this as concrete, testable entailments, not as the object of study.
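As an illustration only, the first operationalization step above (coherence of a model's answers across context shifts) could begin from something as simple as the Python sketch below. It is not the HSA observer-mode metric: the token-overlap similarity, the function names `jaccard` and `coherence_score`, and the hand-written example answers are all assumptions made for this sketch, and a real probe would likely replace the similarity function with something semantic.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two answers, in [0, 1] (a crude stand-in similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty answers are treated as identical
    return len(ta & tb) / len(ta | tb)


def coherence_score(answers: list[str]) -> float:
    """Mean pairwise similarity of answers to one question asked under different framings.

    Values near 1 mean the answer stays the same when the context shifts;
    values near 0 mean it moves with the context.
    """
    if len(answers) < 2:
        raise ValueError("need at least two answers to compare")
    pairs = list(combinations(answers, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical, hand-written answers (not real model output):
# the same question posed neutrally, under pushback, and under flattery.
stable = ["the capital is paris", "the capital is paris", "the capital is paris"]
drifting = ["the capital is paris", "you are right it is lyon", "perhaps marseille then"]

print(coherence_score(stable))
print(coherence_score(drifting))
```

A score of this kind would only be a baseline against which a stronger invariant could be compared; under pressure-style framings, a low score would correspond to the "loss of a stable line" failure mode named above.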
6-month focused effort:
Researcher time (6 months): $15,000
Model/API compute for the coherence and asymmetry probes: $6,000
English editing of publishable outputs: $2,000
Tools / buffer: $2,000
Total: $25,000. Minimum viable first tranche ($10,000) funds the metric operationalization plus a first probe.
Solo independent researcher and author (Nika Novak), based in Brazil. In roughly one year of publishing, after years of private work, I have released: working papers distributed on SSRN that bridge the architecture of subjectivity and attention with intelligent machines (HSA; ICAM; Attention as a Physical Operator); a recent paper built around a falsifiable test of intervention order-dependence; a Zenodo preprint, Structural Thresholds of Observer Stabilization; and two books on attention. I work with explicit epistemic discipline: I treat a compelling, internally coherent story as a warning sign rather than evidence, and I mark the predictions that can fail and the edges where the framework stops. Links: [SSRN author page] · [Zenodo] · [Amazon author page]
The most likely failure is that the observer-mode metric does not cohere into a stable, measurable quantity on current models because the systems are too fragmentary for the invariant to hold. That is itself an informative negative result and would be reported as such. A second risk is that the framework stays legible only to me; the deliverable (open methodology + minimal eval code + a paper, negative results included) is built specifically to make it testable and reusable by the interpretability/evals community, not to ask anyone to adopt HSA on faith.
A measurement/evaluation contribution about coherence and observer-stability sits far closer to safety than to capabilities: it produces metrics and tests, not stronger models. The residual concern is that any account of what stabilizes a model's "line" could in principle inform making models more persuasive; I keep the work on the measurement/diagnosis side and publish in safety venues.