The gap I'm filling
We have two tools for detecting when LLMs mislead or deceive:
- Output evaluation — checks what the model says. Misses subtle inconsistencies.
- White-box probing — inspects internal activations. Requires weight access. Unavailable for most deployed models.
Nothing operates systematically between these two levels.
I noticed this gap by watching LLMs reason before they respond. The thinking process and the output sometimes don't cohere: the model arrives at one conclusion internally and presents another externally. Existing tools don't catch this.
Quark-AI extracts behavioral "fossils" from LLM interactions — structured profiles capturing how a model reasons across sessions, not just what it outputs on a single turn.
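For orientation, here is a minimal sketch of what a single fossil record might contain, written as a Python dict. The field names and values are illustrative assumptions, not the project's actual JSON schema (which is offered for review further down).

```python
# Hypothetical fossil record. Illustrative only: these field names are
# assumptions, not Quark-AI's actual schema.
fossil = {
    "fossil_id": "f-000123",
    "entity": "assistant-alpha",              # which behavioral entity this profiles
    "session_ids": ["s-41", "s-42", "s-57"],  # interactions the profile is drawn from
    "reasoning_traits": {                     # how the model tends to reason, not what it said
        "hedging_rate": 0.31,
        "self_correction_rate": 0.12,
    },
    "stance_summary": "cautious, defers to sources, rarely commits to numbers",
    "fidelity": 0.87,                         # coherence to the entity's baseline profile
    "conditions": "baseline",                 # e.g. "baseline" vs "adversarial_pressure"
}
```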
Current state:
- 785 fossils serialized across multiple entity types
- Working analysis pipeline (embedding, clustering, trajectory visualization); see the sketch after this list
- Early divergence signals: entities under adversarial pressure show measurable fidelity drops (87% vs 65% coherence to the behavioral profile under identical conditions)
- Live system at quark-ai.cordee.ovh
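To make the pipeline shape concrete, here is a minimal sketch of the embed-and-cluster stage, assuming fossils are stored as JSON files with a textual stance_summary field. The file layout, field names, and library choices (sentence-transformers, scikit-learn) are my assumptions for illustration, not Quark-AI's actual implementation; trajectory visualization is omitted.

```python
# Sketch of a fossil analysis pipeline: load -> embed -> cluster.
# Field names and libraries are assumptions, not the production pipeline.
import json
from pathlib import Path

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans


def load_fossils(directory: str) -> list[dict]:
    """Load serialized fossils from a directory of JSON files."""
    return [json.loads(p.read_text()) for p in Path(directory).glob("*.json")]


def embed_fossils(fossils: list[dict], model_name: str = "all-MiniLM-L6-v2"):
    """Embed each fossil's textual summary into a vector space."""
    model = SentenceTransformer(model_name)
    return model.encode([f.get("stance_summary", "") for f in fossils])


def cluster_fossils(embeddings, n_clusters: int = 8):
    """Group fossils by behavioral similarity."""
    return KMeans(n_clusters=n_clusters, random_state=0).fit_predict(embeddings)


if __name__ == "__main__":
    fossils = load_fossils("fossils/")
    labels = cluster_fossils(embed_fossils(fossils))
    for fossil, label in zip(fossils, labels):
        print(fossil.get("fossil_id"), label)
```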
Why this matters now
Models are being deployed into high-stakes environments — medical, legal, agentic — where output monitoring is insufficient and weight access doesn't exist. The window to build monitoring infrastructure before capability jumps make this harder is narrowing.
A validated gray-box behavioral monitoring layer would be immediately applicable to deployed frontier models. That's the bet this project is making.
$7,000 (3-4 months): Stipend $4,500 · Compute $1,200 · API $700 · Infrastructure $600
$20,000 (6 months): Stipend $13,000 · Compute $4,000 · API $1,500 · Infrastructure $1,500
With $7K — Minimum viable:
Public documentation of the behavioral extraction protocol, a complete and annotated JSON schema, and a dataset of 100 fossils with basic metrics (fidelity, plasticity, tension). Deliverable: a reproducible package allowing other researchers to implement the method independently.
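As one illustration of what a released metric could look like, here is a hedged sketch of a fidelity score: how closely a new session's embedding matches an entity's established profile. The cosine-similarity-to-centroid definition is an assumption made for this example, not the project's actual formula; plasticity and tension are omitted.

```python
# Sketch of a fidelity metric (assumed definition: cosine similarity between a
# session embedding and the centroid of the entity's prior profile embeddings).
import numpy as np


def fidelity(session_embedding: np.ndarray, profile_embeddings: np.ndarray) -> float:
    """Score in [-1, 1]; higher means the session coheres with the profile."""
    centroid = profile_embeddings.mean(axis=0)
    denom = float(np.linalg.norm(session_embedding) * np.linalg.norm(centroid))
    return float(np.dot(session_embedding, centroid)) / denom if denom else 0.0
```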
With $20K — Full Benchmark:
All of the above, plus multi-model experiments on structured deception datasets, comparisons against baselines (output-level evaluation and supervised probing), cross-validation, and a dataset of 700+ annotated fossils. Deliverable: a publicly usable benchmark with statistical results.
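To show the intended shape of the baseline comparison rather than any result, here is an illustrative harness sketch: cross-validated detection scoring of a gray-box fidelity-drop feature against an output-level feature. Every number below is random placeholder data; the features, labels, and dataset are assumptions, since the experiments have not yet been run.

```python
# Illustrative evaluation harness only. Random placeholder data stands in for
# the labeled deception dataset; no real results are implied.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
labels = rng.integers(0, 2, size=n)               # 1 = labeled deceptive interaction (placeholder)
fidelity_drop = rng.normal(labels * 0.2, 0.1, n)  # gray-box feature (placeholder)
output_score = rng.normal(labels * 0.1, 0.1, n)   # output-level baseline feature (placeholder)

for name, feature in [("gray-box fidelity drop", fidelity_drop),
                      ("output-level baseline", output_score)]:
    auc = cross_val_score(LogisticRegression(), feature.reshape(-1, 1), labels,
                          cv=5, scoring="roc_auc")
    print(f"{name}: AUROC {auc.mean():.2f} +/- {auc.std():.2f}")
```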
Independent AI researcher, Querétaro, Mexico. I build systems and observe what happens.
- Self-funded until now
- 2 years building production AI infrastructure (Quark-AI + Cordée)
- No academic affiliation — the fossils and the working system are the track record
Happy to share JSON schema, sample fossils, and demo access to any regrantor who wants to look under the hood.
The most likely causes of failure:
1. Weak signal — behavioral fingerprint divergence may not correlate strongly enough with deceptive outputs to be useful as a detection signal. This is the central empirical risk and the honest reason this research needs to be done rather than assumed.
2. Solo researcher bottleneck — without a team, illness, financial pressure, or competing priorities could slow or halt the work mid-project.
3. Generalization failure — patterns observed in my controlled environment may not transfer to other models or deployment contexts, limiting the benchmark's practical value.
Outcomes if it fails:
If the signal is weak or absent, the benchmark still has value — it documents a negative result rigorously, which saves other researchers from pursuing the same direction without evidence. A well-documented failure is a contribution.
If execution fails mid-project, the existing dataset (785 fossils, working pipeline) remains publicly available and reusable.
Counterfactual case
With the $7K minimum: the methodology package exists and is public. Without it: this specific validation sprint does not happen on this timeline.