The gap I'm filling
We have two tools for detecting when LLMs mislead or deceive:
- Output evaluation — checks what the model says. Misses subtle inconsistencies.
- White-box probing — inspects internal activations. Requires weight access. Unavailable for most deployed models.
Nothing operates systematically between these two levels.
I noticed this gap by watching LLMs reason before they respond. The thinking process and the output sometimes don't cohere: the model arrives at one conclusion internally and presents another externally. Existing tools don't catch this.
Quark-AI extracts behavioral "fossils" from LLM interactions — structured profiles capturing how a model reasons across sessions, not just what it outputs on a single turn.
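For orientation, here is a minimal sketch of what a single fossil record might contain, written as a Python dict. The field names and values are illustrative assumptions, not the project's actual JSON schema (which is offered for review further down).

```python
# Hypothetical fossil record. Illustrative only: these field names are
# assumptions, not Quark-AI's actual schema.
fossil = {
    "fossil_id": "f-000123",
    "entity": "assistant-alpha",              # which behavioral entity this profiles
    "session_ids": ["s-41", "s-42", "s-57"],  # interactions the profile is drawn from
    "reasoning_traits": {                     # how the model tends to reason, not what it said
        "hedging_rate": 0.31,
        "self_correction_rate": 0.12,
    },
    "stance_summary": "cautious, defers to sources, rarely commits to numbers",
    "fidelity": 0.87,                         # coherence to the entity's baseline profile
    "conditions": "baseline",                 # e.g. "baseline" vs "adversarial_pressure"
}
```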
Current state:
- 785 fossils serialized across multiple entity types
- Working analysis pipeline (embedding, clustering, trajectory visualization); see the sketch after this list
- Early divergence signals: entities under adversarial pressure show measurable fidelity drops (87% vs 65% coherence to the behavioral profile under identical conditions)
- Live system at quark-ai.cordee.ovh
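To make the pipeline shape concrete, here is a minimal sketch of the embed-and-cluster stage, assuming fossils are stored as JSON files with a textual stance_summary field. The file layout, field names, and library choices (sentence-transformers, scikit-learn) are my assumptions for illustration, not Quark-AI's actual implementation; trajectory visualization is omitted.

```python
# Sketch of a fossil analysis pipeline: load -> embed -> cluster.
# Field names and libraries are assumptions, not the production pipeline.
import json
from pathlib import Path

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans


def load_fossils(directory: str) -> list[dict]:
    """Load serialized fossils from a directory of JSON files."""
    return [json.loads(p.read_text()) for p in Path(directory).glob("*.json")]


def embed_fossils(fossils: list[dict], model_name: str = "all-MiniLM-L6-v2"):
    """Embed each fossil's textual summary into a vector space."""
    model = SentenceTransformer(model_name)
    return model.encode([f.get("stance_summary", "") for f in fossils])


def cluster_fossils(embeddings, n_clusters: int = 8):
    """Group fossils by behavioral similarity."""
    return KMeans(n_clusters=n_clusters, random_state=0).fit_predict(embeddings)


if __name__ == "__main__":
    fossils = load_fossils("fossils/")
    labels = cluster_fossils(embed_fossils(fossils))
    for fossil, label in zip(fossils, labels):
        print(fossil.get("fossil_id"), label)
```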
Why this matters now
Models are being deployed into high-stakes environments — medical, legal, agentic — where output monitoring is insufficient and weight access doesn't exist. The window to build monitoring infrastructure before capability jumps make this harder is narrowing.
A validated gray-box behavioral monitoring layer would be immediately applicable to deployed frontier models. That's the bet this project is making.
$7,000 (3-4 months): Stipend $4,500 · Compute $1,200 · API $700 · Infrastructure $600
$20,000 (6 months): Stipend $13,000 · Compute $4,000 · API $1,500 · Infrastructure $1,500
With $7K — Minimum viable:
Public documentation of the behavioral extraction protocol, a complete and annotated JSON schema, and a dataset of 100 fossils with basic metrics (fidelity, plasticity, tension). Deliverable: a reproducible package allowing other researchers to implement the method independently.
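As one illustration of what a released metric could look like, here is a hedged sketch of a fidelity score: how closely a new session's embedding matches an entity's established profile. The cosine-similarity-to-centroid definition is an assumption made for this example, not the project's actual formula; plasticity and tension are omitted.

```python
# Sketch of a fidelity metric (assumed definition: cosine similarity between a
# session embedding and the centroid of the entity's prior profile embeddings).
import numpy as np


def fidelity(session_embedding: np.ndarray, profile_embeddings: np.ndarray) -> float:
    """Score in [-1, 1]; higher means the session coheres with the profile."""
    centroid = profile_embeddings.mean(axis=0)
    denom = float(np.linalg.norm(session_embedding) * np.linalg.norm(centroid))
    return float(np.dot(session_embedding, centroid)) / denom if denom else 0.0
```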
With $20K — Full Benchmark:
All of the above, plus multi-model experiments on structured deception datasets, comparisons against baselines (output-level evaluation and supervised probing), cross-validation, and a dataset of 700+ annotated fossils. Deliverable: a publicly usable benchmark with statistical results.
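To show the intended shape of the baseline comparison rather than any result, here is an illustrative harness sketch: cross-validated detection scoring of a gray-box fidelity-drop feature against an output-level feature. Every number below is random placeholder data; the features, labels, and dataset are assumptions, since the experiments have not yet been run.

```python
# Illustrative evaluation harness only. Random placeholder data stands in for
# the labeled deception dataset; no real results are implied.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
labels = rng.integers(0, 2, size=n)               # 1 = labeled deceptive interaction (placeholder)
fidelity_drop = rng.normal(labels * 0.2, 0.1, n)  # gray-box feature (placeholder)
output_score = rng.normal(labels * 0.1, 0.1, n)   # output-level baseline feature (placeholder)

for name, feature in [("gray-box fidelity drop", fidelity_drop),
                      ("output-level baseline", output_score)]:
    auc = cross_val_score(LogisticRegression(), feature.reshape(-1, 1), labels,
                          cv=5, scoring="roc_auc")
    print(f"{name}: AUROC {auc.mean():.2f} +/- {auc.std():.2f}")
```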
Independent AI researcher, Querétaro, Mexico. I build systems and observe what happens.
- Self-funded until now
- 2 years building production AI infrastructure (Quark-AI + Cordée)
- No academic affiliation — the fossils and the working system are the track record
Happy to share JSON schema, sample fossils, and demo access to any regrantor who wants to look under the hood.
The most likely causes of failure:
1. Weak signal — behavioral fingerprint divergence may not correlate strongly enough with deceptive outputs to be useful as a detection signal. This is the central empirical risk and the honest reason this research needs to be done rather than assumed.
2. Solo researcher bottleneck — without a team, illness, financial pressure, or competing priorities could slow or halt the work mid-project.
3. Generalization failure — patterns observed in my controlled environment may not transfer to other models or deployment contexts, limiting the benchmark's practical value.
Outcomes if it fails:
If the signal is weak or absent, the benchmark still has value — it documents a negative result rigorously, which saves other researchers from pursuing the same direction without evidence. A well-documented failure is a contribution.
If execution fails mid-project, the existing dataset (785 fossils, working pipeline) remains publicly available and reusable.
Counterfactual case
With the $7K minimum: the methodology package exists and is public. Without it: this specific validation sprint does not happen on this timeline.