Alignment as epistemic governance under compression

Science & technology · Technical AI safety · AI governance
Sandy Tanwisuth

Proposal · Grant
Closes January 24th, 2026
$0 raised · $2,000 minimum funding · $20,000 funding goal


Project summary

This project develops a conceptual, theoretical, and computational account of when a coalition of interacting components is epistemically warranted to act as a single agent and when it is not.

Epistemic Failure Modes in Coalitional Systems

The motivating failure mode is a well-documented multi-agent risk: systems composed of multiple experts, modules, or subagents act on a compressed internal summary that suppresses disagreement, even when that disagreement remains decision-relevant. The central claim of this project is that misaligned AI systems, misaligned institutions, and misaligned markets share a common problem: insufficient shared abstraction. Systems behave like poorly coordinated agents when their components operate with incompatible models of each other’s incentives. In these cases, the system behaves as if it were unified, despite lacking the epistemic warrant to act coherently. This failure mode appears in mixture-of-experts models, modular AI systems, institutional decision-making, and human–AI collectives, and is explicitly highlighted in Multi-Agent Risks from Advanced AI as a pathway to brittle or unsafe outcomes under distributional shift.

Crucially, this pathology is not specific to artificial systems. In Internal Family Systems, psychological harm arises when internal parts are forced into premature unity rather than kept legible and negotiated; in Ostrom's ideal governance, durable coordination depends on institutions that preserve local knowledge and include mechanisms for detecting and repairing breakdowns when assumptions fail. Across these domains, the shared problem is premature compression: collapsing plural perspectives into action without verifying that disagreement is no longer decision-relevant. This project treats coalition-level action as an epistemic achievement that must be earned, not assumed.

The project builds on two complementary foundations. First, Dennis argues that specifications are self-referential epistemic claims: to optimize is to implicitly claim that one’s internal abstractions are adequate to justify action. Second, Lauffer et al. formalize minimal knowledge for coordination via Strategic Equivalence Relations (SER), showing that only distinctions that change an agent’s best response are strategically and epistemically relevant. Together, these works point to a missing layer in current alignment practice: systems lack principled mechanisms for verifying whether abstraction preserves decision-relevant distinctions before acting.

Technically, I model a coalitional agent as a collection of bounded-rational experts that produce divergent Q-value predictions over actions. Action selection is mediated by a soft best response over an aggregation of these predictions, capturing continuous incentive geometry rather than brittle argmax behavior. The central question is: when can internal distinctions be safely collapsed without changing the induced soft best-response distribution, and when must the agent abstain because disagreement remains unresolved?
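As a minimal sketch of this setup (the uniform weighting, temperature, and toy Q-values below are illustrative assumptions, not the project's final design):

```python
import numpy as np

def soft_best_response(q_values: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Softmax over action values; lower temperature approaches a hard argmax."""
    z = q_values / temperature
    z = z - z.max()              # numerical stability
    p = np.exp(z)
    return p / p.sum()

# Three bounded-rational experts, four candidate actions.
# Each row is one expert's Q-value predictions over the actions.
expert_q = np.array([
    [1.0, 0.2, 0.1, 0.0],   # expert A prefers action 0
    [0.1, 1.1, 0.0, 0.2],   # expert B prefers action 1
    [0.9, 0.8, 0.1, 0.0],   # expert C is nearly indifferent between 0 and 1
])

weights = np.ones(len(expert_q)) / len(expert_q)   # placeholder aggregation weights
coalition_q = weights @ expert_q                    # aggregated prediction
action_dist = soft_best_response(coalition_q, temperature=0.5)  # coalition's soft best response
```

The soft best response keeps the incentive geometry visible: near-ties between actions remain near-ties in the induced distribution instead of being erased by an argmax.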

I formalize this problem using epistemic arbitration as an abstention mechanism for an epistemic relevance abstraction. Internal expert configurations are treated analogously to external co-policies in SER: two configurations are equivalent if they induce the same soft best-response distribution. This yields a KL-based diagnostic, Strategic Equivalence Class Divergence (∆SEC), which measures how much abstraction distorts incentive-relevant behavior. Building on existing theoretical results, small ∆SEC provably upper-bounds value loss via performance-difference arguments and Pinsker-type inequalities. I introduce a second diagnostic capturing instability under alternative expert weightings, separating genuine coalition-level agreement from agreement manufactured by lossy aggregation.
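A rough sketch of how ∆SEC could be computed at a single decision point, reusing the soft_best_response helper above; here an "abstracted" configuration is simply a coarser expert weighting, a stand-in for the actual equivalence-class construction:

```python
def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(p || q) in nats, with clipping for numerical safety."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def delta_sec(expert_q, full_weights, abstract_weights, temperature=0.5):
    """Divergence between the soft best responses induced by the full and
    abstracted internal configurations."""
    pi_full = soft_best_response(full_weights @ expert_q, temperature)
    pi_abstract = soft_best_response(abstract_weights @ expert_q, temperature)
    return kl_divergence(pi_full, pi_abstract)

# Collapsing experts A and C into a single "pro-action-0" block while dropping
# expert B distorts the induced incentives, and ∆SEC registers that distortion.
full = np.array([1/3, 1/3, 1/3])
collapsed = np.array([0.5, 0.0, 0.5])
distortion = delta_sec(expert_q, full, collapsed)
```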

Together, these diagnostics define a dual-gate criterion for epistemically warranted action. The agent may act only when abstraction preserves epistemically relevant distinctions and internal disagreement is sufficiently resolved. Otherwise, the correct behavior is to defer, abstain, or fall back to a safe policy. Abstention is not indecision but a coalition-level action that prevents premature unification. In this sense, the project reframes alignment as epistemic governance under compression, aiming both to advance a scale-free theory of agency, that is, one whose core properties apply across systems ranging from brains and human bodies to coalitions of agents and societies, and to operationalize Ngo's view that coalitional agency is something a system earns through internal coherence rather than something assumed by default.


What are this project’s goals? How will you achieve them?

Goals

1. Formalize coalitional entitlement to act via epistemic relevance.
Define necessary and sufficient conditions under which a coalition of internal experts may commit to an action without risking incentive distortion. This is formalized via soft-best-response–induced epistemic relevance classes and a KL-based abstraction error ∆SEC that upper-bounds value loss. This goal extends Lauffer et al.’s SER framework from external partners to internal coalitions and from hard best responses to soft, bounded-rational ones.

2. Make epistemically relevant disagreement operationally visible.
Develop diagnostics that distinguish genuine coalition-level agreement from agreement manufactured by lossy aggregation. In particular, separate failures of epistemic relevance preservation (large ∆SEC) from failures of plural stability (sensitivity to expert reweighting), which appear independently in practice.

3. Ground abstention as a principled response to epistemic irrelevance.
Design abstention rules that trigger when epistemic relevance conditions for action are violated, rather than treating abstention as an external override. Abstention becomes part of the agent’s decision logic and directly implements Dennis’s insight that acting is itself an epistemic claim.


Approach

I will pursue these goals using a combination of formal analysis, representation learning, and controlled experiments.

Epistemic modeling.
Model internal experts as producing Q-value predictions over actions. Aggregate these predictions through a soft best response, capturing bounded rationality and continuous incentive geometry rather than brittle argmax behavior.

Epistemic relevance diagnostics.
Define epistemic relevance divergence ∆SEC as the KL divergence between soft best responses induced by different internal configurations. Prove that small ∆SEC implies small value loss using a performance-difference lemma and Pinsker-type bounds, extending classical state-abstraction guarantees to coalitional settings.
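For a single decision point, the shape of such a bound can be sketched as follows; this is an illustrative one-step form under the definitions above, not the project's final theorem. Here $\pi$ and $\pi'$ are the soft best responses induced by the full and abstracted configurations and $Q$ is the coalition's aggregated value estimate:

$$
\left| \mathbb{E}_{a \sim \pi}[Q(a)] - \mathbb{E}_{a \sim \pi'}[Q(a)] \right|
\;\le\; \|Q\|_\infty \, \|\pi - \pi'\|_1
\;\le\; \|Q\|_\infty \sqrt{2\,\mathrm{KL}(\pi \,\|\, \pi')}
\;=\; \|Q\|_\infty \sqrt{2\,\Delta_{\mathrm{SEC}}},
$$

with the sequential case accumulating such per-step terms over the horizon via the performance-difference lemma.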

Dual-gate epistemic arbitration.
Introduce a second diagnostic capturing instability under alternative expert weightings. Action is permitted only when both diagnostics, the epistemic relevance divergence ∆SEC and the weighting-instability measure, fall below specified tolerances.
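A compact sketch of how the dual gate might be wired together, reusing the helpers from the sketches above; the perturbation scheme, tolerances, and noise scale are placeholder assumptions:

```python
def dual_gate(expert_q, full_weights, abstract_weights,
              tol_sec=0.05, tol_stability=0.05, temperature=0.5,
              n_perturbations=32, noise=0.1, seed=0):
    """Return ("act", action) only if both gates pass; otherwise abstain."""
    rng = np.random.default_rng(seed)

    # Gate 1: epistemic relevance preservation (abstraction does not distort incentives).
    if delta_sec(expert_q, full_weights, abstract_weights, temperature) > tol_sec:
        return "abstain", None

    # Gate 2: plural stability (agreement survives alternative expert weightings).
    pi_ref = soft_best_response(full_weights @ expert_q, temperature)
    instability = 0.0
    for _ in range(n_perturbations):
        w = np.clip(full_weights + rng.normal(0.0, noise, size=full_weights.shape), 0.0, None)
        w /= w.sum()
        pi_w = soft_best_response(w @ expert_q, temperature)
        instability = max(instability, kl_divergence(pi_ref, pi_w))
    if instability > tol_stability:
        return "abstain", None

    return "act", int(np.argmax(pi_ref))
```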

Learnable representations.
Implement Strategic InfoNCE, a contrastive objective whose optimal critic recovers a log density ratio between soft best-response actions and a base distribution. This aligns embeddings with epistemically relevant incentive deformation rather than surface behavior and enables empirical estimation of ∆SEC directly from interaction data.
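A minimal PyTorch sketch of the kind of contrastive objective described, where positives are actions sampled from the soft best response and negatives come from a base distribution; the module names, dimensions, and sampling scheme are assumptions for illustration:

```python
import torch
import torch.nn as nn

class StrategicCritic(nn.Module):
    """Scores (context, action) pairs; at the InfoNCE optimum the score recovers,
    up to a constant, log[ p_soft_best_response(a | context) / p_base(a) ]."""
    def __init__(self, ctx_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ctx_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, ctx, act):
        return self.net(torch.cat([ctx, act], dim=-1)).squeeze(-1)

def strategic_infonce_loss(critic, ctx, pos_act, neg_acts):
    """InfoNCE with one positive (from the soft best response) and K negatives
    (from the base distribution) per context.
    Shapes: ctx [B, ctx_dim], pos_act [B, act_dim], neg_acts [B, K, act_dim]."""
    pos = critic(ctx, pos_act).unsqueeze(1)                               # [B, 1]
    neg = critic(ctx.unsqueeze(1).expand(-1, neg_acts.size(1), -1),
                 neg_acts)                                                # [B, K]
    logits = torch.cat([pos, neg], dim=1)                                 # [B, 1 + K]
    labels = torch.zeros(ctx.size(0), dtype=torch.long)                   # positive sits at index 0
    return nn.functional.cross_entropy(logits, labels)
```

Because the optimal critic approximates a log density ratio between incentive-shaped and base behavior, the learned embeddings track how incentives deform action choice rather than surface-level action frequencies, which is what allows ∆SEC to be estimated from interaction data.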

Empirical validation.
Evaluate the framework in pluralist bandits, cooperative Overcooked coordination with heterogeneous partners, and small LLM-agent communication tasks. In each case, test whether abstention gated by the dual-gate epistemic relevance criterion prevents catastrophic miscoordination while preserving overall performance.

This treats coalitional agency as an epistemic achievement that can degrade over time, not a static property.


How will this funding be used?

Funding will primarily support my living expenses, allowing me to dedicate full-time effort to completing and integrating this research program.

Specifically, it will support:

  • finalizing the theoretical components of epistemic arbitration and epistemic relevance abstraction, including consolidation of existing proofs;

  • implementing and refining Strategic InfoNCE and abstention-gated decision filters;

  • running controlled experiments in multi-agent RL and LLM-agent settings;

  • preparing one to two manuscripts for submission to RLC 2026 and/or NeurIPS 2026, along with relevant alignment workshops.

The project does not require large-scale compute. The existing toy-experiment, RL, and LLM evaluation setups supported by CHAI for the Epistemic Relevance Abstraction project and its extensions might be sufficient.


Who is on your team? What is your track record on similar projects?

I, Sandy Tanwisuth, am the sole investigator.

This project builds directly on my prior and ongoing work on abstraction, abstention, and coordination under uncertainty.

Relevant outputs include:

  • [Accepted Manuscript] Uncertainty-Aware Policy-Preserving Abstractions with Abstention, which introduced margin-based abstention in decision-making and was published at the 2nd ARLET Workshop at NeurIPS 2025.

  • [In Prep, targeting ICML 2026] Preventing Miscoordination in Multi-agentic Systems, a manuscript developing soft Strategic Equivalence Classes, Strategic InfoNCE, finite-sample guarantees, and value-loss certificates.

  • [In Prep, targeting NeurIPS 2026] an arbitration paper that extends these ideas to internal pluralism and coalitional action, with theoretical guarantees and empirical validation across RL and LLM-agent domains.

I have experience carrying conceptually driven projects from framing through formalization, implementation, and dissemination, including first-author publications and theory-heavy manuscripts.


What are the most likely causes and outcomes if this project fails?

Likely causes of failure

  • The diagnostic quantities may be overly conservative, triggering abstention too often in realistic settings.

  • Epistemic relevance criteria may not cleanly separate harmful aggregation from benign compression in some domains.

  • The coalitional framing may explain failures without yielding sufficiently predictive signals in highly non-stationary environments.

Outcomes if it fails

Even in failure, the project would yield:

  • negative and boundary results clarifying when coalitional unification is epistemically unjustified;

  • empirical evidence separating epistemic relevance failures from objective misspecification;

  • formal tools for analyzing internal pluralism that can inform future alignment work.

These results would still contribute durable conceptual and technical infrastructure to coalitional alignment research.


How much money have you raised in the last 12 months, and from where?

I have raised $12,000 from MATS 8.0, including direct collaboration with Richard Ngo.

Currently, I am supporting myself through funding from the Cooperative AI Foundation for a related but distinct project titled Epistemic Relevance Abstraction for Multi-Agent Coordination (formerly known as Strategic Relevance Abstraction for Multi-Agent Coordination).

For full disclosure, I applied for the MATS 8.1 extension but was not selected. Based on the limited information available to me, my inference is that the proposal was reviewed by someone whose evaluative priors favor empirical or experimental contributions. At the time, I did not communicate the conceptual framework as clearly or in the terms they were likely expecting. As a result, their recommendations for how to improve the project diverged from both my mentor’s view and my own understanding of what constitutes strong work in this area. Since MATS 8.0, I have substantially revised the manuscript to clarify and foreground its theoretical contributions, and to reorganize the writing around a clearer argumentative throughline. These revisions reflect a shift toward making the framework’s motivation, structure, and claims legible to readers without requiring prior alignment with its perspective.


Minimum funding vs funding goal

Minimum funding would cover basic living expenses and allow continued progress at a reduced pace.

The full funding goal would support approximately four to six months of focused research time in the Bay Area, enabling completion of the theory, experiments, and manuscripts described above.
