Sandy Tanwisuth

@sandguine

Independent Pluralistic Alignment Researcher

https://sandytanwisuth.web.app/
$0 total balance
$0 charity balance
$0 cash balance

$0 in pending offers

About Me

I’m an independent, pluralistic alignment researcher working on a scale-free theory of coalitions of agents. I previously began a PhD at UC Berkeley in Computational Cognitive Science, where I worked on Computational Theory of Mind (epistemology in disguise, so to speak). I’m now moving toward formal epistemology, decision theory, and Bayesian epistemology. Broadly speaking, I study how agents (biological or otherwise) can coordinate efficiently while respecting each individual’s independence. Lately, I’ve been focused on identifying the simplest knowledge requirements that allow coalitions of agents to know how to act, or how to acquire more relevant information, even when their incentives do not fully align. The goal is to prevent the catastrophic miscoordination failures described in the multi-agent risks literature.

Projects

Alignment as epistemic system governance under compression

Comments

Ambitious AI Alignment Seminar

Sandy Tanwisuth

13 days ago

As someone who has worked with multiple organizations in the AI Safety and AI alignment ecosystems, I think the field's current development suggests there should be a clear separation between two pipelines: an Empirical AI Safety track, which includes some of the most important and valuable work (e.g., BlueDot Impact or MATS), and a separate pipeline for more exploratory work that is valuable in the long term but defies systemic incentives and the status quo, such as PIBBSS, ILLIAD, and this fellowship.

In other words, the empirical track is essential. But its very success under current incentives creates a risk: we may optimize so hard for what can be measured now that we starve the exploratory work that will only prove its value later. We need both. This project strikes me as exactly the kind of alternative pipeline that embodies the latter: its design prioritizes deep understanding, community, and researcher well-being. These choices directly counteract the pressures that push some current work toward short-term, measurable results, sometimes at the expense of the deeper thinking the foundational problem ultimately requires.

I personally feel that the mismatch between these ecosystems did not allow some participants, myself included, to thrive. Had I been in this kind of ecosystem instead, I suspect my productivity and research output would not have been negatively impacted by the measurement pressures. I encourage people with more resources to fund this work.

[Retroactive] Funding for developing new "substrate-flexible risk" threat model

Sandy Tanwisuth

20 days ago

The MoSSAIC framework is good foundational work that pushes mechanistic interpretability in a more scientifically grounded and principled direction, and I recommend funding it. The paper provides a meta-level intervention: it identifies a core assumption underlying most contemporary safety work (the "causal-mechanistic paradigm") and systematically demonstrates why this paradigm will likely fail as AI systems become more capable. It does this not through abstract speculation alone, but by connecting concrete empirical results (Bailey et al. 2024 on obfuscated activations, McGrath et al. 2023 on self-repair) to a sequence of plausible threat models. I have high respect for Matt Farr for taking the initiative to work on something intrinsically valuable to the alignment field yet misaligned with the incentives of the field's main players. This is diagnostic and constructive work that questions the dominant paradigm, something the field desperately needs yet rarely discusses. The field systematically underfunds this kind of work because it rarely produces legible outputs; the MoSSAIC framework shows that such work can be legible. I encourage people with more resources to retroactively fund this work.