Manifund

Hello!

I am an AI alignment researcher working on mechanistic interpretability, and sometimes on "interp-informed control". I was a MARS 3.0 fellow, where I worked with Geodesic Research on architectural modifications for transformers that incentivise externalisation of reasoning (https://arxiv.org/abs/2603.21376). I am currently a MATS 9 extension fellow at Simplex, where I work on understanding the effect of post-training on models' internal mechanisms.

Before getting into alignment research, I completed my PhD at the University of Amsterdam, where I worked on geometric methods for interpretability (https://proceedings.mlr.press/v267/gardinazzi25a.html, https://arxiv.org/pdf/2501.10573). Back when I was still typing out my code, I was an astrophysicist studying what we can learn about the universe from the shape of galaxy distributions.
