Manifund

I’m a software engineer turned AI safety researcher based in Abuja. I study why frontier models lose their safety guardrails in low‑resource languages and how to recover those guardrails without retraining. My ICML GlobalSouthML paper introduces Latent Space Refusal Anchoring (LSR‑Anchoring), a training‑free steering method that recovers refusals in Yoruba, Igbo, Igala, Hausa, and Swahili using only English activations, with an MMLU drop of less than 0.35 percentage points.

I work mostly with Llama‑3, Mistral, Qwen, and Gemma. To understand and repair multilingual safety failures, I use activation steering, EleutherAI SAEs, ACDC‑style circuit discovery, and adversarial evaluations (e.g., GCG). I’ve identified language‑agnostic refusal features and shown that steering a single SAE feature can eliminate benign collapse on Llama‑3‑8B while matching safety recovery at a much lower KL divergence.

Beyond my own experiments, I lead technical work on the Nigerian AI Coalition’s safety roadmap, built the LSR Workbench for cross‑lingual red‑teaming and visualization, and contribute multilingual safety benchmarks to Inspect. I’m especially interested in projects at the intersection of mechanistic interpretability, the realities of deploying AI in the Global South, and cross‑lingual alignment.

Projects