Godwin Abuh Faruna

@faruna

Multilingual AI safety researcher working on mechanistic refusal steering for low‑resource African languages and Arabic, using activation steering, SAEs, and circuit analysis.

https://www.faruna.space/
$0 total balance
$0 charity balance
$0 cash balance

$0 in pending offers

About Me

I’m a software engineer turned AI safety researcher based in Abuja, studying why frontier models lose their safety guardrails in low‑resource languages and how to recover them without retraining. My ICML GlobalSouthML paper introduces Latent Space Refusal Anchoring (LSR‑Anchoring), a training‑free steering method that recovers refusals in Yoruba, Igbo, Igala, Hausa, and Swahili using only English activations, with an MMLU drop of less than 0.35 percentage points.

I work mostly with Llama‑3, Mistral, Qwen, and Gemma, using activation steering, EleutherAI sparse autoencoders (SAEs), ACDC‑style circuit discovery, and adversarial evaluations (e.g. GCG) to understand and repair multilingual safety failures. I’ve identified language‑agnostic refusal features and shown that a single SAE feature can eliminate benign collapse on Llama‑3‑8B while matching safety recovery at much lower KL divergence.

Outside of experiments, I lead technical work for the Nigerian AI Coalition’s safety roadmap, built the LSR Workbench for cross‑lingual red‑teaming and visualization, and contribute multilingual safety benchmarks to Inspect. I’m especially interested in projects that sit at the intersection of mechanistic interpretability, Global South deployment realities, and cross‑lingual alignment.