I am requesting funding for 6 months of independent mechanistic interpretability research investigating the temporal causal flow of deception in LLMs. Specifically: at what point in the autoregressive forward pass does a deceptive output become committed, and can that commitment be detected and intercepted before the output token is produced? The research will be conducted in Kigali, Rwanda.
Month 1: completing the ARENA curriculum. Months 2–6: original research using token-level activation patching and causal tracing to map the temporal dynamics of deception across the generation sequence, with the goal of building a pre-output deception interceptor that operates inside the forward pass rather than post-hoc on outputs.
Existing mech interp deception detectors operate at the layer level, identifying which layers encode deception at a fixed evaluation point. To my knowledge, no work has mapped the temporal dynamics across token positions during autoregressive generation: at which generated token does the commitment to deceive crystallize? Can that moment be identified causally, and can an intervention be applied before the deceptive output is produced? This is a fundamentally different question from layer-level localization.
This has direct safety implications. A pre-output interceptor would be fundamentally more robust than post-hoc monitoring: it could halt or steer generation before the deception occurs, rather than flagging it afterward.
I will achieve this by:
Completing ARENA in month 1 (TransformerLens, circuits, activation patching, causal tracing)
Constructing controlled deception datasets with clear ground-truth on when the model "knows" it will deceive
Using token-position-level activation patching to trace when deception-related representations emerge across the generation sequence
Testing whether interventions at the identified commitment point can suppress deceptive output before it is produced
Producing a paper or Alignment Forum post with the temporal circuit maps and interceptor results
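As a rough illustration of the token-position patching sweep described in the steps above, here is a toy sketch. It is not the planned TransformerLens pipeline: it uses a hypothetical one-unit recurrent model in plain Python, and every name in it (`forward`, `patching_sweep`, the weights, the inputs) is invented for illustration. The idea it shows is the standard one: run a clean and a corrupted input, copy the clean activation into the corrupted run at one position at a time, and measure how much of the clean output is recovered.

```python
import math


def forward(xs, patch_pos=None, patch_value=None,
            w_in=1.5, w_rec=0.8, w_out=2.0):
    """Toy recurrent model (hypothetical stand-in for a transformer).

    Returns the final logit and the hidden state at every position.
    If patch_pos is given, the hidden state at that position is
    overwritten with patch_value before the pass continues.
    """
    h = 0.0
    states = []
    for t, x in enumerate(xs):
        h = math.tanh(w_in * x + w_rec * h)
        if t == patch_pos:
            h = patch_value  # activation patch at position t
        states.append(h)
    return w_out * h, states


def patching_sweep(clean_xs, corrupt_xs):
    """Patch clean activations into the corrupted run, one position at a time.

    Returns the normalized recovery per position:
    0.0 means no effect, 1.0 means the clean output is fully restored.
    """
    clean_logit, clean_states = forward(clean_xs)
    corrupt_logit, _ = forward(corrupt_xs)
    denom = clean_logit - corrupt_logit
    recovery = []
    for t in range(len(clean_xs)):
        patched_logit, _ = forward(
            corrupt_xs, patch_pos=t, patch_value=clean_states[t]
        )
        recovery.append((patched_logit - corrupt_logit) / denom)
    return recovery


# The two inputs differ only at position 2 (assumed toy data).
clean = [0.2, -0.1, 1.0, 0.3, -0.4]
corrupt = [0.2, -0.1, -1.0, 0.3, -0.4]

recovery = patching_sweep(clean, corrupt)

# Earliest position at which patching restores most of the clean output.
commit_pos = next(t for t, r in enumerate(recovery) if r > 0.5)
print(recovery, commit_pos)  # commit_pos is 2 here
```

In this toy, patching before position 2 changes nothing and patching at or after it restores the clean output, so the sweep localizes where the outcome becomes fixed. In a real model the recovery curve would be graded rather than a step, and the sweep would run over both layers and token positions.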
The secondary goal is to unlock access to mech interp fellowships (MATS, the Anthropic residency, Neel Nanda's team), all of which require prior independent research. This grant is the only realistic path.
Full budget breakdown: [Sheet]
Summary:
Monthly living × 6 months (rent, food, internet, transport, stipend): $7,200
One-time setup (flight Port Sudan→Kigali, laptop, visa, household basics): $2,180
Cloud compute (net after $500 Modal GPU credit already secured): $1,500
Total: $10,880
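For anyone who wants to check the arithmetic in the summary above, a minimal sketch follows. The variable names are my own, and the per-month figure is derived from the summary line rather than taken from the full sheet.

```python
# Figures copied from the budget summary (USD).
monthly_living_total = 7200   # living costs over the full 6 months
one_time_setup = 2180         # flight, laptop, visa, household basics
cloud_compute_net = 1500      # net of the $500 Modal GPU credit
months = 6

total = monthly_living_total + one_time_setup + cloud_compute_net
living_per_month = monthly_living_total / months

print(total)             # 10880
print(living_per_month)  # 1200.0
```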
The laptop is a hard requirement: all mech interp work requires running TransformerLens and PyTorch locally, and I currently have no working device. I access the internet at cafes in areas partially controlled by RSF militia in Sudan.
This is a solo project. My relevant background:
Master's in AI for Science, AIMS South Africa / UCT - fully funded by Google DeepMind (2% acceptance rate). Thesis: Context-Aware Neural Network for ARC-AGI benchmark, supervised by Prof. Ulrich Paquet (Google DeepMind).
Research Engineer, Sultan Qaboos University (Oman) - built NLP analytics platform with multilingual embeddings and semantic similarity metrics for research funding evaluation.
1st place, Build with AI Hackathon, Kigali 2025
1st place, Qeen.AI Data Science Challenge, Qatar 2025
3rd place, InstaDeep Hackathon, Deep Learning Indaba, Kigali 2025
Undergraduate thesis: adversarial attacks on neural networks (FGSM & one-pixel attacks on MNIST/CIFAR-10)
GitHub: https://github.com/AhMedDa1
The most likely failure mode is that the research does not reach publication quality within the grant period. In that case, I would still have produced documented experiments, negative results, and a public write-up on the Alignment Forum, all of which have value for the field and are accepted as evidence of research experience by fellowship programs.
A secondary risk is that living costs in Kigali exceed estimates. The budget is lean, and my personal stipend ($550/month) leaves little buffer, but the main cost categories (rent, food, internet) have well-established market prices.
The research failing to happen at all due to lack of funding is the worst outcome, and the most likely one if this grant is not awarded. There is no alternative funding path available to me.
$0. I have had no stable income or funding since completing my master's degree in September 2025. I am currently displaced in Sudan due to the ongoing armed conflict between the Sudanese Armed Forces and the RSF.