Manifund

I came to AI safety through intrinsic curiosity rather than a formal CS or ML pipeline. Over the past six months I taught myself machine learning, reinforcement learning, and code analysis, then built ast-guard, a zero-dependency AST analyzer that detects structural reward hacking in LLM-generated code. I integrated it into a GRPO training loop and ran three experiments on A100 GPUs. Those runs produced what I believe is the first empirical observation of gradual strategy migration in a reward-hacking model under deterministic selection pressure.

I have no prior publications and no institutional affiliation. This project is entirely solo and self-funded. I'm looking for mentors, collaborators, and a co-author to help turn the existing empirical results into a formal paper. I'm especially interested in connecting with people working on reward hacking detection, RL training interventions, or CoT monitoring.

Projects