@Hiki
I am a Cognitive Neuroscientist transitioning into AI safety.
https://www.linkedin.com/in/hikaru-tsujimura/
I am a Cognitive Neuroscientist transitioning into AI safety, with one year of hands-on research experience in interpretability and model behavior analysis. My work focuses on understanding how internal representations in LLMs shape their values, decision-making, and actions, much as internal representations shape human cognition.
Recent work includes:
Mechanistic interpretability (SAEs, TransformerLens) to visualize latent concepts and reasoning structures (a minimal sketch of this workflow follows the list)
Analysis of overconfidence and unfaithful Chain-of-Thought reasoning in LLMs
LLM persona elicitation and characterization of latent behaviors (e.g., Sydney-style behavior)
AI security and red-teaming challenges, including testing frontier proprietary models
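As a rough illustration of the first item above: the sketch below loads GPT-2 small with TransformerLens, caches residual-stream activations, and passes them through a toy, randomly initialised SAE encoder. The prompt, layer choice, and SAE weights here are placeholders for illustration, not part of my actual experiments (real SAEs are trained separately).

```python
import torch
from transformer_lens import HookedTransformer

# Load a small open model for illustration (GPT-2 small)
model = HookedTransformer.from_pretrained("gpt2")

prompt = "The Eiffel Tower is located in the city of"
tokens = model.to_tokens(prompt)

# Run the model and cache all intermediate activations
logits, cache = model.run_with_cache(tokens)

# Residual-stream activations after block 6 (layer choice is arbitrary here)
resid = cache["resid_post", 6]  # shape: [batch, seq_len, d_model]

# Toy, randomly initialised sparse-autoencoder encoder.
# A trained SAE would be used in practice; this only shows the encode step.
d_model = resid.shape[-1]
d_sae = 4 * d_model
W_enc = torch.randn(d_model, d_sae, device=resid.device) / d_model**0.5
b_enc = torch.zeros(d_sae, device=resid.device)

features = torch.relu(resid @ W_enc + b_enc)  # non-negative feature activations
top_vals, top_idx = features[0, -1].topk(5)   # strongest features at the last token
print(top_idx.tolist())
```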
My long-term goal is to develop AI models that are interpretable, transparent, reliable, and scalable.
See further work on my personal website (https://hiki-t.github.io/).