Redarc Labs
Lab Motivation:
We are an AI safety lab focused on interpretability under adversarial conditions. "Red" in our name stands for misalignment, jailbreaking, and other dangerous behavior. "Arc" represents that we don't want to judge models through their outputs alone but to look at the whole journey, the whole arc, through interpretability.
We are more interested in mechanisms than in output behaviors alone, and we aim to build AI control and monitoring protocols on top of those mechanisms to prevent or control the undesirable behaviors mentioned above.
India is a country where AI adoption is growing rapidly and may surpass global usage, which furthers the need for initiatives such as ours to be present and work here. There are no similar major organizations in India; most are based in the UK and the US.
Our Work:
Our current work spans biosecurity, AI jailbreaks, and an emotion weight monitoring protocol. We are also working on discovering new attack surfaces.
Some things that we have published are:
Loss Landscape Response to Adversarial Perturbation Is Architecture-Dependent
Conference Paper, Adversarial Robustness, TAIS
Toxin Feature Hierarchy in ESM-2
Workshop Paper, Protein LM, ICML GenBio
Fourier Gradient Regularisation for Adversarial Robustness
Workshop Poster, Adversarial Robustness, NeurIPS Reliable ML from Unreliable Data
More research we are exploring:
Thinking Model Emotions: Pre-Commitment and Functional Frustration in Extended Thinking
The Geometry of Control: Spectral Attractors as Low-Dimensional Projections in Large Language Models
Community:
We are also building an AI safety community around us. This includes giving talks at colleges around Delhi and maintaining an active cohort where we hold regular lectures and reading sessions on the fundamentals of AI and AI safety.
https://www.linkedin.com/company/redarc-labs/
Our goals:
Fellowships have played a meaningful role in our journey. Most of the opportunities we’ve seen are based in London or Berkeley, and they're often difficult for students and early-career researchers from India to access.
One of our goals is to help bridge that gap by creating opportunities and mentorship for people who want to contribute to AI safety from here.
Our goal for the next 6 months is to discover more attack surfaces and adversarial settings and to start on a tool for multi-agent adversarial settings. We also plan to publish 5-6 novel research artifacts to be developed and explored further.
This might sound ambitious, but we want to build from India and go toe to toe with major AI safety orgs like Redwood, Goodfire, Grayswan, and METR.
Our Team:
We are two dedicated researchers:
Shivam Dubey
Apart Research fellow, under Jason Hoelscher-Obermaier.
MARS V Research Fellow, Cambridge AI Safety Hub.
Lead on FASD project (77% bias reduction), cited by MIT Technology Review.
GitHub: github.com/punctualprocrastinator · LinkedIn: linkedin.com/in/syntaxsavant · shivam@redarclabs.com
Manan Wadhwa
MARS V Research Fellow, Cambridge AI Safety Hub.
Google Summer of Code 2026, HumanAI organisation.
Research Fellow, AISI @ Georgia Tech.
GitHub: github.com/Manan-Wadhwa · LinkedIn: linkedin.com/in/manan-wadhwa · manan@redarclabs.com
We are currently doing internships alongside our research to build credibility and to self-fund parts of the research, and we are conducting sessions and workshops.
We also have a small cohort of 6 fellows learning with us.
Current status:
We continue to work on the ideas above and plan to iterate on them regularly.
We have been promised a grant of $800 on Manifund:
https://manifund.org/projects/emotion-as-attack-surface-and-monitoring-signal-in-thinking-models
Another one of our projects on Manifund:
We work remotely, but are based in Delhi.