Grant to establish an AI safety lab and fellowship from India

Redarc Labs

Lab Motivation:

We are an AI safety lab focused on interpretability under adversarial conditions. The "Red" in our name stands for misalignment, jailbreaking, and other dangerous behavior. The "Arc" represents that we don’t want to judge models by their outputs alone but to look at the whole journey, the whole arc, through interpretability.

We are more interested in mechanisms than in output behaviors alone, and we aim to build AI control and monitoring protocols on top of those mechanisms to prevent or control the undesirable behaviors mentioned above.

India is a country where AI adoption is growing rapidly and may surpass global usage, which furthers the need for initiatives such as ours to be present and work here. There are no similar major organizations in India; most are based in the UK and the US.

Our Work:

Our current work spans biosecurity, AI jailbreaks, and an emotion weight monitoring protocol. We are also working on discovering new attack surfaces.

Our published work includes:

Loss Landscape Response to Adversarial Perturbation Is Architecture-Dependent

Conference Paper, Adversarial Robustness, TAIS

Toxin Feature Hierarchy in ESM-2

Workshop Paper, Protein LM, ICML GenBio

Fourier Gradient Regularisation for Adversarial Robustness

Workshop Poster, Adversarial Robustness, NeurIPS Reliable ML from Unreliable Data

More research we are exploring:

Community:

We are also trying to build a community for AI safety around us. This includes giving talks at colleges around Delhi and maintaining an active cohort where we hold regular lectures and reading sessions on the fundamentals of AI and AI safety.

https://www.linkedin.com/company/redarc-labs/

Our goals:

Fellowships have played a meaningful role in our journey. Most of the opportunities we’ve seen are based in London or Berkeley, and they're often difficult for students and early-career researchers from India to access.

One of our goals is to help bridge that gap by creating opportunities and mentorship for people who want to contribute to AI safety from here.

Our goal for the next 6 months is to discover more attack surfaces and adversarial settings and to start on a tool for multi-agent adversarial settings. We also plan to publish 5-6 novel research artifacts to be developed and explored further.

It might sound ambitious, but we want to build a lab from India that goes toe to toe with major AI safety orgs like Redwood, Goodfire, Gray Swan, and METR.

Our Team:

We are two dedicated researchers:

Shivam Dubey

Apart Research fellow, under Jason Hoelscher-Obermaier.
MARS V Research Fellow, Cambridge AI Safety Hub.
Lead on FASD project (77% bias reduction), cited by MIT Technology Review.
GitHub: github.com/punctualprocrastinator · LinkedIn: linkedin.com/in/syntaxsavant · shivam@redarclabs.com

Manan Wadhwa

MARS V Research Fellow, Cambridge AI Safety Hub.
Google Summer of Code 2026, HumanAI organisation.
Research Fellow, AISI @ Georgia Tech.
GitHub: github.com/Manan-Wadhwa · LinkedIn: linkedin.com/in/manan-wadhwa · manan@redarclabs.com

We are currently doing internships alongside this work, both to build credibility and to self-fund parts of the research, and we are also conducting sessions and workshops.

We also have a small cohort of 6 fellows learning with us.

Current status:

We continue to work on the above ideas and plan to iterate on them regularly.

We have been promised a grant of $800 on Manifund:

https://manifund.org/projects/emotion-as-attack-surface-and-monitoring-signal-in-thinking-models

https://manifund.org/projects/finding-and-defending-the-attention-circuits-that-make-llms-jailbreakable

Another one of our projects on Manifund:

https://manifund.org/projects/attack-defense-and-mechanistic-taxonomy-of-protein-language-model-biosecurity

We work remotely, but are based in Delhi.
