🍉
Cadenza Labs

@cadenza_labs

We are a new AI safety organization focusing on Conceptual Interpretability

https://cadenzalabs.org/
$0 total balance
$0 charity balance
$0 cash balance

$0 in pending offers

About Me

The goal of our group is to do research that contributes to solving AI alignment. Broadly, we aim to work on whichever technical alignment projects have the highest expected value. Our current best ideas for research directions are in interpretability. More about our research agenda can be found here.

Projects

Comments

🍉

Cadenza Labs

5 months ago

Final report

Description of subprojects and results, including major changes from the original proposal

  1. We mostly worked on a paper studying ways to improve unsupervised probing methods via clustering. The paper was accepted to the MechInterp workshop at ICML. We have also submitted it to a top conference, where it is currently under review.

  2. We served as a SPAR mentor in the spring 2024 iteration, working on a project that used probing methods to elicit the value of the state inside policy and value networks in reinforcement learning. Three SPAR students also contributed to the paper mentioned in (1).

  3. In addition, our lead researcher was involved in multiple projects that were also accepted to the MechInterp workshop, here and here.

Spending breakdown

Since we received only a relatively small part of the requested funding, we were only able to cover the costs of a stay at FAR Labs in Berkeley for two people (flights, accommodation, food, etc.). FAR Labs invited us to their team-in-residency program, where we worked mostly on the projects above. Some funds also went toward compute and the ICML conference.

Transactions

For | Date | Type | Amount
Manifund Bank | 5 months ago | withdraw | 1
Manifund Bank | 5 months ago | withdraw | 7809
Cadenza Labs: AI Safety research group working on own interpretability agenda | 7 months ago | project donation | +100
Cadenza Labs: AI Safety research group working on own interpretability agenda | 12 months ago | project donation | +5000
Cadenza Labs: AI Safety research group working on own interpretability agenda | about 1 year ago | project donation | +100
Cadenza Labs: AI Safety research group working on own interpretability agenda | about 1 year ago | project donation | +10
Cadenza Labs: AI Safety research group working on own interpretability agenda | about 1 year ago | project donation | +100
Cadenza Labs: AI Safety research group working on own interpretability agenda | about 1 year ago | project donation | +790
Cadenza Labs: AI Safety research group working on own interpretability agenda | about 1 year ago | project donation | +1000
Cadenza Labs: AI Safety research group working on own interpretability agenda | about 1 year ago | project donation | +210
Cadenza Labs: AI Safety research group working on own interpretability agenda | about 1 year ago | project donation | +500