
Activation vector steering with BCI

Active grant
$30,260 raised
$244,000 funding goal

Project summary

Recent work (https://tinyurl.com/avgpt2xl) has shown that language models can be “steered” toward text completions that resemble humans in differing mental states, simply by adding vectors to the model’s neural activations. Other recent work (e.g. https://tinyurl.com/latentlin) has shown that the latent representations of different models can be bridged by a simple linear mapping. Our hypothesis in this experiment is that (some aspects of) human brain states can likewise be bridged to the latent representations of language models by simple mappings. This could contribute to prosaic AI alignment: (1) generative models could be steered to exhibit the specific brain states of specific people, so they better represent those people's attitudes and opinions; (2) reward models could be trained to reproduce humanlike brain states during evaluation, making them generalize better out-of-distribution; (3) scientific understanding of analogies between LLM behavior patterns and human behavior patterns could be improved.
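As a rough illustration of the hypothesis, here is a minimal sketch that fits a linear (ridge) map from brain features to language-model activations and then uses it for activation addition. Everything in it is an assumption made for illustration. The data are synthetic random arrays standing in for paired fMRI features and LM activations recorded on the same stimuli. The dimensions, the functions `fit_ridge`, `steering_vector`, and `steer`, and the coefficient `coeff=4.0` are hypothetical names and values, not the project's pipeline. Held-out R² stands in for a validation metric.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: brain features per stimulus (e.g. ROI summaries) and LM hidden width.
N_SAMPLES, BRAIN_DIM, MODEL_DIM = 500, 64, 128

# Synthetic stand-in for paired data: brain features and LM activations on the same stimuli.
# Assumption: activations are generated by a hidden linear map plus a little noise.
hidden_map = rng.normal(size=(BRAIN_DIM, MODEL_DIM))
brain = rng.normal(size=(N_SAMPLES, BRAIN_DIM))
acts = brain @ hidden_map + 0.1 * rng.normal(size=(N_SAMPLES, MODEL_DIM))


def fit_ridge(x, y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha*I)^-1 X^T Y."""
    return np.linalg.solve(x.T @ x + alpha * np.eye(x.shape[1]), x.T @ y)


def r2(y_true, y_pred):
    """Pooled R^2 across all output dimensions (a simple validation metric)."""
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum()
    return 1.0 - ss_res / ss_tot


def steering_vector(brain_a, brain_b, w):
    """Map the mean difference between two sets of brain states into activation space."""
    return (brain_a.mean(axis=0) - brain_b.mean(axis=0)) @ w


def steer(hidden, vec, coeff=4.0):
    """Activation addition: add a scaled vector to the hidden state at every position."""
    return hidden + coeff * vec


# Fit on the first 400 stimuli and evaluate on the remaining 100.
W = fit_ridge(brain[:400], acts[:400])
heldout_r2 = r2(acts[400:], brain[400:] @ W)

# Steer a (sequence_length, MODEL_DIM) block of hidden states toward hypothetical "state A".
vec = steering_vector(brain[:50], brain[50:100], W)
hidden = rng.normal(size=(10, MODEL_DIM))
steered = steer(hidden, vec)
print(f"held-out R^2 = {heldout_r2:.3f}, steered shape = {steered.shape}")
```

Ridge regression is chosen here because real fMRI features are noisy and high-dimensional relative to the number of stimuli. In an actual model, `steer` would be applied to a single layer's residual stream, for example through a forward hook. Which layer to use, what coefficient to apply, and whether a linear map suffices at all are the questions the experiment would need to answer.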

What are this project's goals and how will they be achieved?

Some of the specific steps:

  • Design the fMRI data-collection protocol

  • Implement the data-collection protocol (in particular, the display and keyboard elements)

  • Recruit human subjects

  • Connect with a suitable fMRI center and get the experiment approved (IRB process)

  • Administer the human-subject data-collection

  • Design the ML experiments (fMRI feature extraction pipeline, particular architecture modifications, loss function, validation metrics)

  • Implement the ML experiments (the dataset may be large enough to require cloud resources)

  • Write the technical report/paper

Impact:

  • Advancing the science of direct and meaningful connections between human minds and prosaic AI

  • This is one potential pathway toward more generalizable AI value alignment: ultimately modeling the process by which humans make value judgments more causally and mechanistically, as opposed to modeling only its statistical behavioral features on a finite training distribution

How will this funding be used?

Salary

  • $108,000: salaries for 6 months of 1 researcher and 3 months of 1 ML engineer ($10k/month × 6 months for the researcher, $16k/month × 3 months for the ML engineer)

    • This will include one researcher and one ML engineer

  • $900: fMRI ops contractor (30 h × $30/h)

  • $900: participant volunteer compensation (25 participants, 1 h each, $30/h)

  • $50,000: taxes on the salaries (assuming ~45% total overhead regardless of specific tax optimizations)

Equipment

  • $4,800: compute costs (A100 GPU × 6 months)

  • $16,500: 25 h of fMRI time at $660/h. We think we’d need 20–25 h at the lower bound, and the more hours we can get, the better.

  • $50: rubber-based “Virtually Indestructible Keyboard” for MRI compatibility (only available used)

  • $2,000: MRI-compatible screens for use inside the machine, and/or travel to an fMRI facility that already has this installation

  • $3,000: research laptop for use on-site during recordings

One-off Misc

  • $15,600: office costs (FAR Labs office at $1,400/person/month, 2 people, 6 months)

  • $1,776: proportional visa costs for 1 researcher for this period

20% buffer

Total: $244k
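The budget arithmetic can be checked with a short tally. This sketch re-adds the listed line items and applies the 20% buffer; the dictionary keys are illustrative names, not part of the proposal. The result reproduces the ~$244k total. Two items give different totals when computed from their stated rates: participant compensation (25 × 1 h × $30/h = $750, versus $900 listed) and office costs (2 × 6 × $1,400 = $16,800, versus $15,600 listed). The tally uses the listed amounts.

```python
# Line items as listed in the budget above (USD); keys are illustrative names.
LINE_ITEMS = {
    "salaries": 16_000 * 3 + 10_000 * 6,  # ML engineer 3 months + researcher 6 months
    "fmri_ops_contractor": 900,
    "participant_compensation": 900,
    "salary_tax_overhead": 50_000,
    "compute": 4_800,
    "fmri_time": 25 * 660,  # 25 h at $660/h
    "mri_keyboard": 50,
    "mri_screens_or_travel": 2_000,
    "research_laptop": 3_000,
    "office_costs": 15_600,
    "visa_costs": 1_776,
}
BUFFER = 0.20  # the 20% contingency buffer

subtotal = sum(LINE_ITEMS.values())
total = subtotal * (1 + BUFFER)
print(f"subtotal ${subtotal:,}, total with buffer ${total:,.0f}")
```

The ~45% overhead on $108,000 of salaries is $48,600, which appears rounded to $50,000 in the listing.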

Who is on the team and what's their track record on similar projects?

David “davidad” Dalrymple:


Lisa Thiergart: 

What are the most likely causes and outcomes if this project fails? (premortem)

The most obvious cause is that AIs don't make value judgements the way humans do, in which case this is a waste of time. It still seems well worth trying, though.

What other funding is this person or project getting?

Probably some from Foresight, since they are applying and we are in discussions with them. They don’t want to spend much time actively seeking grants, since it is very time-consuming.

donated $110

Adrian Regenfuss

7 months ago

I've referenced this proposal a bunch of times in conversation, and find it pretty cool.

donated $110

Adrian Regenfuss

7 months ago

@Adrian-Regenfuss I would be even more enthusiastic if there were plans to also train LLMs on human brain signals, since the activations look to me too inflexible to bridge potentially extremely alien cognition to human cognition. But that's a much higher ask.


Sophia Pung

10 months ago

Hey Lisa and David!

I’m reaching out regarding a project that you might be interested in.

Previously, I applied on Manifold for a grant to program a phone to track GPS on a solar-powered tuk-tuk with Solar4Africa. Over the past three months, I’ve been working with a team to develop a novel electroencephalogram (EEG).

Our team is called Monolith BCI (on Twitter). We are creating a novel PCB design and ML model to process, with an EEG, the electrical signals generated by large clumps of neurons in the brain. Henry is working on the ML model (Bramble) as well as many other parts of the project, and Cheru and JC designed the PCB with no prior PCB experience. I’m creating a research paper and a dataset (FLUX, the Framework for Learning and Understanding Cortex Activity) to help fine-tune our LLM.

We’ll be in SF for our third sprint from February 24 to March 4 and would love to chat. We have a working v2.0 of our 8-channel PCB (Trillium). Our goal by the end of the third sprint is to have our own PCB working with our ML model to play Tetris and Pong without using EMG signals. We’ve already successfully used four modalities (jaw clench, blinking, focus, and non-focus), but all of these rely on EMG signals, and our product will only be viable once we’re able to capture thoughts alone. We believe we’re really close to getting there.

Let us know if youā€™d like to chat!

-Monolith MMI (mind-machine interfaces)

donated $15,000

Marcus Abramovitch

11 months ago

I reached out to Lisa for a progress update on this:

-There is an important minimum funding threshold for this project to be viable, and it hasn't been reached yet, so they haven't started. The money is sitting in Lisa's account.

-She will talk with Davidad soon and decide whether to push for more funding, pivot to something lower-cost, or return the funds (decision expected by the end of March).

I like the honesty here and hope we can get Lisa funded to do a large project (whether this one or another in an area she is interested in). I'm very "bullish" on Lisa still. I think she's someone who can and will just make something happen that should happen, without needing too much permission. I also think she has a rather unique and needed blend of technical understanding and the ability to handle all the little organizing tasks needed to get something off the ground.

donated $15,000

Marcus Abramovitch

over 1 year ago

Main points in favor of this grant

When I talked with Lisa, she was clearly able to articulate why the project is a good idea. People often struggle to do this.

Lisa is smart and talented and wants to expand her impact by leading projects. This seems well worth supporting.

Davidad is fairly well-known to be very insightful and proposed the project before seeing the original results.

Reviewers from Nonlinear Network gave great feedback on funding Lisa for two projects she proposed. She was most excited about this one and, with rare exceptions, when a person has two potential projects, they should do the one they are most excited about.

I think we need more tools in our arsenal for attacking alignment. Given some promising early results, building out activation vector steering seems very worthwhile.

Donor's main reservations

I don’t feel I’m a good judge of whether or not this is worth doing. I think I judge talent well, but I don't have nearly enough alignment background or neurotech background to judge this. This is far more of a bet on the people than on the project. I also don't think many people would be qualified to judge the project.

It's expensive.

I somewhat worry that Lisa won't be full-time on the project and/or that this might distract her from her other work. She did say she had broad support from her current workplace to pursue this in tandem.

Process for deciding amount

The project is in discussion with Foresight to see whether a scaled-down, less expensive version is possible. My $15k should help get the ball rolling, with the expectation that a few more donors will bring this at least to the scaled-down stage, and preferably to the full proposed project.

Conflicts of interest

None


donated $15,000

Marcus Abramovitch

over 1 year ago

I interviewed Lisa for this grant.

Reasons I am excited about Lisa:
-She is quite articulate and has good people/social skills and is able to simply explain concepts.
-She is already doing some management and wants to expand here. This is worth supporting, since there seems to me to be a lack of management experience in the AI safety research space.
-She's quite smart and value-aligned.