Activation vector steering with BCI

Active Grant
$30,260 raised
$244,000 funding goal

Project summary

Recent work (https://tinyurl.com/avgpt2xl) has shown that language models can be "steered" (towards text completions which resemble humans in differing mental states) by simply adding vectors to the model's neural activations. Other recent work (e.g. https://tinyurl.com/latentlin) has shown that latent representations of different models can be bridged by a simple linear mapping. In this experiment our hypothesis is that (some aspects of) human brain states can be bridged to the latent representations of language models by simple mappings. This could contribute to prosaic AI alignment: (1) generative models could be steered to exhibit the specific brain states of specific people, to better represent their attitudes and opinions; (2) reward models could be trained to reproduce humanlike brain states during evaluation, making them more generalizable out-of-distribution; (3) scientific understanding of analogies between LLM behavior patterns and human behavior patterns could be improved.
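To make the hypothesis concrete, here is a minimal sketch of the two ingredients on synthetic data: fitting a linear map from brain-derived features to language-model activations, then adding the mapped vector to a model's hidden states. It is an illustration under stated assumptions, not the project's actual method. The function names (`fit_linear_bridge`, `steer`), the use of ridge regression, the dimensions, and the random data are all hypothetical; real fMRI features and a real model's activations would replace them.

```python
import numpy as np

def fit_linear_bridge(brain_feats, lm_acts, ridge=1e-3):
    """Ridge-regression map W such that brain_feats @ W approximates lm_acts."""
    d_brain = brain_feats.shape[1]
    gram = brain_feats.T @ brain_feats + ridge * np.eye(d_brain)
    return np.linalg.solve(gram, brain_feats.T @ lm_acts)

def steer(hidden, steering_vec, alpha=1.0):
    """Add a scaled steering vector to the activation at every token position."""
    return hidden + alpha * steering_vec

# Synthetic stand-ins (assumption): paired brain features and LM activations
# that are related by an unknown linear map plus a little noise.
rng = np.random.default_rng(0)
n_samples, d_brain, d_model = 200, 16, 32
true_map = rng.normal(size=(d_brain, d_model))
brain_feats = rng.normal(size=(n_samples, d_brain))
lm_acts = brain_feats @ true_map + 0.01 * rng.normal(size=(n_samples, d_model))

# Fit the bridge on the paired data.
W = fit_linear_bridge(brain_feats, lm_acts)

# Map a new brain state into activation space and use it as a steering vector.
new_brain_state = rng.normal(size=d_brain)
steering_vec = new_brain_state @ W            # shape (d_model,)
hidden = rng.normal(size=(5, d_model))        # activations at 5 token positions
steered = steer(hidden, steering_vec, alpha=4.0)
```

In a real experiment the steered activations would be written back into a chosen layer of the model during generation; the layer and the scale `alpha` are free parameters that this sketch does not try to choose.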

What are this project's goals and how will they be achieved?

Some of the specific steps:

  • Design the fMRI data-collection protocol

  • Implement the data-collection protocol (in particular, the display and keyboard elements)

  • Recruit human subjects

  • Connect with a suitable fMRI center and get the experiment approved (IRB process)

  • Administer the human-subject data-collection

  • Design the ML experiments (fMRI feature extraction pipeline, particular architecture modifications, loss function, validation metrics)

  • Implement the ML experiments (the dataset may be large enough to require cloud resources)

  • Write the technical report/paper

Impact:

  • Advancing the science of direct and meaningful connections between human minds and prosaic AI

  • This is one potential pathway toward more generalizable AI value alignment, by ultimately modeling the process by which humans make value judgments more causally and mechanistically, as opposed to merely its behavioral statistical features on a finite training distribution

How will this funding be used?

Salary

  • 108000$ salary: 6 months for 1 researcher (10k/month) + 3 months for 1 ML engineer (16k/month)

    • This will include one researcher + one ML engineer

  • 900$  fMRI ops contractor (30h * 30$/h)

  • 900$ Participant volunteer compensation (25 participants, 1h each, 30$/h)

  • 50000$ tax for the salaries (assumed ~45% total overhead regardless of specific tax optimizations)

Equipment

  • 4800$ compute costs (A100 GPU * 6 months)

  • 16500$ = 25h of fMRI time (at 660$ per hour). We think we'd need 20-25h at the lower bound, and the more hours we can get the better.

  • 50$ rubber-based "Virtually Indestructible Keyboard" for MRI-compatibility, only available used

  • 2000$ MRI-compatible screens for use inside the machine and/or travel to an fMRI facility with this installation already available

  • 3000$ Research laptop for use onsite at recordings

One-off Misc

  • 15600$ Office costs (1400$/person monthly office cost at FAR Labs, 6 months, 2 persons)

  • 1776$ Proportional visa costs for 1 researcher for this time period

20% buffer

Total: $244k
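As a check on the arithmetic, the stated total can be reproduced by summing the line-item amounts exactly as listed above and applying the 20% buffer. This sketch uses only the amounts given in the budget; the dictionary keys and variable names are illustrative.

```python
# Line-item amounts in dollars, copied from the budget above.
budget = {
    "salaries": 108_000,
    "fmri_ops_contractor": 900,
    "participant_compensation": 900,
    "salary_tax_overhead": 50_000,
    "compute": 4_800,
    "fmri_time": 16_500,
    "keyboard": 50,
    "mri_compatible_screens": 2_000,
    "research_laptop": 3_000,
    "office_costs": 15_600,
    "visa_costs": 1_776,
}

subtotal = sum(budget.values())
total = subtotal * 1.20  # 20% buffer on top of the subtotal

print(f"Subtotal: ${subtotal:,}")        # Subtotal: $203,526
print(f"With buffer: ${round(total):,}")  # With buffer: $244,231
```

The result rounds to the $244k funding goal.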

Who is on the team and what's their track record on similar projects?

David "davidad" Dalrymple:


Lisa Thiergart: 

What are the most likely causes and outcomes if this project fails? (premortem)

The most obvious failure mode is that AIs don't make value judgements the way humans do, in which case this would be a waste of time. It still seems well worth trying though.

What other funding is this person or project getting?

Probably some from Foresight, since they are applying and we are in discussions with them. They don't want to spend much time actively seeking grants, since it is very time-consuming.
