Rubi Hudson

@Rubi-Hudson

Rubi Hudson: AI Safety Researcher, Economics PhD Student

https://www.alignmentforum.org/users/rubi-j-hudson
$0 total balance
$0 charity balance
$0 cash balance

$0 in pending offers

About Me

Economics PhD student researching AI Safety at the University of Toronto. SERI MATS (Summer 2022) participant. Started the Mechanism Design for AI Safety (MDAIS) community. Mentor with SPAR and AIFF.

Projects

Comments

Rubi Hudson

3 months ago

Final report

I recently submitted a paper based on this research to an ML conference, wrapping up the project. There is no public version of the paper yet; one will be released after the first acceptance/rejection decision, and an Alignment Forum post covering the topic will follow within two weeks.

The main results of this project are as follows:

- Formalized the preliminary results, including streamlining proofs and removing assumptions

- Demonstrated that the results hold for decisions based on average prediction, not just the most preferred prediction (as shown initially), which makes the process much easier to implement with a human decision maker

- Showed that it is possible to elicit honest predictions for all actions, not only the one actually chosen

- Proved uniqueness of the zero-sum setup for incentivizing the desired behavior (a toy sketch of the zero-sum setup follows this list)

- Showed that the space of actions can be searched for the optimal action in O(1) time, not just O(log(n)) as per the preliminary result

- Avoided the cost of training additional models to implement zero-sum competition by instead using multiple dropout masks

- In the first major experiment, showed that the zero-sum setup avoids performative prediction, even in an environment that incentivizes it

- In the second major experiment, showed that the zero-sum setup trains performative prediction out of a model faster and more thoroughly than a stop-gradient through the choice of action

- Ran various robustness checks, including showing that the results hold even when the predictors have access to different information

- Showed that decision markets can also be structured to avoid performative prediction (this result was cut from the submitted paper for space)
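
For readers unfamiliar with the setup, here is a minimal toy sketch of the zero-sum competition in Python. Everything in it is an illustrative assumption rather than the paper's implementation: two predictors report conditional outcome distributions, the decision maker averages the reports and takes the action with the highest expected utility, and each predictor is paid its log score minus the other's, so payoffs sum to zero.

```python
import numpy as np

def log_score(p, outcome):
    # Logarithmic proper scoring rule: reward for the probability
    # assigned to the realized outcome.
    return np.log(p[outcome])

def choose_action(reports, utility):
    # Decision based on the average prediction: average the two
    # reports for each action, then maximize expected utility.
    avg = {a: (p1 + p2) / 2 for a, (p1, p2) in reports.items()}
    return max(avg, key=lambda a: float(avg[a] @ utility))

def zero_sum_payoffs(reports, action, outcome):
    # Each predictor earns its own log score minus its opponent's on
    # the chosen action's realized outcome, so the game is zero-sum:
    # any gain from steering the decision is exactly the other's loss.
    p1, p2 = reports[action]
    s1, s2 = log_score(p1, outcome), log_score(p2, outcome)
    return s1 - s2, s2 - s1

# Toy example: two actions, two outcomes, utility 1 for outcome 1.
utility = np.array([0.0, 1.0])
reports = {
    "a": (np.array([0.2, 0.8]), np.array([0.3, 0.7])),
    "b": (np.array([0.6, 0.4]), np.array([0.5, 0.5])),
}
action = choose_action(reports, utility)  # -> "a"
print(action, zero_sum_payoffs(reports, action, outcome=1))
```

The key design choice is that the payoff is a difference of proper scores: any systematic gain one predictor could extract by distorting which action gets chosen is exactly its opponent's loss, leaving honest reporting as the equilibrium.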

The only result I was hoping to produce that was not accomplished was extending the mechanism to cases where different predictors have private information. However, this is much less urgent when the mechanism is implemented as two masks of the same model, which have access to identical information. The experiments showed that private information does not make a difference in the toy model. It is possible that I can develop a theoretical solution to private information in future work, but after working on the problem extensively, I believe such a solution is unlikely to exist, at least without making unrealistic further assumptions.
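
To illustrate the dropout-mask trick mentioned above, here is a rough PyTorch sketch, with the architecture and loss as assumptions rather than the paper's exact setup: a single network in training mode produces two stochastic forward passes, each drawing an independent dropout mask, and those two masked passes play the roles of the two competing predictors. The performative channel, where outcomes depend on the chosen action, is omitted for brevity.

```python
import torch
import torch.nn as nn

class MaskedPredictor(nn.Module):
    # One network; each stochastic forward pass in training mode draws
    # an independent dropout mask, giving a distinct "predictor"
    # without the cost of training a second model.
    def __init__(self, n_in, n_out, hidden=64, p_drop=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, hidden),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(hidden, n_out),
        )

    def forward(self, x):
        # Two passes -> two independent masks -> two competitors.
        return self.net(x).log_softmax(-1), self.net(x).log_softmax(-1)

def zero_sum_loss(logp1, logp2, outcome):
    # Each mask's loss is its own negative log score minus the
    # opponent's, with the opponent's term detached so each side
    # treats the other's score as fixed, as in a two-player game.
    s1 = logp1.gather(-1, outcome.unsqueeze(-1)).squeeze(-1)
    s2 = logp2.gather(-1, outcome.unsqueeze(-1)).squeeze(-1)
    return (-(s1 - s2.detach()) - (s2 - s1.detach())).mean()
```

Because both masks live in the same model and see the same inputs, the competitors have identical information by construction, which is why the private-information extension matters less in this implementation.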

Overall, I'm happy with the outcome of this project. While there is still room for follow-up work, I believe it presents a first-pass solution to the problem of performative prediction. Through the course of working on this project, I have also come to believe that being able to elicit honest conditional predictions will have further applications to safety beyond performative prediction, especially with respect to online training and myopia.

I will also address the fact that this project took considerably longer than expected to complete. I had hoped to have SPAR mentees implement the experiments, but was unable to generate useful work from them. I consider this setback entirely my own fault, as I should not have counted on volunteer, part-time labor, especially for work beyond what I could implement myself. After deciding to run the experiments myself, I took time to build up the necessary background, so I do not anticipate this being a bottleneck in future work. A secondary reason for the delay is that I returned to my PhD after the first four months, which reintroduced other demands on my time.

This project was completed solo, although I benefited from discussions with Johannes Treutlein, editing from Simon Marshall, and code review from Dan Valentine.

Rubi Hudson

8 months ago

Progress update

What progress have you made since your last update?

A draft paper containing results as of early February 2024 can be found here: https://drive.google.com/drive/u/1/folders/17X4YsCqsK6sEw2If8pv69A-JyAGDkq6K

Notable results

  • Formalized the model of the decision problem with multiple predictors, then updated the initial proof sketches, including streamlining the arguments and removing assumptions

  • Proved uniqueness of the zero-sum scoring rule for honest predictions

  • Developed a variant decision rule that incentivizes honest predictions even for actions that will not be taken (an illustrative construction is sketched after this list)

  • Extended the zero-sum scoring rule to decision markets, then found less restrictive methods that generate the same results
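
For intuition on that variant, the sketch below shows a standard full-support construction from the decision-markets literature, offered as an illustration rather than as the rule developed here: if every action is taken with positive probability and predictors are scored only on the realized action, then no action's prediction is permanently shielded from reality, so honesty can be incentivized even for actions the decision maker would rarely choose.

```python
import numpy as np

rng = np.random.default_rng(0)

def full_support_decision(avg_preds, utility, eps=0.05):
    # Illustrative decision rule: take the utility-maximizing action
    # with probability 1 - eps, otherwise explore uniformly. Every
    # action is chosen with positive probability, so predictions for
    # every action eventually get scored against real outcomes.
    actions = list(avg_preds)
    greedy = max(actions, key=lambda a: float(avg_preds[a] @ utility))
    if rng.random() < eps:
        return actions[rng.integers(len(actions))]
    return greedy
```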

In terms of empirical results, progress has stalled. I mentored three junior researchers through SPAR and was originally planning to have them run experiments under my supervision. This approach was unsuccessful, and I have restarted the process, running the experiments myself.

What are your next steps?

With respect to the theory, I just need to clean up some of the proofs, although there are a couple of minor extensions I have in mind as well. In discussions with CS professors, I have been informed that modelling the situation where predictors have different information is sufficiently complex that it should be a follow-up paper. I expect to post the updated theory results to the Alignment Forum once experimental results are also ready. From there, I will work on organizing the results into a paper.

With regard to the experiments, I expect they will take 1-2 more months. The experimental designs are already set and have been run by others with ML expertise. The current bottleneck is my own ML skills, but I have been making progress, and the project is on track to wrap up in a timely manner.

Transactions

For                                                     Date              Type              Amount
Manifund Bank                                           11 months ago     withdraw          200
Avoiding Incentives for Performative Prediction in AI   about 1 year ago  project donation  +50
Avoiding Incentives for Performative Prediction in AI   about 1 year ago  project donation  +100
Avoiding Incentives for Performative Prediction in AI   about 1 year ago  project donation  +50
Manifund Bank                                           about 1 year ago  withdraw          33000
Avoiding Incentives for Performative Prediction in AI   about 1 year ago  project donation  +33000