Grant · Not funded
$0 raised

Project summary

Deleted

What are this project's goals and how will you achieve them?


Deleted

Rachel Weinberg

about 1 year ago

@mapmeld @vincentweisser @esbenkran @JoshuaDavid since Evan withdrew his donation, which put this project back below the minimum funding bar, I put this project back in the proposal stage and undid your transactions. Let me know if any of you want to withdraw your offers too. Otherwise, they'll only go through if/when this reaches its minimum funding bar ($1k) again.

Jonas Vollmer

over 1 year ago

lol at the timing of the events here

Austin Chen

over 1 year ago

(for context: Jonas posted his reservations independent of my grant approval, and within the same minute)

Jonas Vollmer

over 1 year ago

Edit: comment removed with author's consent

FazlBarez

over 1 year ago

Thank you for your input, Jonas. I'm interested to understand the nature of your significant reservations. One of the appealing aspects of Manifund is its decentralized structure, which encourages open dialogue. This helps to counteract the traditional system where funding often depends on personal networks and reciprocal favors.

Would you be willing to share more details privately, given your concerns about public disclosure?

Austin Chen

over 1 year ago

In light of Jonas's post and the fact that this grant doesn't seem to be especially urgent, I'm going to officially put a pause on processing this grant for now as we decide how to proceed. I hope to have a resolution to this before the end of next week.

Some thoughts here:

  • We would like a good mechanism for surfacing concerns with grants, and want to avoid e.g. adverse selection or the unilateralist's curse where possible.

    • At the same time, we want to make sure our regrantors are empowered to make funding decisions that may seem unpopular or even negative to others, and don't want to overly slow down grant processing time.

  • We also want to balance our commitment to transparency with letting people surface concerns in a way that feels safe, and in a way that punishes neither the applicant for applying nor someone with reservations for sharing them.

We'll be musing on these tradeoffs and hopefully have clearer thoughts on these soon.

Jonas Vollmer

over 1 year ago

@kiko7 I've sent you an email with feedback. You have my permission to share the email publicly on here if you would like to.

FazlBarez

about 1 year ago

@Jonas-Vollmer I have responded to your email. Once you have reviewed it, we can evaluate whether to make it public.

Jonas Vollmer

about 1 year ago

@kiko7 I don't have the time to engage further via email. Happy for you to decide for yourself!

Jonas Vollmer

about 1 year ago

@kiko7 I'm also fine with Manifund deleting all my public comments here in case others think they're too damaging or unfair given that I don't have the time to engage any further.

FazlBarez

about 1 year ago

@Austin I'd like to request that Jonas's comment be temporarily removed, because the substance in the email Jonas sent is not nearly as concerning as the negative impact of the public comment. Once we have resolved the issues raised in the private email, we can consider reposting the comment, either as is or in a modified form.

Austin Chen

about 1 year ago

I've updated Jonas's comment above. Evan is also retracting his support for this grant, so we will be unwinding his $50k donation and restoring this project to the pending state.

Mindermann

about 1 year ago

I'm responding to the concerns Jonas raised with Fazl in a private email, which is shared below with permission.

Overall, these concerns don't seem substantial enough to warrant 'significant reservations' or even 'permanent repercussions'. I'm writing this because, in my view, these phrasings from Jonas give a wrong impression (unless the email I quote below in full omits some important information) and could affect Fazl's reputation and future opportunities unnecessarily. Indeed, they have already led at least one person to put their funding on ice.

Disclosure: Fazl was my housemate. I've known Jonas for ~10 years.

> My concerns are:
>
> 1. In our conversations, I got the impression that you were primarily trying to impress me and create an impression that we're buddies, rather than e.g. discussing ideas on the object level. I also thought you tried to create the impression that you knew much more about AI alignment than you actually do. I don't think any of this makes you clearly unsuitable for this grant, but I've learned over the years as a manager/grantmaker that these signs tend to be good predictors of funders overestimating grantees, and their projects not working out as planned.

Regarding Fazl's technical skill: the view of a non-expert who has had one or two casual conversations with Fazl (apparently while waiting for food at a takeaway) shouldn't be grounds for 'significant reservations' or 'permanent repercussions', as stated in the original comment. Fazl has endorsements from accomplished ML researchers like David Krueger and a strong publication record that the grant evaluators here can check, plus at least two accepted but not yet published NLP papers. (I can attest to his strong technical skill as well.) These factors seem substantially more important.

Regarding 'trying to impress': for academic grants, whether the grantee is trying to impress shouldn't be a primary evaluation criterion; many productive academics do this. Additionally, Jonas seems worried that Fazl would unduly impress the funders on this website, but that seems irrelevant here: Fazl's whole interaction with them is shown on this website, and if any undue persuasion were happening it could be pointed out publicly.

> 2. On top of that, I heard some negative stories (haven't verified them myself): e.g. that you tried to get into an office space you were repeatedly asked to leave, that a project you ran went poorly and you blamed your collaborators when it was clearly your responsibility, and another significantly negative story.

Jonas has told me which office space this was, but not which project or what the other story is. For the office space, I've read the email thread with the office ops team, and it looks very much like a miscommunication. Fazl was certainly not told to leave in these emails, and according to Fazl that also didn't happen in person. According to the emails, members of the office space invited him inside twice, and it seems this did not follow the protocol the admin expected. But from the emails, the protocol also seemed a bit ambiguous, and I'm still not sure what exactly was expected.

Given what I know about the office space story, which is the only one where I have some insight, and given that Jonas says he hasn't verified any of the stories, I now also have some doubts about the other stories.

Fazl doesn't know what the 'other significantly negative story' is, or which project is in question. FWIW, Fazl works on and supervises many projects at once, and it's normal for one to go poorly once in a while (and sometimes it's actually someone else's fault, though Fazl says he doesn't recall blaming anyone for a project failure).

> 3. I think it's fine not to have a fleshed-out research agenda, but I'd at least like to see some specific preliminary ideas, which are often good indicators of whether the research will be good.

Grant evaluators can easily see the research proposal, so this should be (and has been) discussed in public; it doesn't need to be part of a negative message that lacks clearly stated concerns.

This is my impression based on Jonas's email. There might be omitted information I don't know about.

I'm not planning to engage further, just dropping my 2 cents :)

Jonas Vollmer

about 1 year ago

Just one nitpick in response to that: I hope it was clear that I was aiming to avoid 'permanent repercussions' for Fazl, not arguing for them. I continue to have 'significant reservations'.

Jonas Vollmer

about 1 year ago

@Austin Would you mind changing my comment from "comment retracted" to "comment removed with author's consent"? 'Retracted' implies I don't endorse it anymore, but I asked for it to be removed as a favor to Fazl, rather than changing my opinion.

Jan Brauner

about 1 year ago

I mostly agree with what Mindermann wrote.

CoIs: Fazl is my housemate; Jonas was (briefly) my housemate in the past.

Austin Chen

over 1 year ago

Approving this project! It's nice to see a handful of small donations coming in from the EA public, as well as Evan's endorsement; thanks for all your contributions~

Joshua David

over 1 year ago

Establishing an AI safety lab at Oxford seems like a good idea in general, and I expect that research which focuses on mechanistic interpretability is particularly likely to yield concrete, meaningful, and actionable results.

Additionally, Fazl has a track record of competence in organizational management, as shown by his contributions to Apart Lab and his organizational work for the Alignment Jam / Interpretability Hackathon.

Disclaimer: My main interactions with Fazl, and the impressions above, were through Interpretability Hackathon 3 and subsequent discussions, which is also how I heard about this Manifund project.

Disclaimer: I do not specialize in grant-making in an impact market context; my donation should be interpreted as an endorsement of the view that an AI safety lab at Oxford would be net positive, not as an intentional bid to change market prices.

Renan Araujo

over 1 year ago

Interesting project! I'm curious about a couple of things:

  1. What would the research agenda most likely look like? (E.g., what do you think the most exciting version and the realistic version would be?)

  2. How many people do you expect would work on that agenda, and what would their backgrounds be? (E.g., would they already have an alignment-related background, or just be technical folks interested in the field? PhD students or faculty?)

FazlBarez

over 1 year ago

Thank you for the thoughtful questions, Renan.

1. The research agenda is still taking shape: a key goal over the next 3-4 months is to refine the directions and priorities and to secure funding from the identified sources. However, I envision a significant portion focusing on interpretability, particularly interpreting reward models learned via reinforcement learning. Additional areas will likely include safe verification techniques, aligning with much of Stuart Russell's work as well as the areas of expertise of Phil and David.

2. Regarding team composition, we expect at least two existing research fellows to be involved and several PhD students to be hired. Most members will have strong technical backgrounds and solid foundational knowledge of the AI alignment literature. We aim to assemble a diverse team with complementary strengths to pursue impactful research directions.

Please let me know if you have any other questions! I'm excited by the potential here and value your perspective.

Renan Araujo

over 1 year ago

@kiko7 thanks! Ultimately, I decided not to evaluate this since I don't feel confident that I have the right background for it. I encourage others with a more technical background to evaluate this grant.

Chris Leong

over 1 year ago

I would be really excited to see an AI safety lab established at Oxford, as this would help build the credibility of the field; a lack of credibility is one of the core problems holding alignment research back.

That said, I suspect that a proper research direction is crucial when establishing a new lab, as it's important to lead people down promising paths. I haven't evaluated their proposed directions in detail, so I would encourage anyone considering donating large amounts of money to do so themselves.

Disclaimer: Fazl and I have discussed collaborating on movement building in the past.

Esben Kran Christensen

over 1 year ago

This seems like a high-EV project. Working with FB in Apart Research, I have been impressed by his work ethic and commitment to real impact. One of my worries about establishing a new lab is that it could get caught producing low-impact research, but with him at the helm and the support of Krueger, there is little doubt that this lab will take paths toward concrete efforts to reduce existential risk from AI.

Additionally, the support of the Torr Vision Group provides credibility and backing that other new labs would need to build up over a longer period, potentially speeding up the proposed project's path to impact. I do not specialize in grant-making and offer this donation as a call to action for other grant-makers to support the project.