
Retroactive funding for Don't Dismiss Simple Alignment Approaches


Chris Leong

Not funded · Grant
$0 raised

Project summary

Retroactive funding for the Less Wrong post Don't Dismiss Simple Alignment Approaches.

I believe that this project is especially suitable for retroactive funding since:

• It was only published recently (the further back you go, the less value is delivered by retroactive funding)
• It's much easier to judge the post as a success retroactively than it would have been prospectively
• My runway is currently short enough that funding is likely to lead to counterfactual AI Safety work taking place, as opposed to just increasing a figure in a bank account.

This post wasn't a huge amount of work to write up, but producing posts of this EV regularly (say, one each month) would be extremely challenging. See, for example, the large number of posts I had to write before landing this one hit: https://www.alignmentforum.org/users/chris_leong, with more posts on Less Wrong (that said, many of those posts were on niche topics and so not intended to be hits). It would therefore make sense to view this as a grant for achieving a hit, rather than a grant for this one specific post.

What are this project's goals and how will you achieve them?

• Encourage more work of the sort discussed in the post: either investigation of linearity or other ways in which it may be surprisingly easy to make progress on alignment.
• Encourage more people to give technical alignment a go, rather than just throwing up their hands.

Hopefully, the post has received enough attention to achieve these.

If there is a >2% chance that the post leads to someone producing a counterfactual result on the same level as those mentioned in the post, then I expect the post to be worth more than the amount requested.
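As a rough illustration of this expected-value claim (a minimal sketch; the dollar value assigned to such a counterfactual result is a hypothetical figure of mine, not something stated in the application):

```python
# Back-of-envelope expected-value check for the claim above.
# The value of a counterfactual result is a hypothetical figure,
# not something stated in the application.
value_of_result = 100_000       # USD, assumed
p_counterfactual_result = 0.02  # the >2% threshold from the claim

expected_value = p_counterfactual_result * value_of_result
print(f"Rough expected value of the post: ${expected_value:,.0f}")  # $2,000
```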

How will this funding be used?

The project is already complete, so the funding would simply increase my runway.

Who is on your team and what's your track record on similar projects?

Not really relevant, since this is a retroactive request. That said, I'm open to improving/augmenting the post if requested.

What are the most likely causes and outcomes if this project fails? (premortem)

Maybe this post was unnecessary, as Beren already wrote the post "Deep learning models might be secretly (almost) linear" (https://www.lesswrong.com/posts/JK9nxcBhQfzEgjjqe/deep-learning-models-might-be-secretly-almost-linear). I think there's value in my framing, though, as it's better optimised as a call to action.

Often posts get attention and then are forgotten.

Joseph Bloom suggests in a comment that not everyone may be as excited about these alignment proposals as I am, and that this may cause people to doubt the conclusion.

Perhaps all the low-hanging fruit has already been picked. Perhaps we already have directions to investigate and we should invest in them instead of trying to open up new directions.

Because of its stance (alignment is easier than you think, rather than harder), the post was more likely to become popular and hence might be overrated.

What other funding are you or your project getting?

I received a grant to skill up in alignment and do agent foundations research, but this grant was from a certain well-known crypto fund that collapsed, so there is a chance that it may be clawed back. This was only a part-time grant anyway, so even if it were guaranteed I would need to try to find ways to supplement it.


joseph bloom

about 1 year ago

I wouldn't usually comment on other people's projects but I've been mentioned in the proposal and @Austin's response. Furthermore, I recently published some research which relates to many of the main themes in Chris's post (world models, steering vectors, superposition).

It's not obvious to me that more posts like these will lead to more good work being done. I don't think we are bottlenecked on ambitious, optimistic people and this post is redundant with others in terms of convincing people to be excited about these research outcomes.

I'd be keen to see more results of the kind discussed in the post, but my prior on paying people to promote that work on LW being an optimal use of funds is low.


Chris Leong

about 1 year ago

Funnily enough, I was going to reduce my ask here, but I hadn't gotten around to it yet, so now it may look like it's in response to this comment when I was going to do it anyway.


Austin Chen

about 1 year ago

Hi Chris! Thanks for posting this funding application. I am generally a fan of the concept of retroactive funding for impactful work (more so than almost anyone I know). However, TAIS isn't my area of specialty, and from where I'm standing it's hard for me to tell whether this specific essay might be worth e.g. $100 or $1000 or $10000. The strongest signals I see are 1) the relatively high karma count and 2) the engagement by @josephbloom on the article.

I'm putting down $100 of my budget towards this for now, and would be open to more if someone provides medium-to-strong evidence for why I should do so.


Chris Leong

about 1 year ago

Thanks so much for your support!

Oh, is the minimum locked once you create a post? I was tempted to move the minimum down to $700 and the ask down to $2000, but then again I can understand why you wouldn't want people to edit these figures after someone has made an offer, as that would be ripe for abuse.

In terms of why I'd adjust it: I'm trying to figure out what would actually motivate me to try to produce more of this content, rather than just putting a bit of extra money in my pocket without any additional content production. I figure that if there's a 20% chance of a post being a hit, I'd need at least a week's funding* per hit in order for it to be worthwhile for me to spend a full day writing up a post (as opposed to the half-day that this post took me).

In terms of the $2000 upper ask, I'm thinking it through as follows: if someone were able to write ten high-quality alignment posts in a year (quite beyond me at the moment, but not an inconceivable goal), that would work out at $20k, and it might be reasonable for writing such posts to make up a third of their income.
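To make the arithmetic in the last two paragraphs explicit (a rough sketch; the weekly runway figure is an assumption loosely based on the $700 minimum mentioned above):

```python
# Back-of-envelope economics of writing posts in the hope of a "hit".
# weekly_runway is assumed (roughly the $700 minimum); the other figures
# come from the comment above.
p_hit = 0.20             # estimated chance a full-day post becomes a hit
weekly_runway = 700      # USD per week, assumed
working_days_per_week = 5

# If a hit pays out about a week of runway, the expected payout per
# attempt roughly covers the one full day each attempt costs:
payout_per_hit = weekly_runway
expected_payout_per_attempt = p_hit * payout_per_hit      # $140
daily_runway = weekly_runway / working_days_per_week      # $140/day

# The $2000 upper ask: ten hits in a year would total $20k, which could
# reasonably be about a third of a modest annual income.
upper_ask = 2000
annual_from_ten_posts = 10 * upper_ask                    # $20,000

print(expected_payout_per_attempt, daily_runway, annual_from_ten_posts)
```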

(PS. I did a quick browse of highly upvoted posts on the Alignment Forum. Quite a high proportion of them are produced by people who are already established researchers or PhD students, such that if there were a funding scheme for hits** that aimed to avoid double-funding people, the cost would be less than it might seem.)

Anyway, would be great if I could edit the ask, but no worries if you would like it to remain the same.

* My current burn rate is less because I'm trying really hard to save money, but this is a rough estimate of what my natural burn rate would be.
** Such a scheme couldn't be based primarily on upvotes, because that would simply invite vote manipulation and push people towards writing content optimised for upvotes.