
Funding requirements

Sign grant agreement
Reach min funding
Get Manifund approval

Ambitious AI Alignment Seminar

Technical AI safety · Global catastrophic risks

Mateusz Bagiński

Proposal · Grant
Closes February 24th, 2026
$500 raised
$20,000 minimum funding
$179,520 funding goal


Project summary

We are going to gather ~35 exceptional people in the Hostačov Chateau in the Czech countryside for a five-weekend seminar running from March 13th to April 13th (though we may move the starting date as late as May if we do not secure sufficient funding in time). The seminar will engage participants with a large number of technical AI safety topics so that they can develop a deep understanding of them. The topics in focus will be the ones we judge likely to be important to understand for taking serious shots at superintelligence alignment.

The threshold of $179,520 constitutes the amount of money required to prepare and run the month-long seminar (budget breakdown below). Additional funding will allow us to extend the retreat into a year-long program: the AFFINE Fellowship, which will involve awarding grants to the ~10 most promising candidates and co-locating them in several places where they can receive relevant support to continue their learning and research for another 11 months (one such place is CEEALAR / "EA Hotel").

What are this project's goals? How will you achieve them?

The primary goals of the seminar, as well as of the fellowship it may be extended into, are the following:

  • Get more people who can actually understand and think about the problem of AI alignment and AI X-risk, and take a good shot at trying to build pieces of a solution.

  • Have more people who can properly explain the issue to governments in a way that is productive (instead of backfiring).

  • Have people who can start reasonably shaped orgs once funding is abundant (which we expect to happen later this year or early 2027 at the latest).

  • The problem at hand is very difficult, so we do not expect novel and promising research outputs within the time frame of the program. It would, however, be a very welcome surprise.

We will achieve these goals through a carefully designed month-long intensive that prioritizes deep technical learning within a collaborative rather than competitive environment. The program structure differs fundamentally from other AI safety fellowships by emphasizing community formation and peer learning alongside technical rigor.

The month unfolds through four distinct phases designed to maximize both intellectual depth and collaborative relationships. Week 1 focuses on community formation, with participants rotating through different small groups to build relationships across the entire cohort while beginning to engage with foundational technical material. Week 2 transitions to intensive technical engagement as participants self-select into stable working pods of three to five people for deeper collaborative work. Week 3 reaches peak intellectual intensity with sustained deep technical work in established pods. Week 4 integrates learning through presentations and reflection while preparing participants for either continuation into the year-long fellowship or transition to other impactful work.

Rather than passively consuming lectures, participants will share their learning with each other through structured showcases and peer instruction, which research shows produces dramatically better retention than traditional formats. (ETA: Learnings will by default be in the form of "I picked one of the topics listed as possibly important and read stuff/talked to people until I deeply get it and why it's a thing and can teach it", not necessarily novel research.) The Czech countryside setting removes urban distractions while providing space for both focused solo work and spontaneous collaboration. The program rhythm alternates between intensive technical engagement and explicit recovery time, preventing the burnout that plagues many month-long intensives. The design also accounts for predictable challenges—social overload, energy crashes, status competition—through structural choices rather than just good intentions.

Crucially, the selection for continuation into the year-long fellowship will happen because of collaborative excellence, not despite it. We're looking for participants who help others learn, who integrate across disciplines, and who build rather than hoard knowledge. The goal extends beyond producing ten individual researchers to creating a cohesive network that continues collaborating after the month ends, whether at CEEALAR or elsewhere.

Conditional on securing an additional $60k or more, the seminar will be extended into the year-long AFFINE Fellowship. (See here for an explanation of why a 1-year-long fellowship is needed.)

How will this funding be used?

The first "valuable" (i.e., "we can use this money for something concretely useful in service of this project") threshold of $20k is meant to cover Mateusz's work on the retreat until getting the final decisions from our big funders on whether they finance the retreat and/or the fellowship (all in the scenario where funding for the retreat from other sources is not secured).

The second threshold of $179,520 will cover Mateusz's work on preparing the retreat as well as the costs of running it (including him and other staff).

The maximum amount of $1,616,120.00 will suffice to fund the entire Fellowship roughly as we would ideally envision it. The intermediate amounts will be used to cover as much of the Fellowship as we can; roughly, less money will mean fewer fellows and/or smaller stipends. (We also provide a utility function over money, made with plex's tool, which you probably should also use in your funding applications.)

A detailed budget for the minimum amount is in the following table. A budget for the maximum funding amount can be made available upon private request.

Utility values in text are:

  • $20k -- 6% (Mateusz can keep working on this)

  • $130k -- 9%

  • $180k -- 44% (Seminar)

  • $240k -- 70% (Minimal Fellowship)

  • $1,616k -- 100% (Full Fellowship)

(ETA: Updated Full Fellowship cost from $1,561k to $1,616k in order to include some costs of the seminar that our earlier budget hadn't included.)
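
As a rough illustration (not part of plex's tool), the utility points above can be compared for intermediate funding amounts by interpolating between them. Here is a minimal sketch in Python, assuming simple linear interpolation between the stated breakpoints; the interpolation method and the code are our own illustration:

```python
# Hypothetical sketch: linearly interpolate between the utility points listed above.
# The breakpoints come from the list; the interpolation method is an assumption.

UTILITY_POINTS = [
    (20_000, 0.06),      # Mateusz can keep working on this
    (130_000, 0.09),
    (180_000, 0.44),     # Seminar
    (240_000, 0.70),     # Minimal Fellowship
    (1_616_000, 1.00),   # Full Fellowship
]

def utility(amount: float) -> float:
    """Interpolated utility of a given funding amount (0 below the first threshold)."""
    if amount < UTILITY_POINTS[0][0]:
        return 0.0
    for (x0, u0), (x1, u1) in zip(UTILITY_POINTS, UTILITY_POINTS[1:]):
        if amount <= x1:
            return u0 + (u1 - u0) * (amount - x0) / (x1 - x0)
    return UTILITY_POINTS[-1][1]

print(utility(200_000))  # ~0.53 under these assumptions
```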

Who is on your team? What's your track record on similar projects?

Mateusz Bagiński - Lead (technical, applications)

Mateusz studied cognitive science (BSc, MSc) and worked as a programmer at a startup developing software for enhancing collective sense-making. After completing his dissertation, he decided to transition into technical AI safety research: upskilling, helping build AI Safety Info, and participating in some AI Safety hackathons. Eventually, he landed on theoretical/agent foundations research as the field that is most important, neglected, and suitable for his interests and skills. PIBBSS Fellow 2024 (w/ mentor Tsvi Benson-Tilsen (ex-MIRI)).

Mateusz will be responsible for designing the program, selecting the candidates, and ensuring that everything runs as smoothly as possible on the research side. The latter will involve helping the participants with their learning and research (acting as a sort of secondary mentor), making connections between participants and [mentors, resources, or other participants], as well as being generally on the lookout for ways in which the program could be improved.

Sofie Meyer - Humans Lead

Sofie's background is in cognitive neuroscience (BSc, PhD, postdoc, Google Scholar) and several experiential practices and trainings: ten years of Zen meditation, two years of existential psychotherapy training, six months of circling facilitation training and certification, five years supporting co-counselling courses, two years volunteering at Maytree Sanctuary, and three months of teaching cognitive behavioral therapy group facilitation skills at Rethink Wellbeing. She also facilitates Core Transformation, Focusing, and Internal Family Systems processes.

Professionally, she has led user research at two mental health tech startups, one focused on depression and tracking cognitive effects of medication, another on using cognitive behavioral therapy to treat social anxiety in working women. Currently, she designs AI chatbots for global health at Turn.io and serves as Chair of EA Denmark and board member of Giv Effektivt (LinkedIn).

She loves facilitating nuanced conversations and creating space and emotional safety to enable brilliant people to truth-seek. She aims to bring compassionate, well-regulated, honest, evidence-based support and tools to humans and teams navigating complex cognitive and emotional challenges.

Attila Ujvari - Event design

As Executive Director of CEEALAR, he's transforming a residential facility in Blackpool into a professionalized incubator for AI safety researchers and entrepreneurs working on GCR reduction. Over the past six months, he's revitalized the infrastructure and implemented productivity frameworks and community systems that have dramatically improved resident outcomes.

Before CEEALAR, Attila spent 15+ years building systems that unlock human potential: managing cross-functional teams of 18+ at Ericsson, overseeing operations for 1,100+ soldiers across four continents in the Army National Guard, and scaling operational processes as Director of Operations at V School. He's taught professional courses, provided career counseling and academic planning in college, and tutored students navigating complex learning pathways.

His foundation in Hungary runs intensive hackathons that bring cross-disciplinary groups together around singular problems—exactly the dynamic needed here. As a group embodiment facilitator, he creates experiences that connect people not just professionally, but holistically.

He's not an AI safety researcher, but the person who builds the conditions for researchers to do their best work. This seminar needs someone who understands how to design intensive learning experiences, manage group dynamics at scale, and create the rhythms that turn ambitious people into effective collaborators.

Education: De Anza College, Stanford University, Amherst College.

TBD - Ops & Volunteer Lead

The venue provides food and basics, but we’ll want a full-time person to help make all the thousand minor things work. Probably assisted by volunteers.

plex - Vision & Network

plex has dedicated almost his entire adult life and the vast majority of his funds to trying to avert the AI apocalypse. The world is not anything like safe, so it’s insufficiently successful, but he has built or inspired many neat things, including a weirdly high fraction of the existential safety ecosystem’s infrastructure.

What are the most likely causes and outcomes if this project fails?

We actually consider it very likely that the project "fails" in the sense that it will complete with none of the Fellows producing any clearly promising research outputs or directions at building pieces of a solution. The most likely cause of this would be that the problem being tackled is one of great difficulty, very slippery, and with poor feedback loops with reality.

However, even in that case, the three theories of change we outlined in the section above will still likely be achieved: we are going to have more people who can (1) think about the problem; (2) explain it to governments; (3) be able to start good technical AI X-risk-reducing orgs when funding becomes abundant.

The primary type of "disappointing failure" that we can foresee befalling this project would be the failure to produce promising individuals possessing a deep understanding of the alignment problem. The most likely causes of this would be the failure to recruit the right people and to provide them with the right sort of support (in terms of environment (including the social environment) and mentorship).

In order to prevent this failure mode, we are going to do all of the following:

  1. Get a large pool of potentially useful mentors.

  2. Mateusz will be continuously assessing how the program is going for every participant.

  3. We will have a full-time employee specialized in working with humans (Sofie), to ensure that obstacles such as demotivation due to a lack of clear results, the emotional weight of the problem, or mental health problems more generally are not as much of a hindrance to the participants' journeys.

  4. We will utilize our extensive social networks, as well as high-quality paid services, to recruit highly promising individuals.

  5. We are going to use CEEALAR as a well-proven longer-term environment for researchers.

How much money have you raised in the last 12 months, and from where?

Zero. We just started.

We are in conversation with a donor who is potentially interested in funding the retreat (fully or partially). One function of this post is to gather the opinions of relevant people, so that the donor is better informed about the value of the endeavor being proposed here.

Additional info

Selection criteria for the fellows:

  • Highly technically skilled (e.g., maths, technical philosophy, finance, founder/CEO types, sharp PhDs/researchers in various fields, top-level science communication, etc.)

  • Would care about saving the world and all their friends if they thought human extinction was likely.

  • Decent team players, non-disruptive to the group cohesion.

  • (Existing understanding of AI Safety is not required. Starting with a ~blank slate is fine and good.)

Comments (11) · Offers (1) · Similar (8)

Abram Demski

about 10 hours ago

I am enthusiastic about this, and interested in being involved. I know Plex and Mateusz, and have some trust in their taste. I expect the program will focus on the most important issues (ie the most severe and neglected AI risks). I've been to a similar (much shorter) event at the venue, and found it to be a good location, with good countryside walks very conducive to thinking and conversation.


Richard Ngo

1 day ago

Looks exciting. My personal view is that there's a lot of progress waiting to be made on theoretical/agent foundations research. The quality of the program will of course depend a lot on the quality of fellows; I'm curious if there are many people already on your radar, or if you think you have good leads there.

A few other thoughts:

- I think trying to persuade people that the alignment problem is hard is often counterproductive. The mindset of "I need to try to solve an extremely difficult problem" is not very conducive to thinking up promising research directions. More than anything else, I'd like people to come out of this with a sense of why the alignment problem is interesting. Happy to talk more about this in a call.

- Some of the selection criteria seem a bit counterproductive. a) "Decent team players, non-disruptive to the group cohesion" seems like a bad way to select for amazing scientists, and might rule out some of the most interesting candidates. And b) "would care about saving the world and all their friends if they thought human extinction was likely" seems likely to select mainly for EA-type motivations which IMO also make people worse at open-ended theoretical research. Meanwhile c) "highly technically skilled" is reasonable, but I care much more about clarity of thinking than literal technical skills.

If the organizers have good reasons to expect high-quality candidates I expect I'd pitch in 5-10k.


plex

about 17 hours ago

@Richard Good leads on how to get good leads (three people with good contacts/recruitment skills in relevant areas) and some interested mentors, but we have not yet started mass outreach, and won't until funding is locked in, as I'd expect that to spoil more leads than it generates if we're not confident it's happening.

  • Persuading people that it's hard is not the angle I'm hoping for, but I imagine they'll naturally conclude that by looking at a bunch of the info and topics. Agree interest/curiosity is great as a motivator.

  • Yeah, it's definitely possible to select out the best candidates if you apply "non-disruptive" wrongly. I mostly want to avoid people who are something like recklessly/incorrigibly disruptive, or the closed/incurious kind of overconfident in a way that blocks good conversation and intellectual progress, while keeping the truth-seeking disagreeable and the weird genius with odd social norms.

  • I want to stand by wanting to select for people who would do something about it if they thought the world was ending. It doesn't have to be EA/altruistic motivations, selfish or caring about their friends is basically fine. But I think having ~everyone bought into a certain kind of ambition and taking this seriously rather than having a bunch of people with missing mood is pretty cruxy for getting the atmosphere and momentum that makes great things happen.

The people we've talked to for marketing seem reasonably confident they can get us high-quality candidates, and have done similar-ish things before. This is probably still the least certain part of the chain, and it's not impossible that our deadline is too ambitious and we'll notice we're not on track for a sufficiently good crop by March, in which case we'd move to the later dates when the venue is free, in May, to improve participant quality.


Linda Linsefors

2 days ago

I endorse this project.

Context: I have some experience in AI Safety organising, running AI Safety Camp and some other events. Because of my experience, plex (who is one of the organisers for this event) reached out to me for feedback on their plans. We had a one-hour call. I came away from that call enthusiastically in favour of this event happening. I also know both plex and Mateusz, and trust them to do a good job.


Linda Linsefors

1 day ago

Update: I just actually read through what is written in the proposal here on Manifund. The plans I discussed with plex, and that I intended to endorse in my previous comment, were significantly different from the proposal written here. I'm not sure why, but it's also normal for plans to change.

I'm less excited about the plans as written here than about the plans I heard previously. However, if the new plans are what's on the table, I would still rather have this funded than not funded.

The biggest difference is that the previous plans I heard about did not have stable pods at all. Given what I understand this program to be about, I think having stable pods is a mistake.


plex

1 day ago

@Linda Yeah, my vision which I described in our call was not to have strongly stable pods but more of flexible-ish working groups, plus mixed interaction seminar-style on an ongoing basis, and the option for at least mentors to be around part-time.

I'll be exploring this with Attila, who wrote that section. He brings a ton of experience with relevant events, will be doing a lot of the event design, and is excited to make this awesome, but we've only partly synced on models of how best to do that. My guess is we end up with a more working-group-style layout rather than strongly stable pods, but I'll be examining his reasons for having put this into the draft plan here.

My main reason for wanting flexibility over the usual benefits of having more fixed pods is that this seminar, unlike most similar events, is much more focused on gaining lots of existing knowledge and helping people rapidly grow than producing novel outputs. This means having more intermixing and people switching so they can tutor new people on the things they've collected is unusually beneficial, as opposed to the usual thing where you want to get deeply synced with a few people so you can push the boundaries of knowledge and do a project together.

In general, we're planning on iterating the details like this a fair amount as we approach the date, and have been keen to get this out ASAP so we've got longer with funding confirmed to start collecting candidates.


Lucius Bushnaq

2 days ago

I don't have the energy right now to write a high quality comment, but since I care about this I figure it's better to write something rather than nothing:

I think this project sounds like a good idea. Most (all?) AI Safety training programs these days don't even seem to touch on what I'd consider the actual core problems of alignment. I think there can be good reasons for many people and programs to mostly focus on other things at the moment, but it really seems almost catastrophically underemphasised at this stage. I don't know every training program of course, but talking to e.g. MATS graduates these days I often get the sense they haven't even really heard the basic case for why alignment might be hard. Looking at various AI Safety course curricula, I likewise see an almost complete lack of material engaging with what I'd consider the core problems of alignment. If this continues, eventually I'm not sure this field will even really remember what it was supposed to be about, never mind try to work on it.

I know Mateusz a little. From our limited interactions, I got the impression that he probably knows at least a decent amount about the sort of old-school alignment thinking that I wish the alignment field today was a lot more familiar with. Tsvi's endorsement also means quite a bit to me here. I think he could make a good technical lead for this project. I don't think I know Sofie or Attila. I do know Plex, but haven't really worked with him professionally. Other people say he's good at what he does though. My guess is he'd be a good fit for this role.


Paul Rapoport

2 days ago

This seems like an overall good idea, and I strongly recommend funding this to at least the 1-month Seminar level.

A few people have floated this kind of program to widen the funnel for people who might want to work on AI safety research, with the hope of kickstarting the involvement of people who otherwise might not know how to get working, or which concepts existing researchers - even marginal ones - would find very basic.

A 1-month version would likely be best-in-class due to a relative lack of comparable programs - itself a problem! - though I think that a 1-year version might be overambitious and risk burning out or disengaging scholars if not done extremely well and carefully.

All the same, I've worked with Mateusz for a period of time and been part of what turned into a very small category theory reading group with him, and I think he's very well-suited to this approach. AI safety - especially the kind of AI safety that looks like attempts to find a solution to alignment rather than a dozen ad-hoc patches to existing LLMs - suffers badly from a lack of serious research groups. This project looks to me like it would be at least half as promising per person as MATS and maybe 3/4 as promising as PIBBSS - both of which I've been a part of, both of which have similar mission statements, and both of which have been funded at higher levels - and both frequently claim that they want to see cousin orgs founded!

I would donate substantially to this if I had piles of tech or crypto money, but sadly I do not. I hope that other people who do have piles of tech or crypto money will hear me and donate in my place. If I were a grantmaker, I would almost certainly be directing grant funds to this endeavor.


Tsvi Benson-Tilsen

3 days ago

Overall: I recommend funding this to at least ~$240K, the level needed for the Seminar + 1-year fellowship.

I researched AGI alignment at MIRI for about 7 years; in my judgement, the field is generally not well set-up to appropriately push newcomers to work on the important difficult core problems of alignment. Personally my guess is that AGI alignment is too hard for humans to solve at all any time soon. But, if I were wrong about that, I would probably still think that novel deep technical philosophy about minds would be a prerequisite. I'm not up to date, so this impression might be partly incorrect, but broadly my belief is that most AI safety training programs are not able to create a context where people have the space, and are spurred, to think about those core problems.

Since this program is new, it's hard to judge. I've worked with Mateusz on alignment research, and I think he gets the problem, and the description of the program seems around as promising as any I've seen. Because the space hasn't found great traction yet, trying new things is especially valuable. So, IF you want to fund AGI alignment research, this should probably be among your top investments.

Further, if you want to fund this program, I'd strongly recommend funding it at least to the minimum bar to continue it with the 1-year fellowship. The reason is that learning to approach the actual AGI alignment problem is a slow process that probably needs multiple years, with sparse but non-zero feedback; so the foundations laid down in the month-long seminar might tend to somewhat go to waste without longer-lasting scaffolding.

> stable working pods of three to five people

I would suggest creating space for even smaller groups (the standard in Yeshiva, I gather, is pair study, and personally I need substantial time/space set aside for solo thinking). The area is very strongly inside-view-perspective thirsty, so an admixture of space for those to grow is needed, even given the opportunity cost. You could try to offload that to before and after the program, but I'd suggest also making space for it during. E.g. a "Schelling" time for 2 hour solo walks / thinks, or whatever.

> We actually consider it very likely that the project "fails" in the sense that it will complete with none of the Fellows producing any clearly promising research outputs or directions at building pieces of a solution. The most likely cause of this would be that the problem being tackled is one of great difficulty, very slippery, and with poor feedback loops with reality.

This is an unbelievably based statement, which on the object level would hopefully contribute to making an environment where actual new perspectives (rather than just the Outside the Box Box https://www.lesswrong.com/posts/qu95AwSrKqQSo4fCY/the-outside-the-box-box ) can grow, and furthermore indicates some degree of hopeworthiness of the organizers on that dimension.

> participants will share their learning with each other through structured showcases and peer instruction

Sounds cool, but do keep in mind that this could also create a social pressure to "publish or perish" so to speak, leading to goodharting. A not-great solution is to make it optional or whatever; it's not great because it's sort of just lowering standards, and presumably you do want to have people aiming to work hard and do the thing. Maybe there are better solutions, such as somehow explicitly and in common knowledge making it "count for full points" to present on "here's how I have a really basic/fundamental question, and here's how I kept staring at that question even though it's awkward to keep staring at one thing and not have publishable technical results from that, and here's my thoughts in orienting to that question, and here's specifically why I'm not satisfied with some obvious answers you might give". Or something. In other words, alter the shape of the landscape, rather than making it less steep.

> Selection criteria for the fellows:

I would suggest somewhat upweighting something like "security mindset", or (in the same blob), something like "really gets that you can have a plausible hypothesis, but it's wrong, and you could have quickly figured out that it's wrong by actually trying to falsify it / find flaws in it, but you probably wouldn't have quickly figured out that it's wrong just by bopping around by default". And/or trying to bop people on the head to notice that this is a thing, though IDK how to do that. This is especially needed because, since we don't get exogenous feedback about the objects in question, we have to construct our own feedback (i.e. logical reasoning about strong minds).


plex

3 days ago

@tsvibt
> Sounds cool, but do keep in mind that this could also create a social pressure to "publish or perish" so to speak, leading to goodharting.
Clarification: Learnings will by default be in the form of "I picked one of the topics listed as possibly important and read stuff/talked to people until I deeply get it and why it's a thing and can teach it", not necessarily novel research.


Kaarel Hänni

4 days ago

The world really needs more and better places/programs where bright people can try to grow into serious AI alignment researchers. I would guess that the 1-year fellowship proposed above would become the best existing thing of this kind. The 1-month seminar would also probably be the best in its reference class.

Imo, almost all other alignment upskilling programs are mostly creating coding minions for ML labs and printers of ML conference slop, with little emphasis on creating people that can do novel interesting thinking about the AI problems we face. I expect that the proposed program would emphasize getting people to grow into serious alignment thinkers much more than almost any other existing program. (Among existing programs, the main exception that comes to mind is PIBBSS. PIBBSS is good.)

My main uncertainties are about stuff like whether the project will be run decently competently and whether finding applicants goes well — I don't know the organizing team etc. well enough to voice strong views on these sorts of things.

All things considered, I think supporting this is a great use of money.