
Funding requirements

Sign grant agreement
Reach min funding
Get Manifund approval

Ambitious AI Alignment Seminar

Technical AI safety · Global catastrophic risks

Mateusz Bagiński

Proposal · Grant
Closes February 24th, 2026
$500 raised
$20,000 minimum funding
$179,520 funding goal


Project summary

We are going to gather ~35 exceptional people in the Hostačov Chateau in the Czech countryside for a five-weekend seminar running from March 13th to April 13th (though we may move the starting date as late as May if we do not secure sufficient funding in time). The seminar will engage participants with a large number of technical AI safety topics so that they can develop a deep understanding of them. The topics in focus will be the ones we judge likely to be important to understand for taking serious shots at superintelligence alignment.

The threshold of $179,520 constitutes the amount of money required to prepare and run the month-long seminar (budget breakdown below). Additional funding will allow us to extend the retreat into a year-long program: the AFFINE Fellowship, which will involve awarding grants to the ~10 most promising candidates and co-locating them in several places where they can receive relevant support to continue their learning and research for another 11 months (one such place is CEEALAR / "EA Hotel").

What are this project's goals? How will you achieve them?

The primary goals of the seminar, as well as of the fellowship it may be extended into, are the following:

  • Get more people who can actually understand and think about the problem of AI alignment and AI X-risk, and take a good shot at trying to build pieces of a solution.

  • Have more people who can properly explain the issue to governments in a way that is productive (instead of backfiring).

  • Have people who can start reasonably shaped orgs once funding is abundant (which we expect to happen later this year or early 2027 at the latest).

  • The problem at hand is very difficult, so we do not expect novel and promising research outputs within the time frame of the program. It would, however, be a very welcome surprise.

We will achieve these goals through a carefully designed month-long intensive that prioritizes deep technical learning within a collaborative rather than competitive environment. The program structure differs fundamentally from other AI safety fellowships by emphasizing community formation and peer learning alongside technical rigor.

The month unfolds through four distinct phases designed to maximize both intellectual depth and collaborative relationships. Week 1 focuses on community formation, with participants rotating through different small groups to build relationships across the entire cohort while beginning to engage with foundational technical material. Week 2 transitions to intensive technical engagement as participants self-select into stable working pods of three to five people for deeper collaborative work. Week 3 reaches peak intellectual intensity with sustained deep technical work in established pods. Week 4 integrates learning through presentations and reflection while preparing participants for either continuation into the year-long fellowship or transition to other impactful work.

Rather than passively consuming lectures, participants will share their learning with each other through structured showcases and peer instruction, which research shows produces dramatically better retention than traditional formats. (ETA: Learnings will by default be in the form of "I picked one of the topics listed as possibly important and read stuff/talked to people until I deeply get it and why it's a thing and can teach it", not necessarily novel research.) The Czech countryside setting removes urban distractions while providing space for both focused solo work and spontaneous collaboration. The program rhythm alternates between intensive technical engagement and explicit recovery time, preventing the burnout that plagues many month-long intensives. The design also accounts for predictable challenges—social overload, energy crashes, status competition—through structural choices rather than just good intentions.

Crucially, the selection for continuation into the year-long fellowship will happen because of collaborative excellence, not despite it. We're looking for participants who help others learn, who integrate across disciplines, and who build rather than hoard knowledge. The goal extends beyond producing ten individual researchers to creating a cohesive network that continues collaborating after the month ends, whether at CEEALAR or elsewhere.

Conditional on securing an additional $60k or more, the seminar will be extended into the year-long AFFINE Fellowship. (See here for an explanation of why a 1-year-long fellowship is needed.)

How will this funding be used?

The first "valuable" (i.e., "we can use this money for something concretely useful in service of this project") threshold of $20k is meant to cover Mateusz's work on the retreat until getting the final decisions from our big funders on whether they finance the retreat and/or the fellowship (all in the scenario where funding for the retreat from other sources is not secured).

The second threshold of $179,520 will cover Mateusz's work on preparing the retreat as well as the costs of running it (including him and other staff).

The maximum amount of $1,616,120.00 will suffice to fund the entire Fellowship roughly as we would ideally envision it. The intermediate amounts will be used to cover as much of the Fellowship as we can; roughly, less money will mean fewer fellows and/or smaller stipends. (We also provide a utility function over money, made with plex's tool, which you probably should also use in your funding applications.)

A detailed budget for the minimum amount is in the following table. A budget for the maximum funding amount can be made available upon private request.

Utility values in text are:

  • $20k -- 6% (Mateusz can keep working on this)

  • $130k -- 9%

  • $180k -- 44% (Seminar)

  • $240k -- 70% (Minimal Fellowship)

  • $1,616k -- 100% (Full Fellowship)

(ETA: Updated Full Fellowship cost from $1,561k to $1,616k in order to include some costs of the seminar that our earlier budget hadn't included.)
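
As a rough illustration (not part of plex's tool), the utility points above can be compared for intermediate funding amounts by interpolating between them. Here is a minimal sketch in Python, assuming simple linear interpolation between the stated breakpoints; the interpolation method and the code are our own illustration:

```python
# Hypothetical sketch: linearly interpolate between the utility points listed above.
# The breakpoints come from the list; the interpolation method is an assumption.

UTILITY_POINTS = [
    (20_000, 0.06),      # Mateusz can keep working on this
    (130_000, 0.09),
    (180_000, 0.44),     # Seminar
    (240_000, 0.70),     # Minimal Fellowship
    (1_616_000, 1.00),   # Full Fellowship
]

def utility(amount: float) -> float:
    """Interpolated utility of a given funding amount (0 below the first threshold)."""
    if amount < UTILITY_POINTS[0][0]:
        return 0.0
    for (x0, u0), (x1, u1) in zip(UTILITY_POINTS, UTILITY_POINTS[1:]):
        if amount <= x1:
            return u0 + (u1 - u0) * (amount - x0) / (x1 - x0)
    return UTILITY_POINTS[-1][1]

print(utility(200_000))  # ~0.53 under these assumptions
```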

Who is on your team? What's your track record on similar projects?

Mateusz Bagiński - Lead (technical, applications)

Mateusz studied cognitive science (BSc, MSc) and worked as a programmer at a startup developing software for enhancing collective sense-making. After completing his dissertation, he decided to transition into technical AI safety research: upskilling, helping build AI Safety Info, and participating in some AI Safety hackathons. Eventually, he landed on theoretical/agent foundations research as the field that is most important, neglected, and suitable for his interests and skills. PIBBSS Fellow 2024 (w/ mentor Tsvi Benson-Tilsen (ex-MIRI)).

Mateusz will be responsible for designing the program, selecting the candidates, and ensuring that everything runs as smoothly as possible on the research side. The latter will involve helping the participants with their learning and research (acting as a sort of secondary mentor), making connections between participants and [mentors, resources, or other participants], as well as being generally on the lookout for ways in which the program could be improved.

Sofie Meyer - Humans Lead

Sofie's background is in cognitive neuroscience (BSc, PhD, postdoc, Google Scholar) and several experiential practices and trainings: ten years of Zen meditation, two years of existential psychotherapy training, six months of circling facilitation training and certification, five years supporting co-counselling courses, two years volunteering at Maytree Sanctuary, and three months of teaching cognitive behavioral therapy group facilitation skills at Rethink Wellbeing. She also facilitates Core Transformation, Focusing, and Internal Family Systems processes.

Professionally, she has led user research at two mental health tech startups, one focused on depression and tracking cognitive effects of medication, another on using cognitive behavioral therapy to treat social anxiety in working women. Currently, she designs AI chatbots for global health at Turn.io and serves as Chair of EA Denmark and board member of Giv Effektivt (LinkedIn).

She loves facilitating nuanced conversations and creating space and emotional safety to enable brilliant people to truth-seek. She aims to bring compassionate, well-regulated, honest, evidence-based support and tools to humans and teams navigating complex cognitive and emotional challenges.

Attila Ujvari - Event design

As Executive Director of CEEALAR, he's transforming a residential facility in Blackpool into a professionalized incubator for AI safety researchers and entrepreneurs working on GCR reduction. Over the past six months, he's revitalized the infrastructure and implemented productivity frameworks and community systems that have dramatically improved resident outcomes.

Before CEEALAR, Attila spent 15+ years building systems that unlock human potential: managing cross-functional teams of 18+ at Ericsson, overseeing operations for 1,100+ soldiers across four continents in the Army National Guard, and scaling operational processes as Director of Operations at V School. He's taught professional courses, provided career counseling and academic planning in college, and tutored students navigating complex learning pathways.

His foundation in Hungary runs intensive hackathons that bring cross-disciplinary groups together around singular problems—exactly the dynamic needed here. As a group embodiment facilitator, he creates experiences that connect people not just professionally, but holistically.

He's not an AI safety researcher, but the person who builds the conditions for researchers to do their best work. This seminar needs someone who understands how to design intensive learning experiences, manage group dynamics at scale, and create the rhythms that turn ambitious people into effective collaborators.

Education: De Anza College, Stanford University, Amherst College.

TBD - Ops & Volunteer Lead

The venue provides food and basics, but we’ll want a full-time person to help make all the thousand minor things work. Probably assisted by volunteers.

plex - Vision & Network

plex has dedicated almost his entire adult life and the vast majority of his funds to trying to avert the AI apocalypse. The world is not anything like safe, so it’s insufficiently successful, but he has built or inspired many neat things, including a weirdly high fraction of the existential safety ecosystem’s infrastructure.

What are the most likely causes and outcomes if this project fails?

We actually consider it very likely that the project "fails" in the sense that it will complete with none of the Fellows producing any clearly promising research outputs or directions at building pieces of a solution. The most likely cause of this would be that the problem being tackled is one of great difficulty, very slippery, and with poor feedback loops with reality.

However, even in that case, the three theories of change we outlined in the section above will still likely be achieved: we are going to have more people who can (1) think about the problem; (2) explain it to governments; (3) be able to start good technical AI X-risk-reducing orgs when funding becomes abundant.

The primary type of "disappointing failure" that we can foresee befalling this project would be the failure to produce promising individuals possessing a deep understanding of the alignment problem. The most likely causes of this would be the failure to recruit the right people and to provide them with the right sort of support (in terms of environment (including the social environment) and mentorship).

In order to prevent this failure mode, we are going to do all of the following:

  1. Get a large pool of potentially useful mentors.

  2. Mateusz will be continuously assessing how the program is going for every participant.

  3. We will have a full-time employee specialized in working with humans (Sofie), to ensure that obstacles such as demotivation due to a lack of clear results, the emotional weight of the problem, or mental health problems more generally are not as much of a hindrance to the participants' journeys.

  4. We will utilize our extensive social networks, as well as high-quality paid services, to recruit highly promising individuals.

  5. We are going to use CEEALAR as a well-proven longer-term environment for researchers.

How much money have you raised in the last 12 months, and from where?

Zero. We just started.

We are in conversation with a donor who is potentially interested in funding the retreat (fully or partially). One function of this post is to gather the opinions of relevant people, so that the donor is better informed about the value of the endeavor being proposed here.

Additional info

Selection criteria for the fellows:

  • Highly technically skilled (e.g., maths, technical philosophy, finance, founder/CEO types, sharp PhDs/researchers in various fields, top-level science communication, etc.)

  • Would care about saving the world and all their friends if they thought human extinction was likely.

  • Decent team players, non-disruptive to the group cohesion.

  • (Existing understanding of AI Safety is not required. Starting with a ~blank slate is fine and good.)

Comments (11) · Offers (1) · Similar (8)

Abram Demski

about 10 hours ago

I am enthusiastic about this, and interested in being involved. I know Plex and Mateusz, and have some trust in their taste. I expect the program will focus on the most important issues (ie the most severe and neglected AI risks). I've been to a similar (much shorter) event at the venue, and found it to be a good location, with good countryside walks very conducive to thinking and conversation.


Richard Ngo

1 day ago

Looks exciting. My personal view is that there's a lot of progress waiting to be made on theoretical/agent foundations research. The quality of the program will of course depend a lot on the quality of fellows; I'm curious if there are many people already on your radar, or if you think you have good leads there.

A few other thoughts:

- I think trying to persuade people that the alignment problem is hard is often counterproductive. The mindset of "I need to try to solve an extremely difficult problem" is not very conducive to thinking up promising research directions. More than anything else, I'd like people to come out of this with a sense of why the alignment problem is interesting. Happy to talk more about this in a call.

- Some of the selection criteria seem a bit counterproductive. a) "Decent team players, non-disruptive to the group cohesion" seems like a bad way to select for amazing scientists, and might rule out some of the most interesting candidates. And b) "would care about saving the world and all their friends if they thought human extinction was likely" seems likely to select mainly for EA-type motivations which IMO also make people worse at open-ended theoretical research. Meanwhile c) "highly technically skilled" is reasonable, but I care much more about clarity of thinking than literal technical skills.

If the organizers have good reasons to expect high-quality candidates I expect I'd pitch in 5-10k.


plex

about 17 hours ago

@Richard Good leads on how to get good leads (three people with good contacts/recruitment skills in relevant areas) and some interested mentors, but we have not yet started mass outreach, and won't until funding is locked in, as I'd expect that to spoil more leads than it generates if we're not confident it's happening.

  • Persuading people that it's hard is not the angle I'm hoping for, but I imagine they'll naturally conclude that by looking at a bunch of the info and topics. Agree interest/curiosity is great as a motivator.

  • Yeah, it's definitely possible to select out the best candidates if you apply "non-disruptive" wrongly. I mostly want to avoid people who are something like recklessly/incorrigibly disruptive, or the closed/incurious kind of overconfident in a way that blocks good conversation and intellectual progress, while keeping the truth-seeking disagreeable and the weird genius with odd social norms.

  • I want to stand by wanting to select for people who would do something about it if they thought the world was ending. It doesn't have to be EA/altruistic motivations, selfish or caring about their friends is basically fine. But I think having ~everyone bought into a certain kind of ambition and taking this seriously rather than having a bunch of people with missing mood is pretty cruxy for getting the atmosphere and momentum that makes great things happen.

The people we've talked to for marketing seem reasonably confident they can get us high-quality candidates, and have done similar-ish things before. This is probably still the least certain part of the chain, and it's not impossible that our deadline is too ambitious and we'll notice we're not on track for a sufficiently good crop by March, in which case we'd move to the later dates when the venue is free, in May, to improve participant quality.


Linda Linsefors

2 days ago

I endorse this project.

Context: I have some experience in AI Safety organising, running AI Safety Camp and some other events. Because of my experience, plex (who is one of the organisers for this event) reached out to me for feedback on their plans. We had a one-hour call. I came away from that call enthusiastically in favour of this event happening. I also know both plex and Mateusz, and trust them to do a good job.


Linda Linsefors

1 day ago

Update: I just actually read through what is written in the proposal here on Manifund. The plans I discussed with plex, and that I intended to endorse in my previous comment, were significantly different from the proposal written here. I'm not sure why, but it's also normal for plans to change.

I'm less excited about the plans as written here than about the plans I heard previously. However, if the new plans are what's on the table, I would still rather have this funded than not funded.

The biggest difference is that the previous plans I heard about did not have stable pods at all. Given what I understand this program to be about, I think having stable pods is a mistake.


plex

1 day ago

@Linda Yeah, my vision which I described in our call was not to have strongly stable pods but more of flexible-ish working groups, plus mixed interaction seminar-style on an ongoing basis, and the option for at least mentors to be around part-time.

I'll be exploring this with Attila, who wrote that section. He brings a ton of experience with relevant events, will be doing a lot of the event design, and is excited to make this awesome, but we've only partly synced on models of how best to do that. My guess is we end up with a more working-group-style layout rather than strongly stable pods, but I'll be examining his reasons for having put this into the draft plan here.

My main reason for wanting flexibility over the usual benefits of having more fixed pods is that this seminar, unlike most similar events, is much more focused on gaining lots of existing knowledge and helping people rapidly grow than producing novel outputs. This means having more intermixing and people switching so they can tutor new people on the things they've collected is unusually beneficial, as opposed to the usual thing where you want to get deeply synced with a few people so you can push the boundaries of knowledge and do a project together.

In general, we're planning on iterating the details like this a fair amount as we approach the date, and have been keen to get this out ASAP so we've got longer with funding confirmed to start collecting candidates.


Lucius Bushnaq

2 days ago

I don't have the energy right now to write a high quality comment, but since I care about this I figure it's better to write something rather than nothing:

I think this project sounds like a good idea. Most (all?) AI Safety training programs these days don't even seem to touch on what I'd consider the actual core problems of alignment. I think there can be good reasons for many people and programs to mostly focus on other things at the moment, but it really seems almost catastrophically underemphasised at this stage. I don't know every training program of course, but talking to e.g. MATS graduates these days I often get the sense they haven't even really heard the basic case for why alignment might be hard. Looking at various AI Safety course curricula, I likewise see an almost complete lack of material engaging with what I'd consider the core problems of alignment. If this continues, eventually I'm not sure this field will even really remember what it was supposed to be about, never mind try to work on it.

I know Mateusz a little. From our limited interactions, I got the impression that he probably knows at least a decent amount about the sort of old-school alignment thinking that I wish the alignment field today was a lot more familiar with. Tsvi's endorsement also means quite a bit to me here. I think he could make a good technical lead for this project. I don't think I know Sofie or Attila. I do know Plex, but haven't really worked with him professionally. Other people say he's good at what he does though. My guess is he'd be a good fit for this role.


Paul Rapoport

2 days ago

This seems like an overall good idea, and I strongly recommend funding this to at least the 1-month Seminar level.

A few people have floated this kind of program to widen the funnel for people who might want to work on AI safety research, with the hope of kickstarting the involvement of people who otherwise might not know how to get working, or which concepts existing researchers - even marginal ones - would find very basic.

A 1-month version would likely be best-in-class due to a relative lack of comparable programs - itself a problem! - though I think that a 1-year version might be overambitious and risk burning out or disengaging scholars if not done extremely well and carefully.

All the same, I've worked with Mateusz for a period of time and been part of what turned into a very small category theory reading group with him, and I think he's very well-suited to this approach. AI safety - especially the kind of AI safety that looks like attempts to find a solution to alignment rather than a dozen ad-hoc patches to existing LLMs - suffers badly from a lack of serious research groups. This project looks to me like it would be at least half as promising per person as MATS and maybe 3/4 as promising as PIBBSS - both of which I've been a part of, both of which have similar mission statements, and both of which have been funded at higher levels - and both frequently claim that they want to see cousin orgs founded!

I would donate substantially to this if I had piles of tech or crypto money, but sadly I do not. I hope that other people who do have piles of tech or crypto money will hear me and donate in my place. If I were a grantmaker, I would almost certainly be directing grant funds to this endeavor.


Tsvi Benson-Tilsen

3 days ago

Overall: I recommend funding this to at least ~$240K, the level needed for the Seminar + 1-year fellowship.

I researched AGI alignment at MIRI for about 7 years; in my judgement, the field is generally not well set-up to appropriately push newcomers to work on the important difficult core problems of alignment. Personally my guess is that AGI alignment is too hard for humans to solve at all any time soon. But, if I were wrong about that, I would probably still think that novel deep technical philosophy about minds would be a prerequisite. I'm not up to date, so this impression might be partly incorrect, but broadly my belief is that most AI safety training programs are not able to create a context where people have the space, and are spurred, to think about those core problems.

Since this program is new, it's hard to judge. I've worked with Mateusz on alignment research, and I think he gets the problem, and the description of the program seems around as promising as any I've seen. Because the space hasn't found great traction yet, trying new things is especially valuable. So, IF you want to fund AGI alignment research, this should probably be among your top investments.

Further, if you want to fund this program, I'd strongly recommend funding it at least to the minimum bar to continue it with the 1-year fellowship. The reason is that learning to approach the actual AGI alignment problem is a slow process that probably needs multiple years, with sparse but non-zero feedback; so the foundations laid down in the month-long seminar might tend to somewhat go to waste without longer-lasting scaffolding.

> stable working pods of three to five people

I would suggest creating space for even smaller groups (the standard in Yeshiva, I gather, is pair study, and personally I need substantial time/space set aside for solo thinking). The area is very strongly inside-view-perspective thirsty, so an admixture of space for those to grow is needed, even given the opportunity cost. You could try to offload that to before and after the program, but I'd suggest also making space for it during. E.g. a "Schelling" time for 2 hour solo walks / thinks, or whatever.

> We actually consider it very likely that the project "fails" in the sense that it will complete with none of the Fellows producing any clearly promising research outputs or directions at building pieces of a solution. The most likely cause of this would be that the problem being tackled is one of great difficulty, very slippery, and with poor feedback loops with reality.

This is an unbelievably based statement, which on the object level would hopefully contribute to making an environment where actual new perspectives (rather than just the Outside the Box Box https://www.lesswrong.com/posts/qu95AwSrKqQSo4fCY/the-outside-the-box-box ) can grow, and furthermore indicates some degree of hopeworthiness of the organizers on that dimension.

> participants will share their learning with each other through structured showcases and peer instruction

Sounds cool, but do keep in mind that this could also create a social pressure to "publish or perish" so to speak, leading to goodharting. A not-great solution is to make it optional or whatever; it's not great because it's sort of just lowering standards, and presumably you do want to have people aiming to work hard and do the thing. Maybe there are better solutions, such as somehow explicitly and in common knowledge making it "count for full points" to present on "here's how I have a really basic/fundamental question, and here's how I kept staring at that question even though it's awkward to keep staring at one thing and not have publishable technical results from that, and here's my thoughts in orienting to that question, and here's specifically why I'm not satisfied with some obvious answers you might give". Or something. In other words, alter the shape of the landscape, rather than making it less steep.

> Selection criteria for the fellows:

I would suggest somewhat upweighting something like "security mindset", or (in the same blob), something like "really gets that you can have a plausible hypothesis, but it's wrong, and you could have quickly figured out that it's wrong by actually trying to falsify it / find flaws in it, but you probably wouldn't have quickly figured out that it's wrong just by bopping around by default". And/or trying to bop people on the head to notice that this is a thing, though IDK how to do that. This is especially needed because, since we don't get exogenous feedback about the objects in question, we have to construct our own feedback (i.e. logical reasoning about strong minds).


plex

3 days ago

@tsvibt
> Sounds cool, but do keep in mind that this could also create a social pressure to "publish or perish" so to speak, leading to goodharting.
Clarification: Learnings will by default be in the form of "I picked one of the topics listed as possibly important and read stuff/talked to people until I deeply get it and why it's a thing and can teach it", not necessarily novel research.


Kaarel Hänni

4 days ago

The world really needs more and better places/programs where bright people can try to grow into serious AI alignment researchers. I would guess that the 1-year fellowship proposed above would become the best existing thing of this kind. The 1-month seminar would also probably be the best in its reference class.

Imo, almost all other alignment upskilling programs are mostly creating coding minions for ML labs and printers of ML conference slop, with little emphasis on creating people that can do novel interesting thinking about the AI problems we face. I expect that the proposed program would emphasize getting people to grow into serious alignment thinkers much more than almost any other existing program. (Among existing programs, the main exception that comes to mind is PIBBSS. PIBBSS is good.)

My main uncertainties are about stuff like whether the project will be run decently competently and whether finding applicants goes well — I don't know the organizing team etc. well enough to voice strong views on these sorts of things.

All things considered, I think supporting this is a great use of money.