Jonas Becker

@jmb

$200 total balance
$0 charity balance
$0 cash balance

$200 in pending offers

Outgoing donations

- Integral Altruism: $200 (pending)
- CaML - AGI alignment to nonhumans: $300 (9 days ago)

Comments

CaML - AGI alignment to nonhumans

Jonas Becker

9 days ago

TL;DR
The "alignment problem" may map surprisingly well onto the problem of raising a child to not become Elon Musk: you can't control them into goodness, but you can have a good shot at it by being a good parent. Here's why I think compassion-first pre-training might be one of the most underexplored routes in AI alignment. Plus some armchair reasoning from an animal welfare advocate who's been thinking about trauma, behavior change, and the metacrisis. Thank you Claude for writing this TL;DR.

---

The approach of pre-training AIs on compassion, versus "fine-tuning" them later in the alignment process, maps well onto a piece of writing that I haven't published yet. I basically argue that if we assume artificial intelligence is not fundamentally different from human intelligence, we might as well model "AI models in training" as children (with different stages from infancy to adolescence). Then we could transfer all the science on pedagogy, and the lived experience of every good parent, to find out which kind of education would lead to them becoming the most helpful and least harmful participants in society. And maybe also psychotherapy, to keep them from becoming Mecha-Hitler.

A good kindergarten teacher would model compassion. They would make sure the kids are not constantly exposed to content displaying violence, hate speech, zero-sum winner-takes-all games, and pornography. Training LLMs on "the whole internet, period" is quite the opposite of that and may even "traumatize" them. I have no idea how rigorous content selection for AI training currently is, or whether models are gradually exposed to more "challenging" material as they progress through their stages of growth.

A good high school teacher would not even punish their students for making mistakes. They do not act out of an 1800s mindset of kids being inherently bad or dangerous, of "having to align/civilize these barbaric kids" through coercion, even hitting them with sticks if they forget their vocabulary or hit other kids (to "provide incentives"). Yes, they set boundaries, but in a way that encourages the kids and redirects their energy, strengthening their own moral compass instead of imposing certain types of rules and behaviours.

I currently know almost nothing about current fine-tuning, evals, and technical alignment work, but my sense is that the field might not be using that approach and mindset. I'm not saying I'm sure this is the way to go, but it's definitely an avenue worth exploring. Well, actually, my intuition quite strongly suggests it may be the way to go, to be honest.

Maybe the "alignment problem" is inherently unsolvable. At least for humans: I postulate that you cannot control another human intelligence to the extent that you are 100% sure they will never harm someone else, because doing so will in some way limit or harm that very human being, which in turn models harmful behaviour to them. You cannot force your child to "not become Elon Musk", but you can have a very good shot at it by trying to be a good parent.

My prior, without having done any research into the topic and without being well informed on technical AI alignment, is that this seems to be one of the most promising routes for alignment of AIs in general, and towards non-human interests in particular. I don't know the team behind this at all, but if they seem at least somewhat capable of delivering what they're aiming at, and this has some chance of influencing how AI training is done at the labs, the experiment should be worth a couple hundred thousand dollars at this point.

My background is in corporate animal welfare advocacy: convincing antagonistic actors to change their behavior for the better. And for what it's worth, I argued that AGI is entirely possible, and likely in our lifetime, in my first ever high school paper in 2010.

I'm also currently writing a blog post series on my understanding of the metacrisis: the mindsets and cultural patterns behind the AI race, climate change, the patriarchy, and human exploitation of animals. While I will admit that these musings might seem armchair-y and somewhat uninformed regarding the AI alignment space, they are also rooted in over ten years of experience bridging a variety of realms, including not only advocacy and EA but also non-dual and jhana meditation, NARM psychotherapy, embodiment, and emotions work.

Compassion Bench

Jonas Becker

22 days ago

I just stumbled across this; I didn't know it existed. I think it's a great idea and should keep going. I believe influencing the moral circle of AIs as they become more and more powerful is maybe one of the highest-leverage opportunities in farm animal welfare. Whether you believe in AGI take-off or not, it seems clear to me that AIs will be responsible for a huge number of decisions. I think it's crucial for them to include non-human sentient beings in their decision making.

I'm saying this as someone who has dedicated the last eight years of their career to corporate cage-free, broiler, and shrimp campaigns, and is still running them. I would like to see a lot more advocacy efforts like this directed at AI labs, please.

Integral Altruism

Jonas Becker

22 days ago

I believe we need more spaces for EAs to think outside of the EA box. After spending eight years in EA and learning a lot more about therapy, collective trauma, and perspectives from different wisdom traditions and therapeutic methods such as NARM, I believe we can't solve today's multiple crises within the same paradigm that caused them. We need to develop a new paradigm. I hope IntA can help facilitate this through conferences, hackathons, and bringing people together.

Transactions

| For | Date | Type | Amount |
| --- | --- | --- | --- |
| CaML - AGI alignment to nonhumans | 9 days ago | project donation | 300 |
| Manifund Bank | 22 days ago | deposit | +500 |