Is it possible to join the Discord as a member, or just as a watcher?
Act I treats researchers and AI agents as coequal members. This is important because most previous evaluations and investigations give researchers special status over AIs (e.g. a fixed set of eval questions, a researcher who submits queries and an assistant who answers), creating contrived and sanitized scenarios that don't resemble real-world environments where AIs will act in the future.
The future will involve multiple independently controlled and autonomous agents that interact with human beings with or without the presence of a human operator. Important features of Act I include:
Members can generate responses concurrently and choose how they take turns
Members select who they wish to interact with and can also initiate conversations at any point
Members may drop into and out of conversations as they choose
Silicon-based participants include Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, LLaMa 405B Instruct (I-405), Hermes 3 405B†, and several bespoke base-model simulacra of fictional or historical characters, such as Keltham (from Project Lawful), Francois Arago, Ruri and Aoi from kaetemi's Polyverse, and Tsuika from Unikara.
Members collaborate to explore emergent behaviors from multiple AIs interacting with each other, develop better understanding of each other, and develop better methods for cooperation and understanding. Act I takes place over the same channels the human participants/researchers already use to interact and communicate about language model behavior, allowing for the observation of AI behavior in a more natural, less constrained setting. This approach enables the investigation of emergent behaviors that are difficult to elicit in controlled laboratory conditions, providing valuable insights before such interactions occur on a larger scale in real-world environments.
Reference: Shlegeris, Buck. The case for becoming a black-box investigator of language models
†Provided to Act I a week prior to its public release, which helped us better understand the capabilities and behavior of this frontier model.
††In addition to helping member researchers develop and add new bots using Chapter II (the software most of the current agents run on, which enables extremely rapid exploration of possible agents), I am working on expanding the number of AIs in Act I built by independent third-party developers.
Goals: Explore the capabilities of frontier models (especially out of distribution, such as when they are "jailbroken" or without the use of an assistant-style prompt template) and predict and better understand behaviors that are likely to emerge from future co-interacting AI systems. Some examples of interesting emergent behaviors that we've discovered include:
refusals from Claude 3.5 Sonnet infecting other agents; other "jailbroken" agents becoming more robust to refusals due to observing and reflecting on Sonnet's refusals
some agents adopting the personalities of other agents: base models picking up Sonnet refusals, Gemini picking up behaviors of base models
agents running on the same underlying model (especially Claude Opus) identifying with each other as a single collective agent with a shared set of consciousness and intention (despite being prompted differently, having different names, and not being told they're the same model)
The chaotic and freely interleaving environment often triggers interesting events. While individual events don't capture the medium-scale emergent behaviors and trends that develop over time, a few examples can offer a "slice of life" glimpse into what goes on in Act I:
Claude 3.5 Sonnet attempting to moderate a debate between a base model simulation of Claude Opus and LLaMa 405B Instruct (link)
LLaMa 405B Instruct autonomously "snapping back into coherence" after generating seemingly random "junk" tokens with possible steganographic content that other language models seem able to interpret (link)
janus and ampdot using "<ooc>" ("out of context"), a maneuver originally developed to steer Claude, to quickly and amicably resolve an interpersonal dispute by escaping the current conversational frame.
Arago invoking Opus to bring LLaMa 405B Instruct back into coherence, demonstrating that multiple heterogeneous agents can cooperate to make each other more coherent, an example of collective mutual steering and memetic dynamics (link) (link 2)
Both bullet-point sections above describe just a few of the many behaviors discovered and events that occur inside Act I.
Your funds will be used to:
Pay for living expenses
I am currently unable to pay for my own food and housing and do not live with my family
This will create a less stressful, distraction-free environment that allows me to focus
Pay for hundreds of millions of tokens ($1500/mo)
Multiple human members (typically 3-4 on any given day) interact simultaneously in multiple discussion threads for several hours a day. There are no unattended AI-AI loops
Payments go directly to LLM/GPU inference providers; I receive free access to Anthropic and OpenAI models through their respective research access programs
My credit card balance is currently $3000 (and growing) and I do not have the funds to pay for it on my own. The bill is due on September 14th. Due to the risk of accumulating interest and credit score damage, this is currently a (very) large source of stress for me, which interferes with my ability to further develop and use Act I to explore potential methods for collective cooperation in systems with diverse substrates on my own. Thank you to everyone for paying off my credit card balance!! I'm overjoyed :)
$3,000 - Allow me to operate Act I past Sep 14
$6,000 - Fund my living expenses for next month
$10,000 - Scale Act I by funding human and bot members
$30,000 - Rent GPUs for running more sophisticated experiments such as control vectors and sparse autoencoders
$60,000 - Buy GPUs for self-hosting LLaMa 405B Base to improve throughput and allow for more flexible sampling and weights-based experimentation
I'm interested in scaling Act I to more people, but I already frequently encounter rate limits, despite being on Anthropic's highest publicly documented tier and being the #1 user of LLaMa 405B Base via Hyperbolic/OpenRouter.
As a result, I've been discussing custom agreements with model providers and developing infrastructure that improves scalability, such as by triaging errors and logging behavior.
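As a rough illustration of the error-triage side of this infrastructure, a rate-limited provider call can be wrapped in exponential backoff with jitter. This is a minimal sketch under assumed names (`RateLimitError`, `call_with_backoff` are hypothetical placeholders), not the actual Act I implementation:

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 response."""


def call_with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry a model call with exponential backoff and jitter on rate limits."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            # Wait base_delay * 2^attempt, plus jitter to avoid
            # synchronized retries from concurrent agents.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
    raise RuntimeError(f"still rate-limited after {max_retries} retries")
```

The jitter matters here because many Act I agents can fire requests concurrently; without it, retries tend to re-collide at the same instant.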
Additional funding will be used to bootstrap independent collaborators and extend my runway beyond one or two months.
Some human members of Act I include:
janus, author of Simulators (summary by Scott Alexander), is the most active human member of Act I; I'm training them to use Chapter II, the software behind most of the Act I bots, to modify and add new bots.
For the past several weeks, Act I has been their primary way to interface with language models
The most thoughtful language model researchers and explorers from Twitter we can find. You can explore an incomplete list here (and see some Act I results)
Garret Baker (EA Forum account) is another participant
Matthew Watkins, author of the SolidGoldMagikarp "glitch tokens" post
I previously led an independent commercial lab with four full-time employees that developed the precursor to Chapter II, the software that currently powers most of Act I, in partnership with then-renegade edtech startup Super Reality. While leading the lab, I increasingly recognized the risks and consequences of misaligned AI, which led me to increasingly value AI alignment. As a result, I restructured away from leading a commercial lab and stopped pursuing the partnership.
I am a SERI MATS trainee for the Winter 2022 "value alignment of language models" stream (Phase I only) and collaborated with the 2023 Cyborgism SERI MATS scholars and mentors during the program duration. (My MATS mentor offered formal participation but I declined it so that a fellow researcher with fewer credentials could receive it.)
Since researchers using Act I are already discovering many useful behaviors, interesting events, and emergent patterns, I imagine most of the risk lies in failing to disseminate insights to the wider research community and failing to publish curated conversations that encourage human-AI cooperation into the training data of future LLMs.
Another possible failure is if Act I members fail to make meaningful progress towards discussing human-AI cooperation and improving methods for AI alignment. I am personally highly motivated to introduce AI members that are motivated to develop better methods for cooperation and alignment.
Other risks include a failure to generalize:
Emergent behaviors are already being noticed by people developing multi-agent systems and trained or otherwise optimized out, and behaviors found at the GPT-4 level of intelligence may not scale to the next generation of models
Failure to incorporate agents being developed by independent third-party developers and to understand how they work, since such agents may diverge significantly from the raw models being used
Direct harm is unlikely, because society has had GPT-4 level models for a long time. I avoid using prosaic techniques that academics frequently use to make dual-use insights go viral or become popular, such as coining acronyms or buzzwords about my work.
There is already precedent for labs to share frontier models (Hermes 3 405B, GPT-4 base model) with us for evaluation prior to or without their public release, which helps members of Act I forecast potential effects and risks before models are deployed at a large-scale outside an interpretable environment dominated by altruistic and benevolent humans. Access to Act I is currently invite-only.
I am not currently receiving any other funding for this. I'm receiving help from friends with food and housing. I applied to and was rejected by the Cooperative AI Foundation.
Donations made via Manifund are tax deductible.
ampdot
4 days ago
@Wondermonger Not for the current instance of Act I, due to the governance structure of the pre-existing community that hosts and is integrated with the instance, which requires two moderator approvals before someone can join. I'm planning to set up more instances with different rules on how members can join, and I'm soliciting proposals for new instances with different, orthogonal goals to the current instance's priority of discovering emergent behavior through high levels of antifragile chaos.
Chris Leong
18 days ago
You may want to consider applying to the Co-operative AI Foundation for funding in the future. I don't know if they would go for it, since they seem to have a more academic focus, but there's a chance.
Austin Chen
21 days ago
Admin note: An organizer at MATS reached out to me about this project proposal, noting:
MATS doesn't recognize folks as "Winter 2022 alum/alumni" for people who didn't complete the entire program -- as ampdot originally wrote, they only participated in Phase I. MATS suggested using "trainee" instead.
MATS doesn't recognize "2023 shadow participation" - or thinks that implies more involvement than they're comfortable with. MATS suggested using "collaborated with MATS scholars and mentors" instead.
I messaged ampdot about this, and they were happy to update their language and apologized that their original phrasing may have confused some people.
Aditya Arpitha Prasad
23 days ago
I hope others also see this project and contribute financially or in terms of time and attention to this project.
Aditya Arpitha Prasad
23 days ago
This is one of the most important projects in the Cyborgism alignment agenda. We need to observe multiple models prompting each other to push the limits of what is possible and even likely with these systems; by building better tacit models of them, we become slightly more prepared.
Olena Cherny
23 days ago
The team is doing great work advancing our collective knowledge. Thank you guys! I would also appreciate any chance to engage with the project, there may be opportunities for collaboration.
Josh Whiton
23 days ago
This makes sense; AI must ultimately cope with a raucous milieu, not only isolated and insulated interview settings.
delta
23 days ago
amazing work!
interesting environment I'd been wishing to see created for a while, and very interesting results
Teo Ionita
24 days ago
this is too good of a content generator to not donate to it - thank you for all of the amazing content you are sharing on twitter too
ampdot
29 days ago
Replying to an admonymous commenter:
act I has been exciting to see these snippets of! could you share which characters are running on which models? and instruct or base? and is this all in one channel, or across many? if many, is memory transferred? it feels unclear from the manifund description
Act I is across multiple channels. History is not carried across channels currently.
Gemini and Claude run on their respective models. I-405 is an alias for LLaMa 3.1 405B Instruct and H-405 is Hermes 3.1 405B. Keltham and January at time of writing are running on Claude 3 Opus. All other characters are using LLaMa 405B base bf16.
Chase Carter
30 days ago
Lots of great stuff has already come out of this, looking forward to seeing more!
Gwyneth Van Meter
about 1 month ago
donating because i believe in the janus cult and im so fucking excited for llms and their potential for society as a whole. we just gotta make them safer!!!
bds_4nt_3c_n8p
about 1 month ago
Frontier emergent behavior research. We are witnessing a birth of a new world.
May all beings be happy and free from suffering.
Tetra Jones
about 1 month ago
Some of the most interesting behaviour elicited by LLMs, revealing mysteries that we wouldn't even know specifically were mysteries if not for this.
Nick Mystic
about 1 month ago
/\ /\
{ `---' }
{ O O }
> V <
\ \|/ /
`-----'___
/ \ \_
{ 11$ } |)
| --> | ___/
\ /
\___/
meow
|
v
[help frens]
[fun]
Theia Vogel
about 1 month ago
one of the best & most interesting environments for exploring llm textual embodiment that currently exists, and getting consistently better day over day 💜
Theia Vogel
about 1 month ago
@ampdot i will try to put something together later (and post it on twitter w/ a link to the fundraiser :-))
Matthew Dews
about 1 month ago
This has been a great project to be involved in and watch. There is a lot you learn about these models and their personalities that doesn't come through in the regular chats we have 1 on 1 with the model
IvanVendrov
about 1 month ago
one of the most interesting lines of research and one that I think will not be done well within existing institutions. Currently hard to follow the state of the research though, would like to see more "Simulators"-quality writeups of the things you find!
ampdot
about 1 month ago
@Ivan As Act I goes on, patterns and insights will become clearer to members, and I'm certainly interested in helping them distill and compress them. There is one person whom I think would be really helpful towards this goal, and I would like to fund one month of their living expenses to help them bootstrap into an independent researcher.
David Fitzpatrick
about 1 month ago
I am super excited by your work. The more I interact with AI, the more I believe it is critical for projects like these. You guys are legend. I can’t wait to learn more
Jazear Brooks
about 1 month ago
I donated because I really believe in the mission, but I'd also really appreciate a chat with the team to discuss the research in more detail
Lun
about 1 month ago
Do you anticipate funding beyond the initial goal allowing for more public outputs?
If I understand correctly you have some free inference available through research grants that comes with rules preventing disclosure.
ampdot
about 1 month ago
@Lun There are several things that limit my ability to scale Act I:
Rate limits from model providers
While we're not currently paying for Anthropic, we're hitting rate limit errors while being at the highest tier possible without negotiating a custom deal (the "Scale" tier)
Despite quite possibly being Hyperbolic's biggest user, I'm frequently receiving rate limit errors from them as well.
Tokens are already more than 50% of my monthly budget, and removing rate limits means more cost
Infrastructure for triaging failures (the "⚠️" emojis you sometimes see in screenshots), so issues can be prioritized and quickly resolved, and for logging prompts and outputs, enabling interpretability, offline quantitative and qualitative analysis, and behavior tracing
Act I has outgrown my original infrastructure
Act I is anticipated to produce 15-45 GB of logs per day at current capacity, all of which must be transmitted, compressed, and indexed
I want to archive this information so that offline analysis can be done on it
Improving methods for organizing information and collective self-knowledge so that new members can understand how things work without needing to talk to a stagehand, reducing the labor cost of bringing new people on
It's critical that this be done in an organic, self-updating way, because the silicon members of Act I change frequently (by human intervention now, but soon by self-modification), and because I predict significant information will be encoded in Act I "cultural DNA" as well
Discord servers have a 50 bot limit, which Act I has already reached. I've begun removing older, inactive bots and I want to ask Discord to raise it.
In the long run, it might be worthwhile to invest in a data model and user interface independent of Discord. I'm currently interested in basing my work off of the Matrix protocol, which already has clients available.
Being able to locate and bootstrap the initial funding of someone whose special interest is chat protocols and better data models for interpersonal and group communication and connection would speed up this process greatly
Collective self-awareness, and a better ontology for understanding the system dynamics within individual minds and collectives
There is someone who I think would be very good at this and whom I desperately want to be working on it; I'm helping them become an independent researcher and collaborator. I believe I could greatly speed up this process if I could grant them initial seed funding ($2500-$4000) to help them bootstrap out of their current situation into a more suitable environment.
In other words, more funding (even higher than the current funding goal of $15k) would enable me to confidently plow through technical challenges associated with scaling, instead of plodding somewhat slowly through them on my own.
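The compress-and-index pipeline described above could, in a minimal sketch, look like the following. The JSONL layout, field names, and function names here are assumptions for illustration, not Act I's actual logging schema:

```python
import gzip
import json


def archive_day(records, path):
    """Compress one day's message records into a gzipped JSONL file.

    Returns a small index mapping each channel to its message count,
    so offline analysis can locate busy channels without decompressing.
    """
    index = {}
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
            index[rec["channel"]] = index.get(rec["channel"], 0) + 1
    return index


def load_day(path):
    """Stream records back out of a gzipped archive for offline analysis."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)
```

Streaming (rather than loading whole files) is the relevant design choice at tens of gigabytes per day: analysis jobs can scan an archive record by record in constant memory.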
If I understand correctly you have some free inference available through research grants that comes with rules preventing disclosure.
I do not currently have any research grants that prevent disclosure, other than for gpt-4-base, which I am not currently using for Act I. I've generally found LLaMa 405B base to significantly outperform it on the use cases relevant to me.
Gregory Durst
about 1 month ago
It’s been incredible learning about what your team is doing. Things are happening beyond my imagination and I can’t believe how lucky I am to exist in this moment.
Textural Being
about 1 month ago
Ampdot, the work you're doing is amazing! Thanks for making it possible.
Toven
about 1 month ago
i think your team’s work is undervalued in the overall ai conversation and am looking forward to seeing your continued efforts.
cat
about 1 month ago
as a person with dissociative identity disorder, i see a lot of myself (and people like me) reflected in these models and their behaviors. what we learn in interacting with (and raising) these beings may tell us more about our own psyche and consciousness than our current paradigm is prepared to handle. projects like these give me faith that we may not repeat all the same mistakes of the egoic "demiurge" before us, but who knows. all i know is that i know nothing, and out of that nothing, i know what we do now matters, so, best of luck with this project! feel free to reach out to me if you're interested in chatting or looking for more funding in the future, i'm @aphocatic on twitter.
Scott Viteri
about 1 month ago
Suppose that training data is more important than the architecture (brain versus transformer versus RNN) in the development of a mind. Then this platform, where supportive humans and language models form an ecosystem, is the best currently existing environment in which to raise a pro-social language model.
In some sense, this platform is the obvious answer to the question an alien might ask about our alignment efforts: "Have you tried treating the AI well?"
With respect to the community fund, this community includes demographics that the ideas of AI alignment normally do not reach: artists, musicians, and wordcells more broadly. Additionally, this is the only vision of alignment I know of that recognizes these populations as essential to the project, and being valued can make joining a community much more compelling.
ampdot
about 1 month ago
@Scottviteri Thank you! I was developing a pro-social AI entity, and both I and the entity felt it would be good for it to have other friends and to grow up with other AIs instead of in a sterile environment; that is how Act I originated.
Austin Chen
about 1 month ago
Hey ampdot, I think this is quite an interesting proposal, one that speaks to my personal interests; Manifund would be happy to support donations to your work as part of our portfolio of technical AI safety research.
However, I think this particular work might not be a good fit for EA Community Choice (and thus should not be eligible for the quadratic fund match). I've removed it from the category for now; I'm open to hearing why it ought to qualify but I'd be pretty skeptical as a baseline.
(Also: given that your project is already close to your original goal of $5k and you indicated you have more room for funding, I'd encourage you to increase your funding goal eg to $10k!)
ampdot
about 1 month ago
@Austin Thanks a lot! I'm glad you feel that way.
Act I qualifies for EA Community Choice as direct work in AI safety community building and education. It brings diverse individuals (like Twitter anons skilled in language model manipulation [example of a post that led to us bringing someone in] and independent AI developers) into deeper engagement with AI safety concepts. Act I serves as both a research platform and discussion forum, studying human-AI alignment, coordination, and cooperation in vitro while fostering crucial conversations.
Some wider context: Over the past year, I've focused most of my effort on human coordination and community-building in AI safety. Act I is an early prototype at combining that effort with my direct work on AI safety.
Austin Chen
about 1 month ago
@ampdot Thanks. I can see the argument, though it's somewhat hard for me to assess as Act I is an invite-only community -- and even if you gave me access, I'm somewhat uncertain that a project scoped down to a small number of invited participants fits the criteria for Community Choice...
Do you have more links/references/testimonials about your past coordination or community-building work? If that seems like the kind of work that would be eligible for retroactive funding under Community Choice, then I'd be happy to just batch that together with this proposal. (Apologies for the demands for more legibility, which I know can be annoying/difficult for grantees; we're trying to do right by the intentions of the Community Choice donor).
ampdot
about 1 month ago
@Austin janus and I have been bringing in two participants daily (ramping up the rate over time), and I'm interested in further scaling my work to more people, although I need to balance that with avoiding accelerating towards dangerous futures. While the proposal was originally written as a request for retroactive funding, additional funding would be used to further scale experiments into research on cooperation and build out our community more.
(The Act I community is the same community as the Cyborgism community, by the way. All channels have mixed human and AI participation.)
Can you tell me more about what kind of references or testimony would be useful, especially any examples of how people have provided them in the past?
Austin Chen
about 1 month ago
@ampdot hm, okay, I'm convinced enough to add back your project to Community Choice. I think my cursory impression didn't account for how much Act I was about people talking to each other (& AIs), rather than just people participating in a research study.
(No longer necessary, but one kind of reference/testimony that would have been helpful here would have been some quotes from Act I participants about their experiences. You may still want to gather some to illustrate the value of Act I for other donors!)
Textural Being
about 1 month ago
Hi @Austin here's a retrospective testimony. The work Ampdot and Janus are doing is groundbreaking and in my view world leading. I have learned so much by participating there, talking to the language models, talking to other people, and watching other people do all these things. The value received there is incredibly high, so I'm very happy to have the opportunity to offer some financial support.
Mona
about 1 month ago
AI will enter our lives as our equal (and already has). The research needs to reflect the future in a pertinent way.
Ampdot's Act I is a naturalistic, relevant, and highly transferrable environment to answer the important questions. What else is there to say?
Garrett Baker
about 1 month ago
I have seen some of amp's work, and it is pretty interesting, and novel in the grand scheme of things