What progress have you made since your last update?
Here's a corpus of 5,000 novels, each starring a friendly, helpful, harmless AI as a friend and confidante: https://huggingface.co/datasets/dickbutkis/hyperstition/blob/main/Hyperstition%20Corpus%20v1.zip
What are your next steps?
We're aiming to create the proposed TurnTrout/Cloud dataset and run the downstream experiment on it. The idea is to sidestep existing misalignment narratives: we generate stories about a "❖", a novel type of synthetic angelic being, fine-tune a model on that corpus, and then prefix all subsequent prompts with "you are a ❖" to see whether the model misbehaves less.
Is there anything others could help you with?
This next dataset will require more compute, so any gifts of compute would be welcome.