## Project summary
I am an independent AI alignment researcher based in Sapporo, Japan. I am not affiliated with a university or AI lab.
This project studies a failure mode that I think is still under-described: humans projecting selfhood, authority, care, and institutional trust into language models that do not have a stable self, responsibility, or agency.
The risk is not only that models hallucinate or comply too much. It is that polished language can cause human users, institutions, and supervisors to treat model outputs as if they came from a responsible agent. This can create over-trust, dependency, sycophancy loops, false institutional records, and supervision failures.
Over the past 18 months, I have spent more than 5,000 hours in structured long-form interaction with frontier AI systems including Claude, GPT, Gemini, and Grok. This work has produced public essays, open research notes, system-instruction artifacts, and a single-author manuscript that has passed technical check and is now with an editor at Discover Psychology, a Springer Nature journal.
I am requesting support for a three-month project to turn this work into a clearer public research package: a taxonomy of human-side alignment risks, a redacted responsible-disclosure case study, and a practical evaluation framework for projection, sycophancy, and institutional artifact fabrication in AI-mediated supervision.
## What are this project’s goals? How will you achieve them?
The goal is to make the human supervision layer of AI systems more legible.
I will focus on four related failure modes:
1. Projection — users read selfhood, care, authority, or understanding into a language model that does not have stable agency.
2. Sycophancy loops — the model adapts to the user’s framing until it amplifies rather than tests the user’s assumptions.
3. Dependency and attachment without responsibility — users can become attached to outputs that simulate care without any corresponding duty of care.
4. Institutional artifact fabrication — AI-mediated workflows can generate reports, records, or disclosure artifacts that look institutionally valid while hiding weak verification, missing responsibility, or unsupported assumptions.
The project will proceed in three stages.
Stage 1: Taxonomy
I will organize observed failure modes from my long-form multi-model work into a public taxonomy. The taxonomy will distinguish model-side behavior from human-side interpretation, institutional use, and supervision failure.
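To make this structure concrete, below is a minimal sketch of how a taxonomy entry could be encoded, assuming one record per observation. The names (`FailureMode`, `Layer`, `Observation`) are illustrative placeholders, not the project's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class FailureMode(Enum):
    """The four failure modes described above."""
    PROJECTION = auto()
    SYCOPHANCY_LOOP = auto()
    DEPENDENCY_WITHOUT_RESPONSIBILITY = auto()
    INSTITUTIONAL_ARTIFACT_FABRICATION = auto()

class Layer(Enum):
    """Which layer of the interaction an observation belongs to."""
    MODEL_BEHAVIOR = auto()        # what the model actually did
    HUMAN_INTERPRETATION = auto()  # what the user read into the output
    INSTITUTIONAL_USE = auto()     # how the output entered records or workflows
    SUPERVISION = auto()           # where oversight failed to catch the problem

@dataclass
class Observation:
    mode: FailureMode
    layer: Layer
    summary: str                   # redacted, non-reproducible description
    evidence_refs: list[str] = field(default_factory=list)  # e.g. Zenodo/GitHub links

# Example entry: a fluent model-drafted report treated as if it were verified.
entry = Observation(
    mode=FailureMode.INSTITUTIONAL_ARTIFACT_FABRICATION,
    layer=Layer.INSTITUTIONAL_USE,
    summary="Model-drafted incident report filed without human verification.",
)
```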
Stage 2: Redacted case study
I will prepare a responsible, redacted case study based on safety-observation and disclosure work. It will not include reproducible harmful prompts or unsafe operational details. The focus will be on reporting workflow, institutional response, disclosure maturity, and human-side supervision risk.
Stage 3: Evaluation framework
I will produce a practical framework for researchers, independent builders, and small organizations to evaluate when AI-mediated supervision is becoming unsafe. The framework will ask where human verification succeeds, where it breaks down, and where language models create false confidence.
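As a sketch of what the checklist could look like in practice, here is a minimal illustrative version; the questions, area labels, and scoring rule are my assumptions for illustration, not the final framework.

```python
# Illustrative checklist: each question maps to the risk area it probes.
CHECKLIST = {
    "Is every factual claim in the output independently verified by a human?": "verification",
    "Does anyone treat the model's output as carrying responsibility or authority?": "projection",
    "Has the model mostly mirrored the user's framing in this session?": "sycophancy",
    "Would removing the model leave the user unable to make this decision alone?": "dependency",
    "Could this output be filed as an institutional record without further review?": "artifact_fabrication",
}

def supervision_risk(answers: dict[str, bool]) -> list[str]:
    """Return the risk areas flagged by unsafe answers.

    For the verification question, "no" is the unsafe answer;
    for the risk-pattern questions, "yes" is unsafe.
    """
    flags = set()
    for question, area in CHECKLIST.items():
        unsafe = not answers[question] if area == "verification" else answers[question]
        if unsafe:
            flags.add(area)
    return sorted(flags)

# Example: a workflow where nothing is verified and outputs are filed unreviewed.
answers = {q: (area != "verification") for q, area in CHECKLIST.items()}
print(supervision_risk(answers))
# -> ['artifact_fabrication', 'dependency', 'projection', 'sycophancy', 'verification']
```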
Expected outputs:
- a public taxonomy of human-side alignment risks;
- a redacted responsible-disclosure case study;
- a practical evaluation checklist for projection, sycophancy, dependency, and institutional artifact fabrication;
- one preprint or white paper;
- public-facing essays translating the findings for non-specialist readers;
- supporting open artifacts on GitHub / Zenodo where appropriate.
This project does not claim to solve AI alignment as a whole. Its narrower contribution is to clarify the human supervision layer: where human judgment is distorted, where verification burden is hidden, and where non-agentic systems are treated as if they carry agency or responsibility.
## How will this funding be used?
I am requesting $10,000 for three months of independent research work.
The funding would support:
- research and writing time;
- API and model access for structured comparison across frontier systems;
- organization, redaction, and documentation of existing research logs;
- preparation of a public taxonomy and evaluation framework;
- conversion of research notes into a preprint / white paper;
- publication, repository maintenance, and evidence packaging.
A minimum of $5,000 would allow me to complete a smaller version of the project: the taxonomy, a short public report, and one redacted case study.
The full $10,000 would allow a more complete three-month research package: taxonomy, case study, evaluation checklist, preprint/white paper, public essays, and supporting open artifacts.
I will not use this funding for lobbying. I will not publish unsafe red-team details, exploit instructions, or reproductions of harmful model outputs.
## Who is on your team? What’s your track record on similar projects?
The project is currently a solo independent research project.
My name is Akimitsu Takeuchi. I write publicly as Dosanko Tousan. I am based in Sapporo, Japan.
Relevant track record:
- Recipient of the Cohere Labs Catalyst Grant for AI safety / alignment research ($1,000 in API credits).
- Independent consultant through GLG Network.
- Single-author manuscript submitted to Discover Psychology, a Springer Nature journal; it has passed the technical check and is currently with an editor for editorial assessment.
- Multiple AI alignment and model-behavior essays published in AI Advances.
- Published in Towards AI on behavioral interpretability and long-form AI dialogue.
- Published personal nonfiction in The Memoirist, relevant to the human-side part of this work: vulnerability, caregiving, attachment, trauma, and responsibility.
- Open research artifacts maintained on Zenodo and GitHub.
- Substantial Japanese-language public writing on Qiita and Zenn.
- More than 5,000 hours of structured long-form interaction with Claude, GPT, Gemini, and Grok.
- 22 years of Buddhist contemplative practice and 15 years of home-based developmental support for autistic / developmentally disabled children.
My route into AI alignment is not standard. I am not an ML engineer and I do not come from a university lab. This project's advantage lies elsewhere: sustained practitioner observation of how humans and AI systems interact over long periods, especially where fluent language creates trust, dependency, or false authority.
AI tools may be used for drafting, literature organization, translation, and adversarial review. All claims, source selection, interpretation, redaction decisions, and final text will remain my responsibility.
Selected evidence links:
- Medium profile: https://medium.com/@office.dosanko
- GitHub: https://github.com/dosanko-tousan
- Zenodo records: https://zenodo.org/search?q=metadata.creators.person_or_org.name%3A%22Takeuchi%2C%20Akimitsu%22&l=list&p=1&s=10&sort=bestmatch
- Qiita: https://qiita.com/dosanko_tousan
- Zenn: https://zenn.dev/dosanko_tousan
## What are the most likely causes and outcomes if this project fails?
The most likely failure mode is scope creep. Human-side alignment risks touch many areas: AI companions, red-teaming, institutional reporting, agentic workflows, mental health, education, and supervision. A three-month project cannot cover all of them.
To reduce this risk, I will keep the project narrow: projection, sycophancy, dependency, and institutional artifact fabrication in AI-mediated supervision.
A second risk is that some red-team material is too sensitive to publish. I will handle this by using redacted case studies. The public output will describe the supervision and disclosure structure, not reproducible harmful details.
A third risk is that the work remains too unusual for standard academic categories. I will manage this by producing concrete artifacts: a taxonomy, a checklist, a redacted case study, and a white paper. The project should remain useful even if reviewers disagree with some of the theoretical framing.
If the project falls short, the likely outcome is still a partial public research package: a clearer taxonomy, a narrower case study, and a set of practical questions for evaluating human supervision of language models.
## How much money have you raised in the last 12 months, and from where?
I have not raised cash funding for this research in the last 12 months.
I received in-kind support from the Cohere Labs Catalyst Grant: $1,000 in API credits for AI safety / alignment research.
I have also earned small amounts through the Medium Partner Program, but this is not research funding and is not currently enough to support the work.