Alexander Bistagne

@alexhb61

Computational Complexity & AI Alignment Independent Researcher

https://github.com/Alexhb61
$0 total balance
$0 charity balance
$0 cash balance

$0 in pending offers

About Me

I was introduced to concerns about AGI alignment through Robert Miles' work back in college around 2018, and it motivated me to take Lise Getoor's Algorithms and Ethics class at UC Santa Cruz.

Now I'm an independent researcher working on AI alignment, among other things. My current approach to AI alignment is to apply computational complexity techniques to black boxes. I'm of the opinion that post-construction alignment of black-box AIs is infeasible.

Projects

Comments


Alexander Bistagne

7 months ago

Progress update

What progress have you made since your last update?

The paper was submitted to and rejected from the Alignment Forum. After reading it with a friend, I noticed serious sequencing issues and unnecessary definitions. I decided I needed a break.

I have since found people willing to give feedback on future drafts, and have joined the Ronin Institute where I might also receive feedback.

What are your next steps?

I intend to write a less verbose draft with more examples.
This draft will be posted to my GitHub.
I plan on going through at least three rounds of comments, review, and editing before giving a lightning talk at the Ronin Institute, where I might get more feedback. After another drafting phase, I will submit another post to the Alignment Forum. I will need to find a mentor or co-author before submitting for peer review after that. I will consider the project over after peer review or thorough refutation.

Is there anything others could help you with?

  1. Without more funding, I can only reliably commit to 10 hours a week on this project.

  2. This leg of the project aims to include more examples.

  3. I am looking for more feedback.

  4. I am looking for a co-author or mentor to help with formalization before peer review. Contact me via email if you are interested or have ideas.

    If others are interested in giving private examples or feedback, contact me on Discord or email me. Public examples or feedback can be given through GitHub issues.


Alexander Bistagne

about 1 year ago

Progress update

The post is available on LessWrong and has been submitted to the Alignment Forum.

https://www.lesswrong.com/posts/JxhJfqfTJB9dkq72K/alignment-is-hard-an-uncomputable-alignment-problem-1


Alexander Bistagne

about 1 year ago

The project is on GitHub: https://github.com/Alexhb61/Alignment/blob/main/Draft_2.pdf

I will add citations and submit to the Alignment Forum tomorrow.


Alexander Bistagne

about 1 year ago

This project is nearly at its target, but it hit a delay near the beginning of September when I needed to take up other work to pay the bills. I hope to post the minimal paper soon.


Alexander Bistagne

over 1 year ago

@alexhb61

Conditional on the 6k goal being reached, I have committed to submitting an edited draft to the Alignment Forum on August 23rd.


Alexander Bistagne

over 1 year ago

Correction: co-RE is the class, not co-R. It is the set of problems reducible to the complement of the halting problem.
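
For reference, here is the standard definition in the usual notation (textbook computability theory, not taken from the draft):

\[
\mathrm{coRE} \;=\; \{\,\overline{L} : L \in \mathrm{RE}\,\}
\;=\; \{\,L : L \le_m \overline{\mathrm{HALT}}\,\},
\qquad
\overline{\mathrm{HALT}} \;=\; \{\,(M,x) : M \text{ does not halt on } x\,\}.
\]

In other words, a problem is in co-RE exactly when it many-one reduces to the complement of the halting problem, which is the reading intended above.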


Alexander Bistagne

over 1 year ago

A technical detail worth mentioning: here is the main theorem of the 6K project:

Proving that an immutable-code agent with a Turing-complete architecture in a Turing-machine-simulatable environment has nontrivial betrayal-sensitive alignment is co-R-hard.

The paper would define nontrivial betrayal-sensitive alignment and some constructions on agents needed in the proof.
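
For intuition, here is the generic shape of reduction that hardness claims of this kind typically rest on. This is only a sketch of the standard technique, not the paper's actual construction, and "betrays" stands in for whatever betrayal-sensitive property the draft defines: given a Turing machine $M$ and input $x$, build an immutable-code agent $A_{M,x}$ that simulates $M$ on $x$ step by step and takes a betraying action only if the simulation halts. Then

\[
A_{M,x} \text{ never betrays} \;\Longleftrightarrow\; M \text{ does not halt on } x,
\]

so any general procedure for proving that property would decide the complement of the halting problem, which is where hardness for the class above comes from.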


Alexander Bistagne

over 1 year ago

Thanks for the encouragement and donation.

The 40K maximum would fund a much larger project than the 6K project, which is what I summarized.

6K would cover editing:

- An argument refuting the testing of anti-betrayal alignments in Turing-complete architectures
- An argument connecting testing alignment to training alignment in single-agent architectures

40K would additionally cover developing and editing:

- Arguments around anti-betrayal alignments in deterministic or randomized, P- or PSPACE-complete architectures
- Arguments around short-term anti-betrayal alignments
- Arguments connecting do-no-harm alignments to short-term anti-betrayal alignments
- Arguments refuting general solutions to the stop-button problem that transform the utility function in a computable-reals context
- Arguments around general solutions to the stop-button problem with floating-point utility functions
- Foundations for modelling mutable agents or subagents

Transactions

For                  Date              Type              Amount
Manifund Bank        about 1 year ago  withdraw          70
Alignment Is Hard    over 1 year ago   project donation  +70
Manifund Bank        over 1 year ago   withdraw          6000
Alignment Is Hard    over 1 year ago   project donation  +1200
Alignment Is Hard    over 1 year ago   project donation  +1000
Alignment Is Hard    over 1 year ago   project donation  +3800