Alexander Bistagne

@alexhb61

Computational Complexity & AI Alignment Independent Researcher

https://github.com/Alexhb61
$0 total balance
$0 charity balance
$0 cash balance

$0 in pending offers

About Me

I was introduced to concerns about AGI alignment through Robert Miles' work back in college around 2018, and it motivated me to take Lise Getoor's Algorithms and Ethics class at UC Santa Cruz.

Now I'm an independent researcher working on AI alignment, among other things. My current approach to AI alignment is to apply computational complexity techniques to black boxes. I'm of the opinion that post-construction alignment of black-box AIs is infeasible.

Projects

Comments


Alexander Bistagne

7 months ago

Progress update

What progress have you made since your last update?

The paper was submitted to and rejected from the Alignment Forum. After reading it with a friend, I noticed serious sequencing issues and unnecessary definitions. I decided I needed a break.

I have since found people willing to give feedback on future drafts, and have joined the Ronin Institute where I might also receive feedback.

What are your next steps?

I intend to write a less verbose draft with more examples.
This draft will be posted to my GitHub.
I plan on going through at least three rounds of comments, review, and editing before giving a lightning talk at the Ronin Institute, where I might get more feedback. After another drafting phase, I will submit another post to the Alignment Forum. I will need to find a mentor or co-author before submitting for peer review after that. I will consider the project over after peer review or thorough refutation.

Is there anything others could help you with?

  1. Without more funding, I can only reliably commit to 10 hours a week on this project.

  2. This leg of the project aims to include more examples.

  3. I am looking for more feedback.

  4. I am looking for a co-author or mentor to help with formalization before peer review. Contact me via email if you are interested or have ideas.

    If others are interested in giving private examples or feedback, contact me on Discord or email me. Public examples or feedback can be given through GitHub issues.


Alexander Bistagne

about 1 year ago

Progress update

The post is available on LessWrong and has been submitted to the Alignment Forum.

https://www.lesswrong.com/posts/JxhJfqfTJB9dkq72K/alignment-is-hard-an-uncomputable-alignment-problem-1


Alexander Bistagne

about 1 year ago

The project is on GitHub: https://github.com/Alexhb61/Alignment/blob/main/Draft_2.pdf

I will add citations and submit to the Alignment Forum tomorrow.


Alexander Bistagne

about 1 year ago

This project is nearly at its target, but it hit a delay near the beginning of September when I needed to take up other work to pay the bills. I hope to post the minimal paper soon.


Alexander Bistagne

over 1 year ago

@alexhb61

Conditional on the 6k goal being reached, I have committed to submitting an edited draft to the Alignment Forum on August 23rd.


Alexander Bistagne

over 1 year ago

Correction: co-RE is the class, not co-R. It is the set of problems reducible to the complement of the halting problem.
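
For reference, here is the standard definition in the usual notation (textbook computability theory, not taken from the draft):

\[
\mathrm{coRE} \;=\; \{\,\overline{L} : L \in \mathrm{RE}\,\}
\;=\; \{\,L : L \le_m \overline{\mathrm{HALT}}\,\},
\qquad
\overline{\mathrm{HALT}} \;=\; \{\,(M,x) : M \text{ does not halt on } x\,\}.
\]

In other words, a problem is in co-RE exactly when it many-one reduces to the complement of the halting problem, which is the reading intended above.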


Alexander Bistagne

over 1 year ago

A technical detail worth mentioning: here is the main theorem of the 6K project:

Proving that an immutable-code agent with a Turing-complete architecture in a Turing-machine-simulatable environment has nontrivial betrayal-sensitive alignment is co-R-hard.

The paper would define nontrivial betrayal-sensitive alignment and some constructions on agents needed in the proof.
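
For intuition, here is the generic shape of reduction that hardness claims of this kind typically rest on. This is only a sketch of the standard technique, not the paper's actual construction, and "betrays" stands in for whatever betrayal-sensitive property the draft defines: given a Turing machine $M$ and input $x$, build an immutable-code agent $A_{M,x}$ that simulates $M$ on $x$ step by step and takes a betraying action only if the simulation halts. Then

\[
A_{M,x} \text{ never betrays} \;\Longleftrightarrow\; M \text{ does not halt on } x,
\]

so any general procedure for proving that property would decide the complement of the halting problem, which is where hardness for the class above comes from.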


Alexander Bistagne

over 1 year ago

Thanks for the encouragement and donation.

The 40K maximum would fund a much larger project than the 6K project, which is what I summarized.

6K would cover editing:

- An argument refuting the testing of anti-betrayal alignments in Turing-complete architectures
- An argument connecting testing alignment to training alignment in single-agent architectures

40K would additionally cover developing and editing:

- Arguments around anti-betrayal alignments in deterministic or randomized, P- or PSPACE-complete architectures
- Arguments around short-term anti-betrayal alignments
- Arguments connecting do-no-harm alignments to short-term anti-betrayal alignments
- Arguments refuting general solutions to the stop-button problem that transform the utility function in a computable-reals context
- Arguments around general solutions to the stop-button problem with floating-point utility functions
- Foundations for modelling mutable agents or subagents

Transactions

For                  Date              Type              Amount
Manifund Bank        about 1 year ago  withdraw          70
Alignment Is Hard    over 1 year ago   project donation  +70
Manifund Bank        over 1 year ago   withdraw          6000
Alignment Is Hard    over 1 year ago   project donation  +1200
Alignment Is Hard    over 1 year ago   project donation  +1000
Alignment Is Hard    over 1 year ago   project donation  +3800