Auxerta has built Project Pidgen, a post-transformer foundation model architecture. The model is an SSM hybrid trained with a dual-objective loss (world-model + cross-entropy) in which language operates as the substrate the model thinks in, not as the output layer. The SSM core provides O(1) hidden-state memory with sub-linear VRAM scaling across sequence length. Two flagship models have been trained: PigeonWorld (7.31B parameters, 12B tokens) and PidgenV5 (9.34B parameters, 8B tokens). The project is supported by the NVIDIA Innovation Lab (DGX Cloud cohort) and Lambda Cloud (compute credits).
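To make the training objective concrete, here is a minimal sketch of what a dual-objective loss of this general shape can look like: a standard next-token cross-entropy term plus a world-model term over latent states. The function name, tensor shapes, choice of state target, and weighting below are illustrative assumptions, not Auxerta's actual implementation.

```python
import torch
import torch.nn.functional as F

def dual_objective_loss(lm_logits, target_tokens, pred_state, target_state,
                        world_weight=0.5):
    """Illustrative dual-objective loss: cross-entropy + world-model term.

    lm_logits:     (batch, seq, vocab) next-token logits
    target_tokens: (batch, seq) ground-truth token ids
    pred_state:    (batch, seq, d) predicted next latent state
    target_state:  (batch, seq, d) target latent state (e.g., a detached
                   next-step state); what the world-model term predicts
                   is itself an assumption in this sketch
    """
    # Standard language-modeling objective over the vocabulary.
    ce = F.cross_entropy(lm_logits.reshape(-1, lm_logits.size(-1)),
                         target_tokens.reshape(-1))
    # World-model objective, sketched here as MSE between latent states.
    wm = F.mse_loss(pred_state, target_state)
    return ce + world_weight * wm

# Quick shape check with random tensors.
B, S, V, D = 2, 16, 1000, 64
loss = dual_objective_loss(torch.randn(B, S, V),
                           torch.randint(0, V, (B, S)),
                           torch.randn(B, S, D),
                           torch.randn(B, S, D))
```

The relative weighting of the two terms is the key design knob; a fixed scalar is the simplest choice, and this sketch makes no claim about how the actual model balances them.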
The first goal is complete: a working post-transformer foundation model with a validated architecture. The second goal is to fully train PidgenV5 to 1T tokens and benchmark architectural behavior at scale. The third goal is to test the world-model objective at higher training densities (100B+ tokens) to determine whether the sample-efficiency advantage observed at low tokens-per-parameter (T/P) ratios holds, breaks, or strengthens at production scale.
Funding will cover compute to complete PidgenV5 training; contract research engineers for ablation work, architectural variants, and a reproducibility study; and operational runway plus a buffer for unforeseen technical issues during scaled training.
Two co-founders. Philip Abao: 10 years of independent software engineering in the Bay Area; built the architecture from scratch and trained both flagship models. Soraya Johnson: 5 years researching machine learning behavior and training data.
The principal risk is that the sample-efficiency signal observed at our current training density (T/P = 1.6) does not hold at 100B-1T tokens. Possible failure modes include loss stagnation, mode collapse, gradient instability, and emergent capability plateau. We have encountered and resolved smaller-scale training instabilities to date and have mitigations in place; however, the behavior of this architecture at production-scale training densities is not yet known. That uncertainty is what the funding is meant to resolve.
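For reference, assuming T/P denotes tokens per parameter, the quoted density follows directly from the figures above; the short computation below is just that arithmetic, not new data.

```python
# Tokens-per-parameter (T/P) from the figures quoted above.
pigeonworld_tp = 12e9 / 7.31e9   # ~1.64, matching the stated T/P = 1.6
pidgenv5_tp = 8e9 / 9.34e9       # ~0.86 for the partially trained PidgenV5
print(f"{pigeonworld_tp:.2f} {pidgenv5_tp:.2f}")
```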
Self-funded by founders. Compute and program support from NVIDIA Innovation Lab and Lambda Cloud. No equity capital raised to date.
A video demo is available here:
https://drive.google.com/file/d/1GmnmUk8qH1P-Eh-dBERQstt6MJG-KZrF/view?usp=sharing
Raw benchmark results are available upon request.