Building a new type of language model based on discrete rules (Prolog-like) as opposed to continuous neurons.
First, improving language model interpretability.
It can be a significant challenge to understand what exactly is going on in a modern Transformer after token embedding. The embedding travels along the residual stream for many layers, and at each layer the stream is transformed based on the streams of all previous tokens at that layer. The transformations and weights are often represented as 16-bit values or wider.
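To make the description above concrete, here is a minimal sketch of a residual stream in plain Python. It is a toy under stated assumptions, not a real Transformer: there are no learned weights, no MLP blocks, no layer normalization, and only a single attention head. The function names `causal_attention_layer` and `run_residual_stream` are hypothetical and chosen only for illustration.

```python
import math

def causal_attention_layer(stream):
    """One toy layer: each position adds a softmax-weighted mix of the
    streams at its own and earlier positions (no learned weights)."""
    updated = []
    for t, query in enumerate(stream):
        visible = stream[: t + 1]  # causal mask: current and earlier tokens only
        scores = [sum(q * k for q, k in zip(query, key)) for key in visible]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        mixed = [
            sum(w * vec[d] for w, vec in zip(weights, visible)) / total
            for d in range(len(query))
        ]
        # Residual connection: the layer's output is added to the stream.
        updated.append([x + m for x, m in zip(query, mixed)])
    return updated

def run_residual_stream(embeddings, num_layers):
    """Pass token embeddings through several toy layers in sequence."""
    stream = [list(vec) for vec in embeddings]
    for _ in range(num_layers):
        stream = causal_attention_layer(stream)
    return stream
```

Even in this stripped-down form, every value in the stream is a continuous mixture of many earlier values, which hints at why tracing a single decision through dozens of real layers is difficult.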
Discrete rules may be uniquely interpretable, since each rule can in principle be read and checked on its own.
Second, improving performance, training efficiency, and continual learning capabilities.
Many intelligent processes can be broken down into discrete steps. For example: when you see the characters " c a t ", you can recognize the word "cat". This requires no continuous process. Important tasks such as arithmetic and traversing a knowledge graph are also discrete, and current LLMs often fail to learn robust and generalizable solutions for them.
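As a rough illustration of what "discrete" means here, the sketch below recognizes a word from its characters with an exact-match rule and traverses a tiny knowledge graph by repeatedly applying one rule until nothing new is derived. It is not the proposed model: the facts, the rule table, and the names `recognize_word` and `ancestors` are all hypothetical, and Python stands in for a Prolog-like rule language.

```python
# Hypothetical facts: (subject, relation, object) triples.
FACTS = {
    ("cat", "is_a", "mammal"),
    ("mammal", "is_a", "animal"),
}

# Hypothetical spelling rules: an exact character sequence maps to a word.
WORD_RULES = {
    ("c", "a", "t"): "cat",
    ("d", "o", "g"): "dog",
}

def recognize_word(chars):
    """Apply a discrete lookup rule; return the word, or None if no rule matches."""
    return WORD_RULES.get(tuple(chars))

def ancestors(entity, facts=FACTS):
    """Follow is_a edges until no new entity is derived (transitive closure)."""
    found = set()
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for subj, rel, obj in facts:
            if subj == current and rel == "is_a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found

word = recognize_word(["c", "a", "t"])
print(word, sorted(ancestors(word)))
```

Each step either matches or does not, so every conclusion can be traced back to the specific rule and facts that produced it.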
Discrete rules may be more performant and better for representation learning in language modeling.
These goals will be pursued through thorough study and analysis of discrete language models.
Funding will be used to support myself and for compute.
Just me. I did research at university, applying machine learning to improve the classification of electrical signal outputs from medical sensors. I was an initial member of a data observability startup, where I researched the core ML algorithms behind the platform (the startup went on to be YC funded).
Transformers could remain more cost-efficient, since current accelerator architectures are optimized for dense floating-point matrix operations. If the discrete language model shows benefits, GPU and ASIC co-design could change the economics down the road.
None.