LLMs are autoregressive, so their terrain of possibilities is shaped in advance by the statistical structure of the language corpus they were trained on.
So, what is the model’s implicit world model or its prior over language?
No other rules are given to the LLM when it maps input to output space: it is simply asked to take in input and produce a particular output after some transformation inside this so-called "black box", and the rest is learning and tuning the parameters. Token representations are not decomposable, meaning they don't point to anything outside of themselves. So when we think about the world view of the model, it is in fact really limited. LLMs don't understand language in the conversational sense, and they fail on spatial data because they cannot intuitively grasp the laws of physics the way we do.
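To make the autoregressive point concrete, here is a minimal sketch of that loop (illustrative Python; `model` is a hypothetical callable that returns next-token logits, not any particular library's API):

```python
import numpy as np

def generate(model, prompt_ids, max_new_tokens):
    """Greedy autoregressive decoding: each new token is chosen purely
    from the statistics the model learned over token sequences,
    conditioned on everything generated so far."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model(ids)               # hypothetical: next-token logits
        next_id = int(np.argmax(logits))  # pick the most probable next token
        ids.append(next_id)               # the output is fed back in as input
    return ids
```

Note that the token ids here are opaque integers indexing an embedding table; nothing in the loop grounds them in anything outside the corpus.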
When I was researching the AI-driven mineral exploration space, I dove into physics-based AI and discovered how limited LLMs are at capturing physical laws, for all the reasons mentioned above.
What's always captivated me is finding underlying structure in complex systems—game strategy, prediction markets, poker, politics. I'm drawn to anything dynamic, stochastic, and noisy, where patterns hide beneath chaos.
At an AGI House hackathon, I watched a demo of the largest virtual cell model ever trained, and it struck me: if we can model biological systems at that scale, why aren't we modeling other complex systems, some of the highest-value spatiotemporal systems in existence, with the same rigor and dimensionality?
In the Wolfram model of physics, the lowest-level structure is a hypergraph representing the metamathematical structure of space. We can deduce the structure of space, knitted together by causal connections recorded in causal graphs, by taking a particular time slice of the dynamic evolution, which can be mapped out by looking at the overlaps of light cones. The idea of accumulative evolution is that two axioms are connected in the instantaneous state of space if there is a causal connection between them within the slice of the causal graph that falls inside the time slice we are considering. Overlapping entailment cones knit together into an entailment fabric, the entangled result of running all possible rules. By applying rules to rules and running many computations (the progressive time evolution of an entailment graph), a causal graph builds up over time to define the content of an entailment cone.
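This is abstract, so here is a deliberately tiny toy in Python (my own illustrative construction, not the Wolfram Physics Project's actual machinery): states are hypergraphs represented as sets of node tuples, a rewrite rule transforms them, and a causal link connects event A to event B whenever B consumes a hyperedge that A produced.

```python
def rewrite(edges, fresh):
    """Toy rule: replace one binary edge (x, y) with (x, y) and (y, z),
    where z is a fresh node. Returns (new_edges, consumed, produced)."""
    e = min(edges)                        # pick a deterministic edge to rewrite
    x, y = e
    produced = {(x, y), (y, fresh)}
    return (edges - {e}) | produced, {e}, produced

edges = {(0, 1)}
created_by = {}                           # hyperedge -> event that produced it
causal_links = []                         # (earlier event, later event)
for event in range(5):
    edges, consumed, produced = rewrite(edges, event + 2)
    for e in consumed:
        if e in created_by:               # this event depends on an earlier one
            causal_links.append((created_by[e], event))
    for e in produced:
        created_by[e] = event
print(causal_links)                       # the causal graph built up over time
```

Taking a "time slice" of `causal_links` would then give you an instantaneous state, in miniature, of the picture described above.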
But is there a way to shortcut the program, a formula that predicts the outcome faster than running the full algorithm?
Some systems simply cannot be shortcut: no formula predicts their outcome faster than running the computation itself. This is computational irreducibility.
Even simple systems can be too messy, too complex, or too chaotic.
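Stephen Wolfram's canonical example is the Rule 30 cellular automaton: a one-line update rule whose center column has resisted every known shortcut. A minimal sketch:

```python
def rule30_center(steps):
    """Run the Rule 30 cellular automaton from a single black cell and
    return its center column. No known formula predicts the value at
    step t without running all t steps."""
    width = 2 * steps + 3
    row = [0] * width
    row[width // 2] = 1                   # single black cell in the middle
    column = [1]
    for _ in range(steps):
        # Rule 30 update: new cell = left XOR (center OR right)
        row = [row[i - 1] ^ (row[i] | row[(i + 1) % width])
               for i in range(width)]
        column.append(row[width // 2])
    return column

print(rule30_center(20))
```

The rule fits on one line, yet the output looks statistically random: simple setup, irreducible behavior.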
Chaotic systems are hard to predict because they are extremely sensitive to tiny changes: as paths evolve, they can diverge enormously. A difference in one cell propagates to all the cells nearby and eventually to the ones far away, producing a butterfly effect. A butterfly flaps its wings in Chicago and causes a tornado in Oklahoma; one tiny wrong detail yields a completely wrong prediction, and the error compounds the further out you forecast, which is why every long-term forecast currently breaks down.
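A minimal numerical illustration of that sensitivity (my own sketch, using the logistic map rather than a weather model):

```python
def trajectory(x0, r=4.0, steps=40):
    """Iterate the logistic map x -> r*x*(1-x), a textbook chaotic system."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = trajectory(0.400000)
b = trajectory(0.400001)          # same system, start shifted by one part in 10^6
for t in (0, 10, 20, 30, 40):
    print(t, abs(a[t] - b[t]))    # the gap roughly doubles each step until it is O(1)
```

By around step 20 the two trajectories are unrelated, even though they started a millionth apart: exactly the failure mode of long-range forecasts.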
Mathematicians can recognize patterns in data. For years, they have used trial and error to refine relationships in data and uncover its topological order. Machine learning can help with this process of proposing and testing new patterns, which used to rely heavily on human ingenuity. Symbolic regression, for instance, searches for symbolic equations that match the data. But because MLPs are opaque universal approximators (per the universal approximation theorem), it is hard to extract explicit formulas and symbols from them, for all the reasons mentioned above. Kolmogorov-Arnold Networks (KANs) take a different approach: instead of learning scalar weights between nodes, they learn an actual function at each connection in the network, which is much more interpretable, because you can actually see and analyze the underlying mathematical structure inside the network (see the sketch below). And while it's easy to predict the trajectory of a cannonball when you observe a system and write down equations for it, that approach falls apart when you try to predict the weather, which is too chaotic and computationally irreducible.
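To show what "learning a function on each connection" means concretely, here is a toy KAN-style layer (my own illustration: real KANs, per Liu et al. 2024, use B-splines plus a base activation, whereas this uses simple Gaussian bumps):

```python
import numpy as np

class ToyKANLayer:
    """Each edge carries its own learnable 1-D function (a small sum of
    Gaussian bumps with learnable coefficients) instead of a single
    scalar weight; each output node sums its incoming edge functions."""

    def __init__(self, n_in, n_out, n_basis=5, seed=0):
        rng = np.random.default_rng(seed)
        # coefs[i, j, k] = weight of basis k on the edge from input i to output j
        self.coefs = 0.1 * rng.standard_normal((n_in, n_out, n_basis))
        self.centers = np.linspace(-1.0, 1.0, n_basis)

    def __call__(self, x):
        # phi[i, k] = basis_k(x_i): every edge sees its input through 1-D bumps
        phi = np.exp(-((x[:, None] - self.centers[None, :]) ** 2) / 0.2)
        # output j = sum_i f_ij(x_i), where f_ij = sum_k coefs[i, j, k] * basis_k
        return np.einsum("ik,ijk->j", phi, self.coefs)

layer = ToyKANLayer(n_in=3, n_out=2)
print(layer(np.array([0.3, -0.5, 0.9])))
```

Because each edge function f_ij is an explicit 1-D curve, you can plot it or fit a symbolic expression to it, which is what makes formula extraction plausible in a way it is not for raw MLP weights.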
KANs are a recent development and have only been scaled to 17B parameters so far, but they point to a promising direction: accelerating quantum mechanics and other scientific fields by finding formulas within data. Meanwhile, data-center electricity consumption, driven by AI, already accounts for just under 3% of total global electricity consumption and is growing at roughly 12-15% per year, four times faster than total global electricity demand. The fastest-growing frontier AI labs spend $1B-$7B+ per campus ($1.2-1.5B is anecdotal data from my AI campus) on sites of 300 MW-1 GW+. Compute accounts for about 53 percent of the costs of a typical frontier AI lab, demonstrating how training compute is massively draining budgets.
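To make the growth gap concrete, a rough back-of-envelope projection (my own illustration, assuming the rates above hold constant, with 13.5% as the midpoint of 12-15%):

```python
# Data centers start just under 3% of global electricity; total demand
# is assumed here to grow a quarter as fast as data-center demand.
dc_share, dc_growth, total_growth = 0.03, 0.135, 0.135 / 4
for year in (0, 5, 10):
    share = dc_share * ((1 + dc_growth) / (1 + total_growth)) ** year
    print(f"year {year}: data centers ~ {share:.1%} of global electricity")
```

Under those assumptions the share roughly reaches 5% in five years and 7-8% in ten, which is the trajectory that worries me.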
KANs or other more computationally efficient alternatives to LLMs would relieve this strain: we are running out of compute and hurting the earth in the process. While I am very concerned about AI safety topics such as protecting the elderly from AI-targeted scams and keeping vulnerable populations from being exploited, I think the biggest problem we have to tackle is that our energy infrastructure cannot keep pace with the scale of AI. For the past two years I have been self-funding my research; I can no longer continue to do so and would otherwise have to go back to finance. This funding would enable me to continue my research into physics-based AI, such as KANs and energy-based models, that is more computationally efficient, interpretable, and explainable than current methods.