Making sure that the code AI pushes to the internet is safe and bug-free

what do you want to do? be specific.
Big picture: I want to make sure that the code AI pushes to the Internet is safe and bug-free. As a step toward that goal, I want to push further on a concept my partner and I at Midspiral have been working on. We are building LemmaScript, a verification toolchain for TypeScript. We point an agent to the toolchain and some supporting tools, and it writes verifiably correct TypeScript with an entire class of bugs eliminated at compile time. We have over 20 case studies in TypeScript projects, both greenfield and brownfield. We want to continue development of LemmaScript, and the ecosystem around it, so that it becomes seamless to verify any TypeScript project (some projects might still require well-integrated refactoring). Ideally, we would like to build a robust enough ecosystem for LemmaScript so an agent can reliably generate the specification annotations, programs, and iteratively write proofs both quickly and more cost-efficiently than it can generate the state-of-the-art (sometimes buggy) TypeScript programs. In our experience so far, verification guides development more effectively than if verification is not involved: definitely towards a better solution, and sometimes converging faster.
how are you planning to spend the money?
We plan to do more case-study-driven development of the toolchain, supporting a larger subset of TypeScript, as well as building adjacent tools (both LLM-driven and LLM-formal hybrid) for example to ensure intent is matched and to integrate with runtime contracts. We will focus on evaluating our claim that verification can guide development, empirically assessing what we found anecdotally. We will measure both the success rate and quality of resulting artifacts, as well as the speed of development. We will explore varying techniques to improve the agent’s ability to annotate programs and write their proofs of correctness. We can offer benchmarks for Dafny and Lean (our underlying backends) based on the case studies. We will chart a path towards adoption of LemmaScript.
what have you done in the past that proves you will be good at doing this? focus on substance, not credentials
I have contributed to Scala, Clojure (https://github.com/clojure/core.logic/blob/master/src/main/clojure/clojure/core/logic/nominal.clj), and Dafny (https://link.springer.com/chapter/10.1007/978-3-319-09099-3_2), as well as the Closure compiler (https://github.com/google/closure-compiler). I created https://io.livecode.ch/ and have been running it since 2014, sponsored by Digital Ocean. It's a Docker-based service to turn GitHub repositories into interactive tutorials. I wrote a self-hosting Scheme compiler, following a paper on an incremental approach to compiler construction: https://github.com/namin/inc I found soundness bugs in many systems, including Java (https://io.livecode.ch/learn/namin/unsound?img=java8), Scala (https://io.livecode.ch/learn/namin/unsound/scala), Dafny, and Twelf (https://github.com/standardml/twelf/pull/2). I collapsed a reflective tower of interpreters into a one-pass compiler (https://dl.acm.org/doi/10.1145/3158140). I’ve been interested in the intersection of formal verification and LLMs, from the very beginning of LLMs, and have co-created systems like VerMCTS (https://github.com/namin/llm-verified-with-monte-carlo-tree-search), Dafny Sketcher, dafny-annotator. I built claimcheck (see https://midspiral.com/blog/claimcheck-narrowing-the-gapbetween-proof-and-intent/), a tool and approach that does round-trip informalization to make sure that the informal specification matches the formal specification. My current projects include LemmaScript (https://lemmascript.com) on the company side, and, on the academic side, formal-disco (to be released soon). formal-disco is a verified discovery system so that large high-quality diverse synthetic datasets of verified programs can be created for training. So far, we have done so for Dafny, Verus and Frama-C, generating hundreds of thousands of verified programs, that boost one-shot performance of open LLMs to be better than Claude Opus.
what is something that you think everyone else is wrong about? convince me that you are right. Most people swimming in the verification lane are focused on training models to be better at proofs (and often focused on mathematics over programming). In the meantime, agents are continuing to produce and push massive amounts of buggy code, much of which is then being used to further train models. We recognize the technology is ready to be applied. The proofs are already possible and training base models on verification solves only a tiny sliver of the problem because, with the right tooling and ecosystem, frontier LLMs can already write them relatively easily. While everyone is stuck inside labs training models, we want to be giving real-world engineers the tools they need to trust the code AI is generating and begin a flywheel of provably correct generated code.
funding goal
With 10K USD, we will improve LemmaScript as described above. More funding would enable us to expand LemmaScript further, covering a bigger subset of TypeScript, including idioms from popular libraries, and thus supporting more existing code bases. We would also be able to develop it beyond the functional verified core to cover runtime contracts around its uses. This would allow LemmaScript to be applied in a significantly larger number of cases, even positioning it as a standard way to add verified contracts to TypeScript.

Making sure that the code AI pushes to the internet is safe and bug-free

Donate