Learning Objectives
- Understand what Leanstral 1.5 is and why formal theorem proving is a distinct, high-value niche within AI
- Identify its core specs: an open Apache-2.0 mixture-of-experts model tuned for Lean 4 proof engineering
- Evaluate when a formal-verification model like Leanstral is the right tool versus a general-purpose coding model
What Is Leanstral 1.5?
Leanstral 1.5 is an open model from Mistral AI, the Paris-based lab, built for one narrow but demanding job: writing and checking formal mathematical proofs in Lean 4, the proof-assistant language used by mathematicians and verification engineers. It is not a general chatbot — it is a specialist that turns mathematical claims and code correctness into machine-checkable proofs.
Under the hood, Leanstral 1.5 uses a mixture-of-experts (MoE) architecture with 119 billion total parameters but only 6 billion active per forward pass. Per-token compute tracks the active parameters, which keeps inference cost low relative to capability, although all 119 billion weights must still be held in memory. Mistral trained it by combining supervised fine-tuning with reinforcement learning, and it is released under the permissive Apache 2.0 license: the weights can be downloaded, self-hosted, fine-tuned, and used commercially, provided the license text and attribution notices are preserved.
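To make the efficiency claim concrete, the short sketch below works out what share of the parameters is active on each forward pass. It uses only the two figures quoted above. The function name `active_fraction` is illustrative, and treating per-token compute as proportional to active parameters is a common rule of thumb, not a measured result for this model.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the model's parameters used in a single forward pass."""
    if total_params <= 0 or not 0 <= active_params <= total_params:
        raise ValueError("expected total_params > 0 and 0 <= active_params <= total_params")
    return active_params / total_params


# Figures quoted in the text: 119 billion total, 6 billion active.
TOTAL_PARAMS = 119e9
ACTIVE_PARAMS = 6e9

share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"active share per forward pass: {share:.1%}")
print(f"total-to-active ratio: {TOTAL_PARAMS / ACTIVE_PARAMS:.1f}x")
```

Roughly 5 percent of the weights do the work for any given token, which is why a sparse model of this size can be cheaper to run than a dense model with the same total parameter count.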
The reason formal verification matters: unlike a normal language model that produces plausible-sounding answers, a Lean proof either checks or it does not. That makes the output verifiable by construction — a rare property in AI, and the whole point of the formal-methods field.
💡 Key Concept
Formal theorem proving: Writing mathematical arguments in a language a computer can mechanically verify, step by step, with no gaps. If the proof checks, the stated theorem is guaranteed to follow from the axioms, so the remaining risk is whether the formal statement captures what you actually meant. Tools like Lean 4 are used both to verify advanced mathematics and to prove that software behaves exactly as specified.
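As a minimal illustration of what "checks or does not check" means, here are three small Lean 4 statements. They are textbook examples written for this lesson, not Leanstral output, and the theorem name `add_comm_example` is arbitrary. They rely only on Lean's core library.

```lean
-- A closed arithmetic fact: both sides compute to the same value,
-- so reflexivity (`rfl`) is a complete proof.
example : 2 + 2 = 4 := rfl

-- A general statement about all natural numbers, proved by appealing
-- to a lemma that ships with Lean's core library.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A false claim cannot be proved. Uncommenting the line below makes
-- Lean reject the file, because `2 + 2` does not reduce to `5`.
-- example : 2 + 2 = 5 := rfl
```

The first two declarations are accepted by the checker; the third would be rejected however plausible the accompanying explanation sounded. That binary outcome is what a proof-writing model is ultimately judged against.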
✅ Tip
Try Leanstral 1.5: Available with free API access on La Plateforme, as open weights on Hugging Face, and integrated into Mistral Vibe. Best paired with a Lean 4 development environment.
Benchmark Performance
Leanstral 1.5 posts state-of-the-art results across the standard formal-math benchmarks. On the widely used miniF2F benchmark it reaches a perfect 100 percent, and it solves 587 of 672 problems on PutnamBench — a suite drawn from the Putnam mathematics competition. On graduate-level algebra it scores 87 percent on FATE-H, and on the far harder PhD-level FATE-X it reaches 34 percent. The model also shows strong test-time scaling, continuing to improve as it is given more room to reason, up to 4 million tokens.
| Benchmark | Score | What it measures |
|---|---|---|
| miniF2F | 100 percent | Standard formal-math proof benchmark |
| PutnamBench | 587 of 672 solved | Putnam-competition-level problems |
| FATE-H | 87 percent | Graduate-level algebra |
| FATE-X | 34 percent | PhD-level algebra |
A perfect miniF2F score should be read with the usual caution — benchmarks saturate, and formal-math suites are narrower than open-ended reasoning — but the PutnamBench and FATE-X numbers show genuine strength on problems that remain hard even for specialists.
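The table mixes percentages with a raw count, so it can help to put PutnamBench on the same scale as the other rows. The sketch below does that arithmetic from the figures quoted above; the helper name `solve_rate` is illustrative.

```python
from fractions import Fraction


def solve_rate(solved: int, total: int) -> Fraction:
    """Exact fraction of benchmark problems solved."""
    if total <= 0 or not 0 <= solved <= total:
        raise ValueError("expected total > 0 and 0 <= solved <= total")
    return Fraction(solved, total)


# PutnamBench figures quoted in the text.
SOLVED, TOTAL = 587, 672

putnam = solve_rate(SOLVED, TOTAL)
print(f"PutnamBench: {float(putnam):.1%} solved, {TOTAL - SOLVED} problems unsolved")
```

That works out to about 87.4 percent, with 85 problems still open, which leaves far more headroom than the saturated miniF2F figure.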
Finding Real Software Bugs
The more practical result is that formal methods are not just for pure mathematics. During testing, Leanstral 1.5 flagged five previously unknown bugs across 57 open-source repositories, including a critical integer-overflow flaw in a decoding library. Because the model reasons about code the way it reasons about a proof — asking whether a stated property actually holds — it can surface edge cases that ordinary testing and code review miss.
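The idea of "asking whether a stated property actually holds" can be sketched without Lean. The example below is a hypothetical stand-in, not the bug described above (the source gives no details of it): a made-up helper `buffer_size` adds two lengths using 8-bit wrapping arithmetic, and an exhaustive search looks for inputs that violate the property "the allocated size is never smaller than the payload". The 8-bit width is an assumption chosen so that every input can be enumerated; a Lean proof would instead argue symbolically over all inputs.

```python
from itertools import product
from typing import Optional, Tuple

BITS = 8                      # assumed width, small enough to enumerate
MASK = (1 << BITS) - 1        # 0xFF


def wrapping_add(a: int, b: int) -> int:
    """Add two unsigned BITS-wide integers, discarding the carry out."""
    return (a + b) & MASK


def buffer_size(header_len: int, payload_len: int) -> int:
    """Hypothetical decoder helper: bytes to allocate for header plus payload."""
    return wrapping_add(header_len, payload_len)


def find_counterexample() -> Optional[Tuple[int, int]]:
    """Return the first (header_len, payload_len) that breaks the property
    buffer_size(header_len, payload_len) >= payload_len, or None if it holds."""
    for header_len, payload_len in product(range(MASK + 1), repeat=2):
        if buffer_size(header_len, payload_len) < payload_len:
            return header_len, payload_len
    return None


print(find_counterexample())  # (1, 255)
```

The search reports `(1, 255)`: a one-byte header plus a 255-byte payload wraps to an allocation of zero bytes. A test suite built from typical inputs would be unlikely to include that pair, whereas an attempt to prove the property fails at exactly that case.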
This points at a broader thesis for the field: as formal-verification models get cheaper and more capable, proving software correct becomes a realistic path to safer systems, not just an academic exercise reserved for aerospace and cryptography.
Strengths
- Open and permissive: Apache 2.0 license allows self-hosting, fine-tuning, and commercial use, subject only to its attribution and notice terms
- Efficient: Mixture-of-experts design activates only 6 billion of 119 billion parameters per forward pass
- State-of-the-art on formal math: Perfect miniF2F, strong PutnamBench and FATE results
- Practical bug discovery: Found real, previously unknown flaws in open-source code
- Verifiable output: Lean proofs either check or they do not, so an incorrect proof cannot pass as correct (a mis-stated theorem still can)
Limitations & Considerations
- Narrow by design: Leanstral is a formal-proof specialist, not a general assistant — it is the wrong tool for writing prose, answering open questions, or everyday coding
- Requires Lean expertise: Getting value out of it assumes familiarity with Lean 4 and the formal-methods workflow
- Benchmark saturation: A perfect miniF2F score reflects a maturing benchmark as much as raw capability — judge it on the harder FATE-X and real-world bug results
- Emerging field: Formal-verification-with-AI is early; tooling, integrations, and best practices are still forming
Best Use Cases
| Task | Why Leanstral 1.5 |
|---|---|
| Formalizing mathematics in Lean 4 | Purpose-built for proof engineering with state-of-the-art benchmark results |
| Verifying software correctness | Reasons about stated properties to surface edge-case bugs testing misses |
| Research in automated reasoning | Open weights and Apache 2.0 license give full freedom to study and extend |
| Teaching formal methods | A capable, freely available proof-writing model to pair with Lean 4 for coursework and self-study |
When to choose alternatives:
- General-purpose coding → a mainstream coding model or agent, not a proof specialist
- Open-ended reasoning or writing → a frontier chat model
- No Lean 4 in your workflow → the formal-methods entry cost may outweigh the benefit today
Getting Started
- Set up Lean 4 — install the Lean toolchain and an editor extension so proofs can be checked locally
- Get API access — sign up at console.mistral.ai for free API access, or download the open weights from Hugging Face
- Start small — hand the model a known theorem and confirm the generated Lean proof actually checks
- Try verification — point it at a small function with a stated property to see whether it can prove or break the claim
- Scale the reasoning budget — for harder targets, give the model more test-time tokens, where it continues to improve
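The "confirm the proof actually checks" step can be automated with a small wrapper around the local Lean toolchain. The sketch below is one possible shape, not an official workflow: the function name `proof_checks` and the file name `Candidate.lean` are made up for illustration, and it assumes a `lean` executable is on your PATH. How you obtain the candidate proof from the model is left out. Inside a Lake project that depends on a library such as Mathlib, pass `("lake", "env", "lean")` as `lean_cmd` instead.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Sequence


def proof_checks(lean_source: str, lean_cmd: Sequence[str] = ("lean",)) -> bool:
    """Return True if the checker accepts the source and it contains no `sorry`."""
    # Lean accepts a file containing `sorry` with only a warning, so a
    # placeholder proof would otherwise look like a success. This textual
    # test is a crude guard and will also reject `sorry` inside comments.
    if "sorry" in lean_source:
        return False

    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "Candidate.lean"
        path.write_text(lean_source, encoding="utf-8")
        # The checker exits with a nonzero status when elaboration fails.
        result = subprocess.run(
            [*lean_cmd, str(path)],
            capture_output=True,
            text=True,
        )
    return result.returncode == 0
```

With Lean installed, `proof_checks("example : 2 + 2 = 4 := rfl\n")` should return `True`, and changing the `4` to a `5` should return `False`. Treat anything the model produces as a candidate until it has passed this kind of check.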
Key Takeaways
- Leanstral 1.5 is Mistral AI's open Apache-2.0 model for formal mathematical proof in Lean 4 — a specialist verification tool, not a general chatbot
- Its mixture-of-experts design carries 119 billion total parameters but activates only 6 billion per forward pass, keeping it efficient
- It reaches a perfect 100 percent on miniF2F, solves 587 of 672 PutnamBench problems, and scores 34 percent on PhD-level FATE-X
- During testing it found five previously unknown bugs across 57 open-source repositories, including a critical integer-overflow flaw — evidence that formal verification is a practical route to safer software
- Best for mathematicians, verification engineers, and researchers working in Lean 4; the wrong tool for general coding or writing