Free to read. Sign up to save your progress and take knowledge-check quizzes.

Sign up free
5 min read·Updated April 28, 2026

Mistral Small 4

Mistral AI logoBy Mistral AI

Mistral Small 4 is Mistral AI's efficient Mixture-of-Experts model — 119 billion total parameters with 128 experts and approximately 6 billion active per token, released under the Apache 2.0 license with a 256,000 token context window.

Listen to this lesson

Free preview · first 0:30
0:00 / 0:30

Audio & video lessons are paid features

Plus unlocks audio streaming. Pro adds downloadable audio, video, certificates, and more.

Plus adds:
  • Audio streaming
  • Downloadable PDFs
  • All AI Playbooks
  • Personalized content
Pro also adds:
  • Certificates of completion
  • Audio MP3 downloads
  • Video lessonssoon
  • & More…soon

Watch this lesson

Video coming soon

Learning Objectives

  • Understand Mistral Small 4's MoE architecture and efficiency advantages
  • Compare Mistral Small 4 against other small-to-mid-size open-source models
  • Evaluate deployment scenarios where Mistral Small 4 is the right choice

What Is Mistral Small 4?

Mistral Small 4 is Mistral AI's efficient Mixture-of-Experts (MoE) model, released on March 16, 2026 under the Apache 2.0 license — the most permissive license in the major model ecosystem.

The architecture uses 128 MoE experts with approximately 6.5 billion parameters active per token out of 119 billion total parameters. This means the model delivers quality comparable to much larger dense models while using only a fraction of the compute per inference.

Tip

Access Mistral Small 4: Download from mistral.ai or Hugging Face. Also available through Mistral's La Plateforme API and Le Chat.

Architecture

SpecificationValue
Total parameters119 billion
Active parameters per tokenApproximately 6.5 billion
Number of experts128 (4 active per token)
Context window256,000 tokens
LicenseApache 2.0
ReleasedMarch 16, 2026

The Mixture-of-Experts architecture is key to Mistral Small 4's efficiency: instead of activating all 119 billion parameters for every token, the model routes each token through only 4 of its 128 specialized experts (~6.5 billion parameters). This achieves quality close to a dense 119 billion parameter model at the inference cost of a 6.5 billion parameter model.

Mistral Small 4 vs. Other Models

ModelParameters (Active)ContextLicenseKey Strength
Mistral Small 46.5 billion (of 119 billion MoE)256,000Apache 2.0Fully open; efficient MoE; long context
Llama 3.3 70 billion70 billion (dense)128,000Meta CommunityMost deployed open-weight; proven reliability
Phi-4 14 billion14 billion (dense)16,000MITSmall and fast; strong reasoning per parameter
Claude Haiku 4.5Undisclosed200,000Closed APIFastest Claude; sub-200ms; $0.80/$4 per million tokens

Key Advantages

Apache 2.0 License

Mistral Small 4 uses the Apache 2.0 license — the most permissive widely-used open-source license. Unlike Meta's community license (which restricts commercial use above 1 million monthly active users), Apache 2.0 has:

  • No usage restrictions at any scale
  • No commercial limitations
  • Freedom to modify, distribute, and build proprietary products
  • Full compatibility with enterprise legal requirements

256,000 Token Context Window

The 256,000 token context window (~192,000 words) is among the longest for an open-weight model of this efficiency class, enabling:

  • Full document analysis without chunking
  • Long conversation histories
  • Multi-file code understanding
  • Research paper processing in a single context

Efficient Inference

At approximately 6.5 billion active parameters per token, Mistral Small 4 can run on:

  • A single high-end consumer GPU (NVIDIA RTX 4090 or A100)
  • Moderate cloud instances without premium GPU allocation
  • Edge deployment scenarios with sufficient hardware

Strengths

  • Apache 2.0 — most permissive license; no commercial restrictions at any scale
  • Efficient MoE — 119 billion total but only 6.5 billion active per token; excellent quality-per-compute
  • 256,000 token context — among the longest for open-weight models in this efficiency class
  • 128 experts — high specialization across the expert pool
  • European AI — built by Mistral AI (Paris); may meet EU data sovereignty preferences
  • Self-hostable — full control over data and deployment

Limitations and Considerations

  • Not frontier-class — does not compete with Opus 4.7, GPT-5.5, or Gemini 3.1 Pro on the hardest benchmarks
  • MoE complexity — Mixture-of-Experts models can be harder to fine-tune and deploy compared to dense models
  • Memory requirements — while inference is efficient, loading 119 billion total parameters requires significant VRAM
  • Newer model — released March 2026; community tools and fine-tuned variants are still emerging
  • Mistral ecosystem — smaller community than Llama or OpenAI ecosystems

Company Details

DetailInfo
DeveloperMistral AI (Paris, France)
ReleasedMarch 16, 2026
LicenseApache 2.0 (fully open-source)
ArchitectureMixture-of-Experts (128 experts, 4 active per token)
Total parameters119 billion
Active per tokenApproximately 6.5 billion
Context window256,000 tokens
Websitemistral.ai

Key Takeaways

  • Mistral Small 4 is an efficient MoE model — 119 billion total parameters with only 6.5 billion active per token across 128 experts, delivering strong quality at low inference cost
  • Released under Apache 2.0 — the most permissive license available, with no commercial restrictions at any scale
  • 256,000 token context window enables full document analysis and long conversations without chunking
  • Runs on a single high-end GPU; suitable for self-hosted enterprise deployments with data sovereignty requirements
  • Not frontier-class — best suited for production applications where efficiency and openness matter more than maximum benchmark scores

Save your progress & take the quiz

Sign up free to bookmark lessons, track which modules you've completed, and lock in what you learned with a quick knowledge-check quiz at the end of each lesson.

🧭Recommended for you