Free to read. Sign up to save your progress and take knowledge-check quizzes.

Sign up free
7 min read·Updated May 20, 2026

Mistral Large 3

Mistral AI logoBy Mistral AI

Mistral Large 3 is Mistral AI's flagship open-weight model — a 675 billion MoE architecture with 41 billion active parameters per forward pass, 256K context, and multimodal capabilities. The April 2026 lineup also includes Mistral Medium 3.5 — a 128 billion-parameter dense model that runs on as few as 4 GPUs at $1.50/$7.50 per million tokens, posting 77.6% on SWE-Bench Verified. Mistral expanded into industrial engineering in May 2026 by acquiring Austrian physics-AI company Emmi AI, layering real-time simulations and digital twins onto the Mistral platform for aerospace, automotive, semiconductor, and energy customers.

Listen to this lesson

Free preview · first 0:30
0:00 / 0:30

Audio & video lessons are paid features

Plus unlocks audio streaming. Pro adds downloadable audio, video, certificates, and more.

Plus adds:
  • Audio streaming
  • Downloadable PDFs
  • All AI Playbooks
  • Personalized content
Pro also adds:
  • Certificates of completion
  • Audio MP3 downloads
  • Video lessonssoon
  • & More…soon

Watch this lesson

Video coming soon

Learning Objectives

  • Understand what Mistral Large 3 is and how its Mixture-of-Experts architecture delivers frontier performance at reduced inference cost
  • Identify Mistral Large 3's core differentiators: open weights, 256K context, multimodal input, and EU data sovereignty
  • Evaluate when Mistral Large 3 is the right choice versus GPT-5.1, Claude Opus, or Llama 4 Maverick

What Is Mistral Large 3?

Mistral Large 3 is the flagship model from Mistral AI, the Paris-based AI company valued at $14 billion with over $400 million in annual recurring revenue. Superseding Mistral Large 2, it represents a major architectural leap — moving from a dense model to a 675 billion parameter Mixture-of-Experts (MoE) design that activates only 41 billion parameters per forward pass.

This MoE approach means Mistral Large 3 delivers frontier-level reasoning and generation quality while requiring significantly less compute per token than a comparably-sized dense model. At release, it ranked as the top open-source coding model on the LMArena leaderboard and operates at roughly half the cost per token of GPT-5.1 — making it one of the most cost-efficient frontier models available.

Mistral Large 3 supports a 256K token context window (doubled from Mistral Large 2's 128K), handles multimodal inputs (text and images), and maintains fluency across dozens of languages. It is released under a modified MIT license as open weights — meaning organizations can download, deploy, and fine-tune the model on their own infrastructure.

Mistral AI's headquarters in Paris means that data processed through La Plateforme (Mistral's API) stays within EU jurisdiction by default — a meaningful advantage for organizations subject to GDPR or with European data residency requirements.

Tip

Try Mistral Large 3: Access via La Plateforme — Mistral's API platform. Free tier available for experimentation. Also accessible through Le Chat — Mistral's consumer chat interface. API key setup takes under 2 minutes.

Pricing & Access

Access MethodPricingNotes
La Plateforme APIPay-per-token (~half the cost of GPT-5.1)Direct from Mistral; EU data processing; free tier available
Le Chat (consumer)Free / Le Chat Pro subscriptionChat interface with web search, canvas, and multimodal input
Open Weights (Hugging Face)Free downloadSelf-host on your own infrastructure; modified MIT license
AWS BedrockPay-per-token (AWS pricing)Access alongside Claude, Llama, Nova in your AWS account
Azure AIPay-per-token (Azure pricing)Available through Azure AI model catalog
Google Cloud Vertex AIPay-per-token (GCP pricing)Available through Vertex AI Model Garden

Mistral Large 3's MoE architecture means lower per-token inference costs than dense models of comparable capability — a structural cost advantage, not just competitive pricing.

Core Capabilities

Mixture-of-Experts Architecture

Mistral Large 3's 675 billion total parameters are organized into expert sub-networks, with only 41 billion activated per forward pass:

  • Cost efficiency: Roughly half the cost per token of GPT-5.1 while delivering competitive quality — the MoE sparsity means most parameters are dormant on any given inference
  • Scalable deployment: The active parameter count (41 billion) determines GPU memory and compute requirements, not the full 675 billion — making it more practical to self-host than a 675 billion dense model
  • Specialized experts: Different experts activate for different input patterns, allowing the model to maintain deep specialization across tasks without the cost of a monolithic dense architecture

Multimodal & Multilingual

Mistral Large 3 processes both text and image inputs, enabling:

  • Document understanding: Analyze charts, diagrams, screenshots, and scanned documents alongside text instructions
  • Visual reasoning: Answer questions about images, extract data from visual formats, and describe visual content
  • Multilingual fluency: Strong performance across European and Asian languages, with particular strength in French, German, Spanish, and other European languages reflecting Mistral's Paris-based engineering team

256K Context Window

The doubled context window (up from 128K in Mistral Large 2) supports:

  • Large codebases: Analyze entire repositories or multi-file projects in a single prompt
  • Long document processing: Ingest full contracts, research papers, or regulatory filings without chunking
  • Extended conversations: Maintain coherent multi-turn dialogues without losing context from earlier in the conversation

Top-Tier Coding Performance

At release, Mistral Large 3 achieved the top ranking among open-source models on the LMArena coding leaderboard:

  • Code generation: Strong across Python, JavaScript, TypeScript, Rust, Go, and dozens of other languages
  • Code reasoning: Understands complex codebases, identifies bugs, and suggests architectural improvements
  • Agent-compatible: Well-suited for multi-step coding agents that plan, write, test, and iterate on code

Mistral Medium 3.5 — Dense Sibling for Self-Hosters

Mistral ships Mistral Medium 3.5 alongside the Vibe remote agents stack — a complement to Large 3 rather than a replacement. Medium 3.5 is a 128 billion-parameter dense model with the same 256K context window, also under a modified MIT license. The dense architecture is the headline trade-off: it's smaller in raw parameter count than Large 3 but loads in a more predictable way on multi-GPU rigs, with Mistral citing self-hosting on as few as 4 GPUs.

SpecMistral Medium 3.5Mistral Large 3
Architecture128 billion-parameter dense675 billion-parameter MoE (41 billion active)
Context window256K tokens256K tokens
LicenseModified MIT (open weights)Modified MIT (open weights)
API pricing$1.50 / $7.50 per million input/output tokensPay-per-token (~half GPT-5.1)
Self-hosting GPU floorAs few as 4 GPUsMulti-GPU; full 675 billion parameter set
Notable benchmark77.6% SWE-Bench Verified · 91.4 τ³-TelecomTop open-source on LMArena coding leaderboard

When to choose Medium 3.5 over Large 3: smaller GPU footprint for self-hosting, more deterministic cost-per-token at the API tier, and benchmark scores that put it within striking distance of frontier coding models on SWE-Bench. Both models are available on La Plateforme, AWS Bedrock, Azure AI, Google Cloud Vertex AI, and as open-weight downloads on Hugging Face.

Industrial Engineering — Emmi AI Acquisition

Mistral has expanded into industrial engineering with the acquisition of Emmi AI, a Linz-based Austrian startup specializing in physics-based simulation models for aerospace, automotive, semiconductors, and energy. The deal closes in May 2026, brings roughly 30 researchers and engineers onto Mistral's team, and establishes Linz as an official Mistral office — Mistral's first deep technical bench outside the Paris, London, Amsterdam, Munich, San Francisco, and Singapore footprint.

The combined pitch reframes Mistral's positioning beyond chat and coding into real-time simulations and digital twins for high-stakes manufacturing customers. Where Mistral Large 3 and Mistral Medium 3.5 cover language and reasoning workloads, Emmi's physics models bring the simulation-acceleration layer that engineering teams need to compress R&D cycles in domains like jet-engine design, semiconductor process modeling, and battery chemistry. Terms were not disclosed.

For organizations evaluating Mistral as their primary AI vendor, the acquisition signals that the platform's roadmap now reaches into industrial Engineering AI alongside generalist foundation models — a vertical that has historically been served by specialized simulation vendors rather than frontier-AI labs.

Strengths

  • Open weights: Modified MIT license allows download, self-hosting, and fine-tuning — no vendor lock-in
  • Cost-efficient: Roughly half the per-token cost of GPT-5.1 due to MoE architecture with 41 billion active parameters
  • 256K context: Handles large codebases, long documents, and extended conversations
  • Multimodal input: Processes text and images for document understanding and visual reasoning
  • Top coding model: Ranked first among open-source models on LMArena at release
  • EU data sovereignty: Processing through La Plateforme stays within EU jurisdiction — meaningful for GDPR compliance
  • Multi-cloud availability: Available on AWS Bedrock, Azure AI, Google Cloud, and via direct API

Limitations & Considerations

  • Large model footprint: While only 41 billion parameters activate per forward pass, self-hosting still requires loading the full 675 billion parameter set into memory — demanding multi-GPU infrastructure
  • Modified MIT license: Not a fully permissive license — review terms for commercial use cases to ensure compliance
  • Smaller ecosystem: Fewer community tutorials, plugins, and third-party integrations compared to ChatGPT or Claude
  • Newer model: As the successor to Mistral Large 2, independent third-party benchmarks and long-term reliability data are still accumulating

Best Use Cases

TaskWhy Mistral Large 3
Cost-sensitive frontier workloads~50% the per-token cost of GPT-5.1 with competitive quality
European organizations under GDPREU data processing by default through La Plateforme — no transatlantic data transfer
Self-hosted enterprise deploymentOpen weights allow deployment on private infrastructure with full control
Coding tasks and agentsTop open-source coding model on LMArena — strong for generation, review, and agentic workflows
Multimodal document processingText + image input for analyzing charts, screenshots, and visual documents
Large context workloads256K context handles full codebases and long documents without chunking

When to choose alternatives:

  • Need the absolute strongest reasoning → GPT-5.1 or Claude Opus for the most complex multi-step analysis
  • Want the largest open-source ecosystem → Llama 4 Maverick has the broadest community and tooling
  • Need a fully permissive open-source license → DeepSeek or Apache 2.0 licensed models
  • Coding-specific specialist → Devstral 2 (Mistral's dedicated coding model) for IDE-integrated development

Getting Started

  1. Create a Mistral account — sign up at console.mistral.ai with email or Google sign-in
  2. Generate an API key — navigate to API Keys in the console and create a new key
  3. Test in Le Chat — try Mistral Large 3 in the chat interface at chat.mistral.ai to evaluate quality before writing code
  4. Install the SDKpip install mistralai (Python) or npm install @mistralai/mistralai (JavaScript)
  5. Make your first API call — use the chat completions endpoint with the Mistral Large 3 model ID
  6. Try multimodal input — send an image alongside a text prompt to test visual understanding capabilities
  7. Compare with alternatives — test the same prompts on GPT-5.1 and Claude to identify which model performs best for your specific use case

Tip

Open-weight advantage: Because Mistral Large 3 is available as open weights, enterprises can fine-tune it on domain-specific data and deploy it behind their own firewall — something impossible with closed models like GPT-5.1. Combined with EU data sovereignty through La Plateforme for teams that prefer managed API access, Mistral Large 3 offers deployment flexibility that few frontier models can match.

Key Takeaways

  • Mistral Large 3 is Mistral AI's flagship model — a 675 billion MoE architecture (41 billion active) that delivers frontier performance at roughly half the cost per token of GPT-5.1
  • Mistral Medium 3.5 (April 2026) complements Large 3 as a 128 billion-parameter dense model with the same 256K context, posting 77.6% on SWE-Bench Verified and self-hosting on as few as 4 GPUs at $1.50 / $7.50 per million input/output tokens
  • Open weights under a modified MIT license enable self-hosting, fine-tuning, and deployment on private infrastructure — eliminating vendor lock-in
  • The 256K context window, multimodal input, and top-tier coding performance make Large 3 competitive across enterprise, development, and research use cases
  • Available on all major cloud providers plus direct API with EU data sovereignty — making the Mistral lineup the strongest choice for European organizations and cost-conscious teams that want frontier-quality AI without closed-model pricing

Save your progress & take the quiz

Sign up free to bookmark lessons, track which modules you've completed, and lock in what you learned with a quick knowledge-check quiz at the end of each lesson.

🧭Recommended for you