Updated April 28, 2026

Llama 4

By Meta

Llama 4 is Meta's open-weight foundation model family, built on a Mixture-of-Experts architecture. It comes in two variants: Maverick (400 billion total / 17 billion active parameters, 1 million token context) and Scout (109 billion total / 17 billion active parameters, 10 million token context). Together they are the most downloaded open-weight frontier models in the world.

Learning Objectives

  • Understand the Llama 4 model family and how Mixture-of-Experts architecture works
  • Compare Maverick, Scout, and Llama 3.3 to choose the right model for different use cases
  • Evaluate Llama 4's open-weight licensing and its implications for developers and enterprises

What Is Llama 4?

Llama 4 is Meta's latest generation of open-weight foundation models and the most downloaded open-weight frontier model family in the world. Released in April 2025, Llama 4 introduced Meta's first Mixture-of-Experts (MoE) architecture. The released models store 109 billion to 400 billion total parameters but activate only 17 billion of them for each token, so they reach frontier-level performance at far lower compute cost per token than a dense model of the same total size.
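To make "only activate a fraction for each token" concrete, here is a minimal sketch of top-1 expert routing in plain Python. It is a toy under stated assumptions, not Meta's implementation: the sizes (`DIM`, `NUM_EXPERTS`), the random weights, and the helper names `make_expert`, `matvec`, and `moe_forward` are all hypothetical and chosen only for illustration.

```python
import random

def make_expert(rng, dim):
    """A toy 'expert': one dim x dim weight matrix stored as a list of rows."""
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def moe_forward(token, router, experts):
    """Route one token to the single highest-scoring expert (top-1 routing)."""
    scores = matvec(router, token)  # one score per expert
    chosen = max(range(len(experts)), key=lambda i: scores[i])
    # Only the chosen expert's weights are touched for this token.
    return chosen, matvec(experts[chosen], token)

rng = random.Random(0)
DIM, NUM_EXPERTS = 4, 8  # toy sizes, not Llama 4's real dimensions
experts = [make_expert(rng, DIM) for _ in range(NUM_EXPERTS)]
router = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
chosen, output = moe_forward(token, router, experts)

total_params = NUM_EXPERTS * DIM * DIM  # expert weights stored in memory
active_params = DIM * DIM               # expert weights used for this token
print(f"expert {chosen} handled the token; "
      f"{active_params}/{total_params} expert weights were active")
```

Every expert stays in memory, but each token pays the compute cost of only one of them. That is the trade the MoE design makes.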

Meta releases Llama weights openly as a strategic choice. By commoditizing the model layer, Meta reduces the cost of the AI infrastructure that powers its own products (Facebook, Instagram, WhatsApp, Ray-Ban glasses) while building an ecosystem that makes Llama the default choice for developers worldwide.

Tip

Get Llama 4: Available on Hugging Face, llama.com, and through all major cloud providers (AWS, Azure, Google Cloud, Together, Fireworks, etc.). Free to download and deploy.

The Llama 4 Family

Llama 4 Maverick — The Flagship

Llama 4 Maverick is Meta's most capable open-weight model:

  • 400 billion total parameters, 17 billion active: MoE architecture with 128 routed experts plus a shared expert; each token goes to the shared expert and to 1 routed expert
  • 1 million token context window: matching Claude Opus 4.7 and GPT-5.5
  • Multimodal: processes both text and image inputs natively
  • LMArena Elo: 1,417 at launch (scored by an experimental chat version), competitive with frontier closed models
  • Available under Meta's community license (free for most commercial use)

The MoE architecture is key: although Maverick has 400 billion total parameters, only 17 billion are active for each token. This means it delivers frontier-class performance while requiring significantly less compute per inference than a dense 400 billion parameter model.
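The saving can be put into rough numbers. The sketch below uses the parameter counts from the text and one labeled assumption: the common rule of thumb that a forward pass costs about 2 floating-point operations per active parameter per token. The helper name `forward_flops_per_token` is hypothetical.

```python
TOTAL_PARAMS = 400e9   # Maverick total parameters (from the text)
ACTIVE_PARAMS = 17e9   # parameters used per token (from the text)

def forward_flops_per_token(params):
    """Assumption: roughly 2 FLOPs per parameter per token (rule of thumb)."""
    return 2 * params

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

print(f"active fraction: {active_fraction:.2%}")
print(f"MoE   ~{moe_flops / 1e9:.0f} GFLOPs per token")
print(f"dense ~{dense_flops / 1e9:.0f} GFLOPs per token")
print(f"dense / MoE compute ratio: {dense_flops / moe_flops:.1f}x")
```

Under that assumption, about 4.25% of the weights are active per token and a dense 400 billion parameter model would need roughly 23.5 times the compute. The estimate covers compute only: all 400 billion parameters still have to be held in memory.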

Llama 4 Maverick at a glance

  • Developer: Meta AI
  • Access: Open-weight
  • Strengths: Most downloaded open-weight frontier model; MoE (400 billion total / 17 billion active); multimodal; 1 million token context; 1,417 LMArena Elo
  • Context window: 1 million tokens
  • Pricing: Free (open-weight, Meta community license)

Llama 4 Scout — Extended Context

Llama 4 Scout is optimized for extremely long context scenarios:

  • 109 billion total parameters, 17 billion active: a smaller MoE (16 experts)
  • 10 million token context window: the largest context of any major released model (10x Maverick, 10x GPT-5.5)
  • Fits on a single H100 GPU when quantized to Int4: practical for self-hosted deployment
  • Ideal for processing massive document collections, entire codebases, or very long conversation histories
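The single-GPU claim depends on weight precision, which a quick estimate makes visible. This sketch counts memory for the weights alone and assumes an 80 GB H100. It ignores the KV cache and activations, which add to the total and grow with context length. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(total_params, bits_per_param):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9

SCOUT_TOTAL_PARAMS = 109e9  # from the text
GPU_MEMORY_GB = 80          # assumption: an 80 GB H100

for bits in (16, 8, 4):
    needed = weight_memory_gb(SCOUT_TOTAL_PARAMS, bits)
    verdict = "fits" if needed < GPU_MEMORY_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {needed:6.1f} GB -> {verdict} in {GPU_MEMORY_GB} GB")
```

Only the 4-bit case (about 54.5 GB) leaves headroom on one card. At 16-bit the weights alone need about 218 GB, which calls for several GPUs.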

Ten million tokens is approximately 7.5 million words — enough to process an entire library of technical documentation or a full year of corporate communications in a single context.
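The word estimate follows from a simple conversion. The sketch below assumes about 0.75 English words per token, a common rule of thumb. The real ratio depends on the tokenizer and the text, so treat the result as an order-of-magnitude check. Both helper names are hypothetical.

```python
WORDS_PER_TOKEN = 0.75  # assumption: rule of thumb for English text

def tokens_to_words(tokens, words_per_token=WORDS_PER_TOKEN):
    """Approximate word count represented by a number of tokens."""
    return tokens * words_per_token

def fits_in_context(word_count, context_tokens, words_per_token=WORDS_PER_TOKEN):
    """True if a document of word_count words fits in the context window."""
    return word_count / words_per_token <= context_tokens

SCOUT_CONTEXT = 10_000_000
print(f"{tokens_to_words(SCOUT_CONTEXT):,.0f} words")  # 7,500,000 words
print(fits_in_context(5_000_000, SCOUT_CONTEXT))       # True
```

Code, tables, and non-English text usually produce more tokens per word, so a corpus of that kind fills the window sooner than this estimate suggests.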

Llama 3.3 70 Billion — The Production Workhorse

While Llama 4 gets the headlines, Llama 3.3 70 billion remains the most widely deployed open-weight model in production:

  • Dense architecture (simpler to deploy than MoE)
  • Proven reliability across thousands of production deployments
  • Performance competitive with the earlier Llama 3.1 405 billion at a fraction of the cost
  • Runs on a single 80 GB GPU (A100 or H100) when quantized; 16-bit weights alone take about 140 GB, more than one such card holds

For teams that need a proven, efficient, well-understood model, Llama 3.3 is often the pragmatic choice.

Licensing

| Detail | Info |
| --- | --- |
| License | Meta Llama Community License |
| Commercial use | Yes, free for most businesses |
| Restriction | Companies with more than 700 million monthly active users must request a license from Meta |
| Weights | Downloadable from Hugging Face and llama.com |
| Fine-tuning | Permitted; derivatives must include attribution |
| Open-source status | Not OSI-compliant: the restrictions on large commercial use disqualify it from the Open Source Initiative definition |

For individual developers and most businesses, the license is effectively free and unrestricted. The 700 million MAU threshold only affects the largest companies.

Choosing Between Llama Models

| Use Case | Recommended Model | Why |
| --- | --- | --- |
| General-purpose frontier tasks | Llama 4 Maverick | Best capability; 1 million context; multimodal |
| Extremely long documents (over 1 million tokens) | Llama 4 Scout | 10 million token context; single-GPU deployment |
| Production deployment (proven reliability) | Llama 3.3 70 billion | Dense architecture; simpler ops; battle-tested |
| On-device / mobile | Llama 3.2 (1 billion/3 billion) | Smallest models; designed for edge deployment |
| Budget-constrained inference | Llama 4 Scout | 17 billion active parameters; efficient MoE |
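The table can be read as a small decision procedure. The sketch below encodes its rows as a function. The function name, the parameter names, and the order in which the conditions are checked are assumptions made for illustration, since the table does not rank its rows.

```python
def recommend_llama(context_tokens=0, on_device=False,
                    proven_dense=False, budget_constrained=False):
    """Return a model name following the table's rows, checked in a fixed order."""
    if on_device:
        return "Llama 3.2 (1 billion/3 billion)"  # edge and mobile deployment
    if context_tokens > 1_000_000:
        return "Llama 4 Scout"                    # beyond Maverick's context window
    if proven_dense:
        return "Llama 3.3 70 billion"             # dense, battle-tested
    if budget_constrained:
        return "Llama 4 Scout"                    # smaller MoE, 17 billion active
    return "Llama 4 Maverick"                     # general-purpose default

print(recommend_llama(context_tokens=4_000_000))
print(recommend_llama(proven_dense=True))
print(recommend_llama())
```

Hard constraints (target hardware, then context length) are checked before preferences, so a request that cannot run on a given model never returns it.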

Llama 4 vs. Competing Open-Weight Models

| Model | Architecture | Context | License | Key Strength |
| --- | --- | --- | --- | --- |
| Llama 4 Maverick (Meta) | MoE 400 billion/17 billion active | 1 million | Meta Community | Most downloaded; largest ecosystem; multimodal |
| DeepSeek V3 (DeepSeek) | MoE 671 billion/37 billion active | 128,000 | MIT | Frontier reasoning; extremely cost-efficient training |
| Gemma 3 (Google) | Dense (1 billion-27 billion) | 128,000 | Gemma Terms of Use | Small and efficient; great for consumer hardware |
| Phi-4 (Microsoft) | Dense (14 billion) | 16,000 | MIT | Exceptional math/coding for size; MIT-licensed weights |
| GPT-OSS (OpenAI) | MoE (120 billion and 20 billion variants) | 128,000 | Apache 2.0 | OpenAI's first open-weight release since GPT-2; fine-tunable |

The "Avocado" Question

Meta's next-generation model (codenamed "Avocado", potentially Llama 5) completed pre-training in January 2026 but has been delayed to May 2026 after failing to match leading competitors in reasoning, coding, and writing benchmarks. Reports suggest Meta may release Avocado as a closed-source model — a dramatic departure from Meta's open-source tradition.

⚠️ Warning

If confirmed, a closed-source shift would fundamentally change the open-weight AI landscape. Llama has been the most widely deployed open-weight model family — if Meta moves to closed-source, DeepSeek and Mistral become the most likely standard-bearers for open AI.

Strengths

  • Most downloaded open-weight frontier model — largest community, most third-party fine-tunes, broadest cloud provider support
  • MoE efficiency — frontier performance with only 17 billion active parameters per token
  • 1 million / 10 million token context — Maverick matches closed models; Scout offers 10x more
  • Multimodal — native text and image processing in Maverick
  • Free for most commercial use — Meta community license is effectively unrestricted for the vast majority of developers
  • Broad deployment options — Hugging Face, all major clouds, local via Ollama, fine-tunable with Torchtune

Limitations & Considerations

  • Not fully open-source: Meta's license restricts companies above 700 million MAU and is not OSI-compliant
  • MoE complexity — MoE models are harder to deploy, fine-tune, and optimize than dense models; some teams prefer Llama 3.3 for simplicity
  • Closed-source risk — Meta may shift to closed-source for future models ("Avocado"), potentially fragmenting the Llama ecosystem
  • Llama 4 Behemoth delays — the largest Llama 4 model remains unreleased, reportedly facing engineering challenges
  • Benchmark gaps — while competitive, Llama 4 Maverick trails Claude Opus 4.7 and GPT-5.5 on SWE-bench and some reasoning tasks

Key Takeaways

  • Llama 4 is Meta's open-weight frontier model family — Maverick (400 billion/17 billion MoE, 1 million context, multimodal) and Scout (10 million context, single-GPU) are the most downloaded open-weight models in the world
  • MoE architecture delivers frontier performance at a fraction of the inference cost of dense models of equivalent total parameter count
  • Free for most commercial use under Meta's community license; Llama 3.3 70 billion remains the go-to for proven production deployments
  • Meta's potential shift to closed-source for its next model ("Avocado") could fundamentally reshape the open-weight AI ecosystem
