Free to read. Sign up to save your progress and take knowledge-check quizzes.

Sign up free
8 min read·Updated May 22, 2026

DeepSeek

DeepSeek logoBy DeepSeek

DeepSeek is the Chinese AI lab that shocked the industry in early 2025 by releasing a frontier-class reasoning model trained for a fraction of the cost of comparable US models. April 2026: DeepSeek shipped V4-Pro (1.6 trillion-parameter MoE, 1 million-token context) and V4-Flash (284 billion total / 13 billion active) — both MIT-licensed and priced well below frontier rivals.

Listen to this lesson

Free preview · first 0:30
0:00 / 0:30

Audio & video lessons are paid features

Plus unlocks audio streaming. Pro adds downloadable audio, video, certificates, and more.

Plus adds:
  • Audio streaming
  • Downloadable PDFs
  • All AI Playbooks
  • Personalized content
Pro also adds:
  • Certificates of completion
  • Audio MP3 downloads
  • Video lessonssoon
  • & More…soon

Watch this lesson

Video coming soon

Learning Objectives

  • Understand why DeepSeek's cost-efficient training methods had such a significant impact on the AI industry
  • Distinguish between DeepSeek's V4-Pro and V4-Flash (April 2026), the V3/V3.2 chat models, and the R1 reasoning model
  • Know when to use DeepSeek's API or chat interface vs. running the open-weight models locally
  • Understand the security and regulatory concerns surrounding DeepSeek

What Is DeepSeek?

DeepSeek is a Chinese AI research lab founded in 2023 by High-Flyer Capital Management, a quantitative hedge fund. In January 2025, DeepSeek released DeepSeek R1 — an open-source reasoning model that matched OpenAI's o1 on major benchmarks — and simultaneously published a research paper claiming the model was trained for approximately $5.6 million, compared to the hundreds of millions spent training comparable US frontier models.

The release triggered immediate global reaction. On January 27, 2025, NVIDIA lost $589 billion in market value in a single day — the largest single-day loss in stock market history. The Nasdaq fell 3.1%, and approximately $1 trillion was wiped from US tech stocks. Markets subsequently recovered fully, with NVIDIA reaching a $5 trillion market cap by October 2025. But the episode forced a fundamental reexamination of assumptions about the capital requirements for frontier AI.

DeepSeek offers three primary products in 2026:

  1. DeepSeek V4-Pro and V4-Flash (April 2026) — the new flagship Mixture-of-Experts foundation models with 1 million-token context windows, MIT-licensed
  2. DeepSeek Chat (V3/V3.2) — the previous-generation general-purpose conversational models, still widely used
  3. DeepSeek R1 — the chain-of-thought reasoning model designed for complex mathematical, coding, and logical tasks

💡Key Concept

The significance of DeepSeek R1: Prior to DeepSeek, open-source reasoning models significantly lagged behind closed-source leaders like OpenAI's o1. DeepSeek R1 was the first open-source reasoning model to reach competitive performance with o1 on math olympiad problems, coding tasks, and logical reasoning benchmarks — while being freely downloadable and MIT licensed. This shattered the assumption that frontier reasoning capability required proprietary model weights and massive compute budgets.

Tip

Try DeepSeek: chat.deepseek.com — free; API access via platform.deepseek.com

The DeepSeek Model Family

ModelTypeKey Strengths
DeepSeek V4-ProOpen (MIT, April 2026)Flagship MoE; 1.6 trillion total / 49 billion active parameters; 1 million-token context; uses ~27% of V3.2's FLOPs and ~10% of KV cache at 1M context
DeepSeek V4-FlashOpen (MIT, April 2026)Smaller MoE; 284 billion total / 13 billion active parameters; 1 million-token context; cheapest frontier-adjacent model on the market
DeepSeek V3.2Open (MIT)Previous-gen chat model; 671 billion MoE; ~22 billion active per token; 128K context
DeepSeek V3.2-SpecialeLimited (Dec 2025)Competition model; IMO gold (35/42); 10th place IOI; 96.0% AIME (vs GPT-5-High 94.6%); API discontinued due to extreme compute costs
DeepSeek R1Open (MIT)Reasoning model; chain-of-thought; matches OpenAI o1 on math/logic/coding; 128K context
DeepSeek R1-0528Open (MIT)Updated R1 with improved multi-step accuracy, reduced hallucination, JSON output, and function-calling capabilities
DeepSeek R1 Distilled (1.5 billion–70 billion)Open (MIT)Smaller distilled versions of R1 reasoning; run on consumer hardware

V4 — Frontier-Adjacent Open Weights (April 2026)

On April 24, 2026, DeepSeek released two new MoE foundation models — V4-Pro and V4-Flash — both MIT-licensed and downloadable from Hugging Face. These are the first DeepSeek models to ship with a 1 million-token context window, matching Claude and Gemini's industry-leading context length.

V4-Pro is the flagship: 1.6 trillion total parameters with 49 billion active per token in an MoE architecture. The full model weighs roughly 865 GB on Hugging Face. According to DeepSeek's accompanying paper, V4-Pro trails state-of-the-art frontier models by approximately 3 to 6 months on most benchmarks but represents the largest open-weights model ever released. Notable efficiency gain: V4-Pro uses approximately 27% of V3.2's FLOPs and 10% of the KV cache at 1 million-token context — meaningful both for training cost and on-device inference.

V4-Flash is the smaller sibling at 284 billion total / 13 billion active parameters (~160 GB), targeting the cost-efficient inference tier. Pricing puts V4-Flash below GPT-5.4 Nano, Claude Haiku, and the Gemini Flash variants.

API pricing on platform.deepseek.com:

  • V4-Pro: $1.74 / $3.48 per million input/output tokens — undercuts Claude Sonnet and the larger GPT-5.4 tier
  • V4-Flash: $0.14 / $0.28 per million input/output tokens — cheapest frontier-adjacent option in the market

Both models are available immediately via DeepSeek's API, on Hugging Face for self-hosting, and through third-party providers (Together.ai, Fireworks AI, Groq).

First Outside Funding Round — Open-Source AGI Mandate

DeepSeek's first outside venture round sits at approximately 70 billion yuan (about $10 billion) at a $45 billion pre-money valuation — placing DeepSeek alongside frontier US labs by multiple, even though its training spend remains a fraction of theirs. The round is led by Beijing's National Artificial Intelligence Industry Investment Fund, China's state-backed strategic AI vehicle (distinct from the chip-focused "Big Fund"), with Tencent, IDG Capital, and Monolith Capital participating. The state-backed fund is expected to contribute roughly 10 billion yuan, and founder Liang Wenfeng another 20 billion yuan from his own holdings — Liang retains roughly 90% of the company through High-Flyer Capital Management and had not previously sought outside capital.

The defining feature of the round is the mandate Liang has set publicly with investors: DeepSeek will keep developing open-source models and pursue artificial general intelligence as its core goal, resisting the usual pressure to chase near-term commercialization. That posture is unusual at this valuation tier — most frontier US labs draw their largest checks from corporate cloud partners (OpenAI and Microsoft, Anthropic and Amazon) and treat AGI claims with strategic ambiguity. DeepSeek is doing the opposite, and doing it with one of the largest state-aligned bets on AGI to date outside the United States.

V3.2-Speciale — Competition-Grade Performance

In December 2025, DeepSeek briefly released V3.2-Speciale, a competition-focused model that achieved extraordinary results:

  • IMO gold medal with 35 out of 42 points
  • 10th place at IOI (International Olympiad in Informatics)
  • 96.0% on AIME — exceeding GPT-5-High's 94.6%

The model was available via API only until December 15, 2025, before being discontinued due to extreme computational costs. V3.2-Speciale demonstrated that DeepSeek's training methodology could produce models competitive with or exceeding the very best US frontier models on the hardest reasoning tasks.

Core Capabilities

DeepSeek V3.2 — Efficient Frontier Chat

DeepSeek V3.2 is DeepSeek's general-purpose chat model. Its technical architecture uses Mixture-of-Experts (MoE) with 671 billion total parameters, activating approximately 22 billion for any given query — delivering frontier-adjacent performance at dramatically lower inference cost.

Key capabilities:

  • Multi-turn conversation and instruction following
  • Code generation, debugging, and explanation across major languages
  • Mathematical reasoning and problem-solving
  • Document summarization and analysis
  • 128K context window for long documents

DeepSeek R1 — Open-Source Reasoning

DeepSeek R1 uses a process similar to OpenAI's chain-of-thought training — the model explicitly "thinks through" problems before producing a final answer. You can see this reasoning process in the response (shown as a collapsible "thinking" section in the chat interface).

The updated R1-0528 variant adds JSON output and function-calling capabilities, making it more practical for agentic and structured-output applications.

This approach excels at:

  • Mathematics: Competition-level math problems, proofs, calculations
  • Coding: Debugging complex programs, writing algorithms from specifications
  • Logic puzzles: Multi-step reasoning chains, formal logic
  • Scientific reasoning: Physics, chemistry, biology problem-solving

Hybrid Thinking Mode

DeepSeek's chat interface supports a thinking mode toggle — switching between quick responses (V3.2 chat mode) and extended reasoning (R1 mode). This is similar to Claude's extended thinking or ChatGPT's reasoning mode — useful for hard problems, unnecessary for simple queries.

Pricing & Access

Access MethodCostDetails
chat.deepseek.comFreeWeb interface; access to V4 models and R1; thinking mode toggle; no account required for basic use
V4-Pro API$1.74/$3.48 per million tokens (input/output)Flagship 1.6 trillion-parameter MoE; 1 million-token context; undercuts Claude Sonnet and larger GPT-5.4 tier
V4-Flash API$0.14/$0.28 per million tokens (input/output)Cheapest frontier-adjacent model; below GPT-5.4 Nano and Claude Haiku
V3.2 API~$0.27/$1.10 per million tokens (input/output)Previous-gen flagship; still available for cost-sensitive workloads
R1 API~$0.55/$2.19 per million tokensReasoning model pricing; significantly cheaper than OpenAI o1 API
Open-weight download (Hugging Face)FreeMIT license; all model weights downloadable; run locally with Ollama, vLLM, or llama.cpp
Third-party API providersUsage-basedTogether.ai, Fireworks AI, Groq, and others host DeepSeek models; often with faster inference

DeepSeek's API pricing is among the lowest in the industry for frontier-class models — typically 10–15x cheaper than equivalent OpenAI API calls.

⚠️Warning

Data privacy note: Using DeepSeek's API or chat.deepseek.com sends your data to servers in China, subject to Chinese data law. This has led several governments and organizations to block or restrict DeepSeek access. For privacy-sensitive use, download the open-weight models (MIT license) and run locally — this eliminates any data transfer to DeepSeek's servers.

The Training Cost Story

DeepSeek's most significant contribution to the field may not be the model itself, but the training methodology paper. DeepSeek published that V3 was trained on approximately 2,000 H800 GPUs (lower-spec than the H100s used for US frontier models due to export restrictions) and cost ~$5.9 million in GPU compute — compared to estimates of $50–100 million or more for comparable US models.

The techniques that enabled this efficiency:

  • FP8 training precision — reducing memory requirements without significant quality loss
  • Mixture-of-Experts architecture — routing tokens to specialized sub-networks
  • Multi-Token Prediction (MTP) — predicting multiple future tokens simultaneously, improving training efficiency
  • DualPipe pipeline parallelism — reducing communication overhead in distributed training

These innovations have since influenced training approaches across the industry.

Security Concerns

In early 2025, cloud security firm Wiz discovered a publicly accessible DeepSeek database containing over 1 million sensitive records — including chat histories, API keys, and backend details — with zero authentication. The database was exposed to anyone on the internet without any login required. DeepSeek secured the database after Wiz reported the issue, but the incident raised serious questions about the company's security practices.

Government Bans & Restrictions

DeepSeek's Chinese origin and data handling practices have led to significant restrictions:

  • Italy blocked DeepSeek in January 2025, citing GDPR violations
  • Banned on government devices in South Korea, Australia, Taiwan, and Texas
  • Restricted in US House of Representatives offices, NASA, US Navy, and the Pentagon
  • Multiple other governments have issued advisories against using DeepSeek for sensitive work

These restrictions apply to DeepSeek's cloud services — running the open-weight models locally on your own infrastructure is not affected by any ban.

Strengths

  • Cost efficiency: Frontier-class performance at a fraction of the training and inference cost of US competitors
  • Open-weight MIT license: Freely downloadable, fine-tunable, and commercially usable with no restrictions
  • R1 reasoning: First open-source reasoning model to match OpenAI o1 — a genuine breakthrough for open AI
  • V3.2-Speciale results: IMO gold and 96% AIME demonstrate competition-grade capability exceeding some US frontier models
  • Distilled variants: 1.5 billion–70 billion distilled R1 models run on consumer hardware while retaining strong reasoning
  • Competitive API pricing: 10–15x cheaper than OpenAI or Anthropic API equivalents for similar capability
  • Transparent thinking: R1's reasoning chain is visible, helping users understand how the model arrived at answers

Limitations & Considerations

  • Data privacy: API and web usage routes data to Chinese servers — use local models for sensitive data
  • Security track record: The Wiz database exposure (1 million+ records with zero authentication) raises concerns about operational security
  • Censored content: DeepSeek refuses to discuss politically sensitive topics (Taiwan, Tiananmen Square, etc.) — more restrictive than US models in these areas
  • Government bans: Blocked or restricted on government systems in multiple countries — check your organization's policy before using cloud services
  • Infrastructure reliability: DeepSeek's own servers have experienced capacity issues during peak demand — third-party API hosts often more reliable
  • Not best-in-class for all tasks: Trailing slightly behind the very latest US frontier models (Claude Opus 4.7, GPT-5.5) on complex instruction-following and nuanced writing tasks

Best Use Cases

TaskWhy DeepSeek
Math and science problem-solvingR1 reasoning model competes with the best closed-source models; Speciale exceeded GPT-5-High on AIME
Budget-conscious API deployments10–15x cheaper than OpenAI API for comparable output quality
Open-source research and experimentationMIT license; full weights available; reproducible results
On-premise AI (privacy-sensitive organizations)Download R1/V3 locally; no data leaves your infrastructure
Coding tasks and debuggingV3.2 and Coder variants are among the best open-weight coding models

When to choose alternatives:

  • Privacy-sensitive cloud use without local infrastructure → Mistral Le Chat (EU servers)
  • Broadest capability ceiling → Claude Opus 4.7, GPT-5.5
  • Source-cited research → Perplexity
  • Workplace productivity integration → Microsoft 365 Copilot or Google Workspace AI

Getting Started

  1. Go to chat.deepseek.com — free with an email account
  2. Toggle Think mode on for a math or coding problem and observe the extended reasoning process
  3. For local deployment: install Ollama and run ollama run deepseek-r1:7b (7 billion distilled R1 — runs on most consumer GPUs)
  4. For API access: visit platform.deepseek.com — API pricing is significantly lower than US alternatives

Key Takeaways

  • DeepSeek's R1 release in January 2025 was a watershed moment — the first open-source reasoning model competitive with OpenAI's o1, trained for ~$5.9 million rather than hundreds of millions, triggering a $589 billion single-day NVIDIA stock drop
  • V3.2-Speciale (Dec 2025) achieved IMO gold and 96.0% AIME, exceeding GPT-5-High — demonstrating that DeepSeek can match or beat the best US models on competition-grade reasoning
  • MIT license means all DeepSeek model weights are freely downloadable, fine-tunable, and deployable on-premise — eliminating data privacy concerns associated with their cloud API
  • Security concerns are real: Wiz discovered a publicly accessible database with 1 million+ records and zero authentication; multiple governments have banned DeepSeek on official devices
  • DeepSeek V4 shipped April 2026 in two MIT-licensed variants — V4-Pro (1.6 trillion total / 49 billion active, 1 million-token context, the largest open-weights model ever released) and V4-Flash (284 billion total / 13 billion active) — both undercutting the equivalent US frontier-tier API pricing
  • DeepSeek's first outside funding round stands at approximately $10 billion (70 billion yuan) at a $45 billion pre-money valuation, led by Beijing's National AI Industry Investment Fund with Tencent, IDG Capital, and Monolith Capital participating; founder Liang Wenfeng retains roughly 90% control through High-Flyer and has publicly committed the lab to open-source AGI as its core goal

Save your progress & take the quiz

Sign up free to bookmark lessons, track which modules you've completed, and lock in what you learned with a quick knowledge-check quiz at the end of each lesson.

🧭Recommended for you