Free to read. Sign up to save your progress and take knowledge-check quizzes.

Sign up free
7 min read·Updated June 14, 2026

GLM-5 is a 744 billion open-source MoE model from Zhipu AI, built entirely on Huawei Ascend chips with zero NVIDIA dependency. The first model from a publicly listed Chinese AI company, it claims to surpass Gemini 3 Pro on coding and agentic benchmarks.

Listen to this lesson

Free preview · first 0:30
0:00 / 0:30

Audio & video lessons are paid features

Plus unlocks audio streaming. Pro adds downloadable audio, video, certificates, and more.

Plus adds:
  • Audio streaming
  • Downloadable PDFs
  • All AI Playbooks
  • Personalized content
Pro also adds:
  • Certificates of completion
  • Audio MP3 downloads
  • Video lessonssoon
  • & More…soon

Watch this lesson

Video coming soon

Learning Objectives

  • Understand GLM-5's significance as a frontier model trained entirely without NVIDIA hardware and what that means for the global AI chip landscape
  • Identify GLM-5's core capabilities including its 744 billion MoE architecture, 200K context window, and MIT license
  • Evaluate when GLM-5 is the right choice versus alternatives like Qwen 3.5, DeepSeek V4, or Llama 4

What Is GLM-5?

📝Note

Newer release — GLM 5.2 (June 2026): Zhipu has since shipped GLM 5.2, a coding-first model with a 1 million token context window, live across all four tiers of its GLM Coding Plan and positioned as a permissively licensed alternative to Claude Code and GPT-5.5 for the Asia-Pacific market. A standalone API and MIT-licensed open weights are expected to follow within days of launch. One caveat: Zhipu published no benchmarks at release, so independent performance comparisons are still pending. The rest of this page covers the GLM-5 generation.

GLM-5 is an open-source large language model developed by Zhipu AI, a company spun out of Tsinghua University — China's most prestigious technical university. Released in February 2026, GLM-5 supersedes GLM-4.5 with a massive architectural upgrade: 744 billion total MoE parameters organized into 256 expert sub-networks, with 8 experts activated per token for approximately 44 billion active parameters per forward pass.

What makes GLM-5 historically significant extends beyond its benchmark scores. It was trained entirely on Huawei Ascend chips using the MindSpore framework — with zero dependency on NVIDIA GPUs. In a world where US export controls restrict Chinese access to NVIDIA's most advanced chips, GLM-5 demonstrates that frontier-scale AI models can be built on fully domestic Chinese hardware. This has major implications for the global AI chip landscape and the effectiveness of semiconductor export restrictions.

GLM-5 uses DeepSeek Sparse Attention (DSA) for efficiency — the same attention mechanism that helped DeepSeek achieve strong performance with less compute. Combined with the MoE architecture, this allows GLM-5 to process a 200K token context window while keeping inference costs manageable.

Zhipu AI made history in January 2026 by completing an IPO on the Hong Kong Stock Exchange, raising approximately $558 million — making it the first publicly listed Chinese AI foundation model company. This public listing provides financial transparency and stability that privately-held competitors cannot match.

The model is released under the MIT license — fully open source with no commercial restrictions.

💡Key Concept

Why the chip story matters: The US has imposed increasingly strict export controls on advanced AI chips, aiming to slow China's AI development. GLM-5's training on Huawei Ascend chips — entirely bypassing NVIDIA — is a proof point that these controls may accelerate domestic chip development rather than prevent frontier AI progress. Whether Ascend chips match NVIDIA's efficiency is debated, but GLM-5's competitive benchmark results demonstrate the gap is narrowing.

Tip

Try GLM-5: open.bigmodel.cn — Zhipu AI's platform; also available on Hugging Face under MIT license.

Pricing & Access

OptionPriceDetails
Open Source (Hugging Face)FreeFull 744 billion MoE weights available under MIT license — no commercial restrictions
Zhipu AI APIPay-per-tokenAPI access via open.bigmodel.cn; competitive token pricing
GLM-5-Turbo APIPay-per-token (lower cost)March 2026 variant optimized for agent workflows — faster and cheaper per token
Cloud DeploymentsVariesAvailable through Chinese cloud providers and select international platforms

As a fully MIT-licensed model, GLM-5 can be downloaded and deployed without any licensing fees or commercial restrictions. The primary cost consideration is compute — the full 744 billion MoE model requires substantial GPU resources, though the 8-of-256 expert activation means inference demands are closer to a 44 billion dense model.

Core Capabilities

Frontier Performance Without NVIDIA

GLM-5 claims to surpass Gemini 3 Pro on coding and agentic performance benchmarks — a remarkable achievement for a model trained on non-NVIDIA hardware:

  • Coding benchmarks: Strong results across code generation, bug fixing, and multi-step programming tasks
  • Agentic tasks: Designed for multi-step workflows where the model plans, executes, and iterates — not just single-turn generation
  • General reasoning: Competitive with frontier models on mathematical reasoning, analysis, and knowledge-intensive tasks

MoE Architecture with DeepSeek Sparse Attention

The 744 billion parameter model uses 256 expert sub-networks with 8 activated per token:

  • Efficient inference: Only ~44 billion parameters activate per forward pass, keeping compute costs closer to a mid-size model despite the massive total parameter count
  • DeepSeek Sparse Attention (DSA): Borrowed from DeepSeek's architecture, DSA reduces the computational cost of attention operations — critical for handling the 200K context window efficiently
  • Specialized experts: Different experts activate for different types of content, allowing deep specialization across coding, reasoning, multilingual, and domain-specific tasks

200K Context Window

The 200K token context window supports:

  • Large codebase analysis: Ingest entire repositories or multi-file projects for comprehensive understanding
  • Long document processing: Handle full research papers, legal contracts, and regulatory filings without chunking
  • Extended agent sessions: Multi-step agentic workflows that accumulate context over many planning and execution cycles

GLM-5-Turbo for Agent Workflows

Released in March 2026, GLM-5-Turbo is optimized specifically for agentic use cases:

  • Faster inference: Reduced latency for the rapid back-and-forth of agent loops
  • Lower cost: Cheaper per-token pricing for high-volume agent workflows where many model calls add up
  • Tool use: Enhanced function calling and structured output for integration with external tools and APIs

Strengths

  • NVIDIA-free training: Proof that frontier AI can be built on domestic Chinese hardware (Huawei Ascend + MindSpore) — strategically significant for the global AI landscape
  • MIT license: Fully open source with no commercial restrictions — the most permissive license among frontier-scale models
  • 744 billion MoE scale: Among the largest openly available models, with efficient 8-of-256 expert activation (~44 billion active)
  • Surpasses Gemini 3 Pro: Claims to exceed Google's model on coding and agentic benchmarks
  • Publicly listed company: Zhipu AI's Hong Kong IPO provides financial transparency and stability
  • Tsinghua academic heritage: Research-first approach from China's top technical university
  • Agent-optimized variant: GLM-5-Turbo provides faster, cheaper inference for multi-step agentic workflows

Limitations & Considerations

  • Primarily optimized for Chinese: While English performance is strong, models like Llama 4 or Qwen 3.5 may offer broader multilingual coverage
  • Ascend chip efficiency debate: While GLM-5 demonstrates that Huawei Ascend can train frontier models, questions remain about training efficiency compared to equivalent NVIDIA setups
  • Significant compute for self-hosting: The 744 billion MoE model requires substantial multi-GPU infrastructure even though only ~44 billion parameters activate per token
  • Smaller international community: Most active users and contributors are in the Chinese AI research community, with fewer English-language tutorials and integrations
  • Benchmark verification: Claims of surpassing Gemini 3 Pro await broader independent validation across diverse evaluation suites

Best Use Cases

TaskWhy GLM-5
NVIDIA-free AI deploymentThe only frontier model trained entirely on non-NVIDIA hardware — relevant for organizations navigating chip supply constraints
Chinese academic researchPurpose-built for academic analysis with Tsinghua research heritage
Bilingual Chinese-English workflowsNative-quality output in both languages with natural code-switching
Open-source AI researchMIT license with full weights — maximum freedom for research and commercial use
Agentic workflowsGLM-5-Turbo variant optimized for fast, cost-efficient multi-step agent loops
Large-context analysis200K token window handles substantial codebases and long documents

When to choose alternatives:

  • Broader multilingual coverage beyond Chinese-English → Qwen 3.5
  • Reasoning-specialist with chain-of-thought → DeepSeek R1 or DeepSeek V4
  • English-first open model with largest community → Llama 4
  • Commercial API with enterprise support → Claude or GPT-5.1

Getting Started

  1. Visit open.bigmodel.cn to explore the Zhipu AI platform and API documentation
  2. Browse the model collection on Hugging Face (THUDM) to see available model variants
  3. For API integration, register on open.bigmodel.cn and generate an API key — try GLM-5-Turbo first for the best balance of speed and cost
  4. Test a bilingual prompt (Chinese and English in the same conversation) to experience the model's code-switching fluency
  5. Try an agentic workflow: give GLM-5 a multi-step task (research, plan, execute) to see its planning capabilities
  6. For research use, review the GLM-5 technical paper to understand the Huawei Ascend training methodology and MoE architecture decisions

Tip

The bigger picture: GLM-5's significance extends beyond its benchmark scores. As the first frontier model trained entirely on non-NVIDIA hardware, it represents a strategic milestone in the global AI chip competition. For organizations evaluating AI infrastructure, GLM-5 demonstrates that the NVIDIA-CUDA ecosystem, while dominant, is no longer the only path to frontier-scale AI — a development with implications for chip procurement, supply chain risk, and long-term AI infrastructure planning.

Key Takeaways

  • GLM-5 is a 744 billion MoE model from Zhipu AI (Tsinghua University spinout) — the first frontier model trained entirely on Huawei Ascend chips without any NVIDIA dependency
  • Released under the MIT license with 256 experts (8 activated per token, ~44 billion active), it claims to surpass Gemini 3 Pro on coding and agentic benchmarks
  • Zhipu AI's January 2026 Hong Kong IPO (~$558 million raised) makes it the first publicly listed Chinese AI foundation model company
  • Best for Chinese-English bilingual work, open-source research, and agentic workflows; its strategic importance as an NVIDIA-free proof point extends beyond any single use case

Save your progress & take the quiz

Sign up free to bookmark lessons, track which modules you've completed, and lock in what you learned with a quick knowledge-check quiz at the end of each lesson.

🧭Recommended for you