Updated March 27, 2026

Huawei Ascend 950PR

By Huawei

The Huawei Ascend 950PR is China's most powerful domestically produced AI accelerator — featuring 1.56 petaFLOPS of FP4 compute and in-house HBM memory — built under US sanctions to reduce China's dependence on NVIDIA.

Learning Objectives

  • Understand what the Ascend 950PR is and how US sanctions shaped its development
  • Evaluate the chip's specifications, strengths, and technical limitations
  • Assess Huawei's AI chip roadmap and its impact on the global AI hardware landscape

What Is the Ascend 950PR?

The Huawei Ascend 950PR is China's most powerful domestically designed AI accelerator, launched in Q1 2026 as the Atlas 350 accelerator card. It represents Huawei's answer to a fundamental question: can China build world-class AI chips without access to NVIDIA, TSMC's advanced nodes, or Western memory suppliers?

The answer, so far, is a qualified yes. The 950PR delivers 1.56 petaFLOPS of peak FP4 compute, roughly 2.8 times the estimated FP4 throughput of NVIDIA's H20 (the only NVIDIA chip still legally available in China). That advantage applies to compute-bound tasks, and it comes with significant trade-offs in memory bandwidth, yield rates, and software ecosystem maturity.

💡Key Concept

US Export Controls on AI Chips: Since October 2022, the US has progressively restricted the sale of advanced AI chips to China. NVIDIA's H100, H200, and Blackwell chips are banned. The H20 — a deliberately hobbled version designed for the Chinese market — is the only NVIDIA chip still legally available. Huawei's Ascend line exists specifically to fill this gap with domestically produced alternatives.

Specifications

Spec | Ascend 950PR (Atlas 350) | NVIDIA H20 (China-legal)
FP4 Compute | 1.56 petaFLOPS | ~0.56 petaFLOPS (estimated)
FP8 Compute | 1 petaFLOPS | Not available
HBM Capacity | 112 GB (HiBL 1.0, in-house) | 96 GB (HBM3)
HBM Bandwidth | 1.4-1.6 TB/s | 4.0 TB/s
Power (TDP) | 600 W | ~400 W
Process Node | SMIC 7nm | TSMC 4nm
Target Workload | Prefill inference and recommendations | General inference

⚠️Warning

Huawei's "2.8 times H20 performance" claim is based on peak FP4 compute throughput (1.56 versus an estimated 0.56 petaFLOPS), a metric where the 950PR excels. However, the H20 has 2.5 times the memory bandwidth of the 950PR (4.0 versus 1.6 TB/s, taking the top of the 950PR's 1.4-1.6 TB/s range), which matters more for memory-bound workloads like long-context LLM decoding. The real-world performance gap depends heavily on the specific workload.
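The interplay between peak compute and memory bandwidth can be illustrated with a simple roofline model, where attainable throughput is the smaller of peak compute and bandwidth multiplied by arithmetic intensity. The sketch below is a rough illustration, not a benchmark: it uses the figures from the specifications table (including the estimated H20 FP4 number and the upper end of the 950PR bandwidth range), and the arithmetic intensities in the loop are hypothetical workloads chosen only to show both regimes.

```python
# Roofline sketch: attainable throughput = min(peak compute, bandwidth * intensity).
# Chip figures come from the specifications table in this lesson; the H20 FP4
# value is labeled an estimate there. Workload intensities are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str
    peak_pflops: float    # peak FP4 compute, petaFLOPS
    bandwidth_tbs: float  # memory bandwidth, TB/s


def attainable_pflops(chip: Accelerator, flops_per_byte: float) -> float:
    """Roofline bound (petaFLOPS) for a kernel with the given arithmetic intensity."""
    # TB/s * FLOP/byte gives teraFLOPS; divide by 1000 for petaFLOPS.
    memory_bound = chip.bandwidth_tbs * flops_per_byte / 1000.0
    return min(chip.peak_pflops, memory_bound)


def ridge_point(chip: Accelerator) -> float:
    """Arithmetic intensity (FLOP/byte) above which the chip is compute-bound."""
    return chip.peak_pflops * 1000.0 / chip.bandwidth_tbs


ASCEND_950PR = Accelerator("Ascend 950PR", peak_pflops=1.56, bandwidth_tbs=1.6)
H20 = Accelerator("NVIDIA H20", peak_pflops=0.56, bandwidth_tbs=4.0)

if __name__ == "__main__":
    for intensity in (2, 50, 200, 2000):  # hypothetical FLOP-per-byte values
        a = attainable_pflops(ASCEND_950PR, intensity)
        h = attainable_pflops(H20, intensity)
        print(f"{intensity:>5} FLOP/byte: 950PR {a:.4f}, H20 {h:.4f}, ratio {a / h:.2f}")
```

Under these assumptions, low-intensity kernels favor the H20 by the bandwidth ratio, while only kernels with very high arithmetic intensity reach the 950PR's full 2.8 times compute advantage.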

In-House HBM: HiBL 1.0

One of the most strategically significant aspects of the 950PR is its memory. US sanctions prevent Huawei from sourcing HBM chips from SK Hynix, Samsung, or Micron — the world's only three HBM suppliers. So Huawei developed its own: HiBL 1.0 (High-Bandwidth Low-cost memory).

HiBL 1.0 delivers up to approximately 1.6 TB/s of bandwidth, roughly equivalent to HBM2e-class performance and well below the 3.35 to 4.8 TB/s of the HBM3/HBM3e used in NVIDIA's latest chips. Its significance is strategic: Huawei now controls both the logic processor and the memory in its AI chip stack, which reduces its exposure to sanctions on foreign memory suppliers.
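To see why memory bandwidth matters for token generation, a common rule of thumb says that single-stream decoding must read every model weight once per generated token, so bandwidth divided by weight size gives an upper bound on tokens per second. The sketch below applies that rule; the 70-billion-parameter model and the one-byte-per-parameter precision are hypothetical inputs, not figures from this lesson, and the bound ignores KV-cache traffic, batching, and multi-chip sharding.

```python
# Upper bound on single-stream decode speed for a memory-bound model.
# Assumption: each generated token requires one full read of the weights.

def max_decode_tokens_per_s(bandwidth_tbs: float,
                            params_billion: float,
                            bytes_per_param: float) -> float:
    """Return bandwidth / weight_bytes, an optimistic tokens-per-second ceiling."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_tbs * 1e12 / weight_bytes


if __name__ == "__main__":
    # Hypothetical 70B-parameter model stored at 1 byte per parameter.
    for label, bandwidth in (("HiBL 1.0 (1.6 TB/s)", 1.6), ("HBM3 on H20 (4.0 TB/s)", 4.0)):
        ceiling = max_decode_tokens_per_s(bandwidth, params_billion=70, bytes_per_param=1.0)
        print(f"{label}: at most {ceiling:.1f} tokens/s per stream")
```

The ceiling scales linearly with bandwidth, which is why a compute advantage does not carry over to decode-heavy workloads.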

Huawei AI Chip Roadmap

Chip | Timeline | FP4 Compute | Focus
Ascend 950PR | Q1 2026 (shipping) | 2 petaFLOPS target | Prefill inference
Ascend 950DT | Q4 2026 | 2 petaFLOPS target | Decode and training
Ascend 960 | Q4 2027 | To be determined | Next generation
Ascend 970 | Q4 2028 | 4 zettaFLOPS (cluster) | Long-term target

At the system level, Huawei's CloudMatrix 384 connects 384 Ascend 910C NPUs (an earlier Ascend generation) across 16 racks to deliver approximately 300 petaFLOPS, roughly twice the compute of NVIDIA's GB200 NVL72 but at about four times the power consumption. Huawei claims the system outperforms NVIDIA hardware on DeepSeek R1 inference.
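The efficiency implication of those CloudMatrix figures follows from simple division. The sketch below uses only the rounded ratios and totals quoted in the paragraph above, so the results are approximate, and the per-device figure is a plain average rather than a measured per-NPU number.

```python
# Back-of-envelope check on the CloudMatrix 384 figures quoted above.

def relative_perf_per_watt(perf_ratio: float, power_ratio: float) -> float:
    """Performance-per-watt relative to a baseline, given perf and power ratios."""
    return perf_ratio / power_ratio


def per_device_pflops(system_pflops: float, devices: int) -> float:
    """Average throughput per device, assuming an even split across the system."""
    return system_pflops / devices


if __name__ == "__main__":
    # Roughly 2x the compute of GB200 NVL72 at roughly 4x the power.
    print(f"Relative perf/W: {relative_perf_per_watt(2.0, 4.0):.2f}x")
    # Approximately 300 petaFLOPS spread across 384 NPUs.
    print(f"Average per NPU: {per_device_pflops(300, 384):.3f} petaFLOPS")
```

On these rounded numbers the system delivers about half the performance per watt of the NVIDIA rack, trading energy efficiency for aggregate scale.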

China Market Impact

Metric | Value
2025 Ascend 910 Shipments | ~700,000+ units
2026 Planned Production | 1.6 million dies (target)
China AI Chip Market Share (2026) | ~50% projected
NVIDIA China Share (2026) | ~8% projected (down from dominant position)
Manufacturing Yield | 5-20% (versus NVIDIA Blackwell at 60-80%)

Notable deployment: Zhipu AI trained GLM-5 (745 billion parameters) entirely on Huawei Ascend hardware using MindSpore — with zero NVIDIA dependency (February 2026).

📝Note

The 50% market share projection and 1.6 million production target may be aspirational. Low yield rates (5-20%) on SMIC's 7nm process and HBM supply constraints could significantly limit actual output. Analysts estimate China's 2026 HBM production can support only about 275,000 chips.
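The effect of yield and HBM supply on output can be sketched with a few lines of arithmetic. The model below is deliberately simplified: it assumes cost per good die scales as one over yield (ignoring wafer pricing, packaging, and partially usable dies), and it treats the 1.6 million figure as a count of good dies, which the source does not specify. The function names are illustrative only.

```python
# Simplified yield and supply arithmetic using figures quoted in this lesson.

def cost_multiplier(yield_rate: float) -> float:
    """Cost per good die relative to a hypothetical 100%-yield process."""
    if not 0.0 < yield_rate <= 1.0:
        raise ValueError("yield_rate must be in (0, 1]")
    return 1.0 / yield_rate


def die_starts_needed(good_dies: float, yield_rate: float) -> float:
    """Dies that must be fabricated to obtain the requested number of good dies."""
    return good_dies * cost_multiplier(yield_rate)


def output_ceiling(die_target: int, hbm_supported_chips: int) -> int:
    """Finished chips are capped by whichever input is scarcer."""
    return min(die_target, hbm_supported_chips)


if __name__ == "__main__":
    for y in (0.05, 0.20, 0.60, 0.80):
        print(f"yield {y:.0%}: {cost_multiplier(y):.2f}x cost per good die")
    # Assumption: the 1.6 million target refers to good dies.
    print(f"Die starts at 20% yield: {die_starts_needed(1_600_000, 0.20):,.0f}")
    print(f"HBM-limited output: {output_ceiling(1_600_000, 275_000):,} chips")
```

Under these assumptions, HBM supply rather than die production is the binding constraint, which is consistent with the analysts' estimate quoted in the note.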

Software Ecosystem

  • MindSpore: Huawei's open-source deep learning framework, with a translation layer that can ingest PyTorch and TensorFlow models
  • CANN (Compute Architecture for Neural Networks): the low-level programming environment for Ascend NPUs
  • PaddlePaddle integration: Baidu's framework is deeply integrated with Ascend; Baidu and Huawei together control approximately 70% of China's GPU cloud market

Company Details

Detail | Info
Company | Huawei Technologies Co., Ltd.
Founded | 1987
CEO | Ren Zhengfei (founder)
Headquarters | Shenzhen, Guangdong, China
Employees | ~207,000
Revenue (2024) | $118 billion
AI Investment | 15 billion yuan ($2.1 billion) annually on AI ecosystem development
Sanctions Status | US Entity List since 2019; global prohibitions on Ascend chip sales
Website | huawei.com

Strengths

  • Most powerful Chinese-made AI chip — 1.56 petaFLOPS FP4 positions it as the domestic alternative to banned NVIDIA hardware
  • Full vertical integration — Huawei controls the entire stack: chip design, in-house HBM (HiBL), CloudMatrix systems, and MindSpore software
  • Sanctions-proof supply chain — no dependence on US technology, SK Hynix, Samsung, or TSMC for production
  • Massive domestic market — projected 50% of China's AI chip market in 2026 as NVIDIA share collapses
  • Ambitious roadmap — targeting 4 zettaFLOPS cluster performance by 2028

Limitations and Considerations

  • Memory bandwidth gap — HiBL 1.0 delivers 1.6 TB/s versus 4+ TB/s for HBM3e, limiting performance on memory-bound LLM inference
  • Low manufacturing yields — 5-20% yield on SMIC 7nm versus 60-80% for NVIDIA on TSMC, significantly increasing cost per chip
  • Process node frozen at 7nm — US sanctions prevent access to EUV lithography needed for 5nm and 3nm; each generation must compensate through architecture
  • Software ecosystem immaturity — MindSpore and CANN have much smaller developer communities than CUDA and PyTorch
  • Global sales restricted — US export controls effectively limit sales to China's domestic market; South Korea sales push may trigger diplomatic friction
  • CloudMatrix reliability — one analysis found a 68.2% failure rate for the 400G optical interconnects in the CloudMatrix 384 system

Key Takeaways

  • The Huawei Ascend 950PR is China's most powerful domestically produced AI accelerator — 1.56 petaFLOPS FP4 with in-house HiBL memory, built entirely outside the US-controlled supply chain
  • Huawei claims 2.8 times the peak FP4 compute of NVIDIA's H20, but the H20 has 2.5 times the memory bandwidth, so real-world results depend on the workload
  • Strategically significant: Huawei now controls its full AI chip stack (processor + memory + systems + software), projected to capture 50% of China's AI chip market in 2026
  • Constrained by 7nm process node, low yields (5-20%), and HBM supply limits — the roadmap to 4 zettaFLOPS by 2028 faces substantial manufacturing challenges
