Name: ZAYA1-8B
Availability: InStock
Author: Zyphra

Learning Objectives

Understand ZAYA1-8B's mixture-of-experts (MoE) architecture and why a 760 million active-parameter footprint matters for inference cost
Identify what's significant about training a frontier-quality math model entirely on AMD Instinct MI300X — the first such public claim
Evaluate when ZAYA1-8B is preferable to comparable open-weight reasoning models (DeepSeek-R1, Qwen reasoning variants)

What Is ZAYA1-8B?

ZAYA1-8B is an open-weight mixture-of-experts language model released by Zyphra, a San Francisco AI lab. The model has 8.4 billion total parameters but activates only 760 million parameters per inference token — roughly an order of magnitude smaller active footprint than its dense-model peers. On math benchmarks, ZAYA1-8B matches DeepSeek-R1, scoring 89.1 on AIME 2026 and 71.6 on HMMT, and stays competitive with Claude Sonnet 4.5 on broader reasoning despite the much smaller compute budget per token.

What sets ZAYA1-8B apart from the rest of the open-weight reasoning landscape is the training stack. The model was trained on a 1,024-node AMD Instinct MI300X cluster built with IBM, using AMD Pensando Pollara networking — making it the first frontier-quality math model trained entirely outside the NVIDIA CUDA stack. Combined with a custom mixture-of-experts design and a specialized attention mechanism that preserves reasoning quality at lower active-parameter budgets, the AMD-only training claim is a meaningful signal that hyperscaler-grade training no longer requires NVIDIA hardware.

The weights are released under Apache 2.0 on Hugging Face — the most permissive open-model license, with no restrictions on commercial use or competitive applications. Local deployment requires Zyphra's vLLM fork rather than standard vLLM, so plan for the runtime swap if you're integrating the model into existing inference pipelines.

✅Tip

Get ZAYA1-8B: huggingface.co/Zyphra — Apache 2.0 license, weights downloadable; serverless inference via Zyphra Cloud

Pricing and Access

Access Method	Cost	Best For
Hugging Face Download	Free (Apache 2.0)	Development, research, custom deployment with Zyphra's vLLM fork
Zyphra Cloud (serverless)	Usage-based	Quick API access without managing GPU infrastructure
Self-hosted	Free + your compute	Production deployment with control over inference stack

ZAYA1-8B is freely downloadable under Apache 2.0 — the most permissive open-source license, with no restrictions on commercial or competitive use. This is more permissive than the Gemma license (which restricts training competing AI services).

Core Capabilities

Frontier-Quality Math at 760 Million Active Parameters

Mathematical reasoning is the headline strength. On AIME 2026 ZAYA1-8B scores 89.1, matching DeepSeek-R1 — a much larger model. On HMMT the score is 71.6, also matching DeepSeek-R1. The achievement is doubly notable because most models in the 8 billion total / 760 million active range struggle on competition-grade math benchmarks; the mixture-of-experts routing concentrates reasoning capacity in the active subset of weights without spreading it thin across the full parameter pool.

For comparison, Zyphra reports ZAYA1-8B stays competitive with Claude Sonnet 4.5 on broader reasoning — meaningfully short of the frontier closed-API tier overall, but a strong showing for an order-of-magnitude smaller active-parameter budget.

Mixture-of-Experts Architecture

ZAYA1-8B uses a custom mixture-of-experts (MoE) design where most of the 8.4 billion total parameters sit dormant for any given inference token; the model routes each token to a small subset of "experts," activating only 760 million parameters per token. The architecture also includes a specialized attention mechanism designed to preserve reasoning quality at lower active-parameter budgets — Zyphra's published research describes this as an explicit trade against the standard practice of scaling dense models.

Training Stack — AMD MI300X Only, No NVIDIA

The most strategically significant claim. ZAYA1-8B was trained on a 1,024-node cluster of AMD Instinct MI300X GPUs, built with IBM, using AMD Pensando Pollara for cluster networking. The MI300X has 192 GB HBM3 per GPU — the largest unified-memory GPU in the AMD Instinct line — making it well-suited to the high-memory-pressure phases of MoE training.

The wider implication: AMD's Instinct line, paired with Pensando networking and a software stack that does not depend on CUDA, is now demonstrably capable of training frontier-quality reasoning models. This sits alongside Cerebras and Groq as evidence that the non-NVIDIA frontier-AI training market is maturing.

Local Deployment via Zyphra's vLLM Fork

For self-hosted deployment, ZAYA1-8B requires Zyphra's fork of vLLM rather than upstream vLLM — the MoE routing kernel and the custom attention mechanism need patches that have not yet landed in mainline vLLM. The fork is open-source on Zyphra's GitHub. Plan for this dependency if you are slotting ZAYA1-8B into an existing inference platform; serverless deployment via Zyphra Cloud avoids the runtime switch entirely.

Strengths

Frontier-quality math at small active size: Matches DeepSeek-R1 on AIME 2026 (89.1) and HMMT (71.6) with only 760 million active parameters per token
AMD-only training: First public claim of a frontier-quality math model trained entirely on AMD Instinct MI300X with Pensando Pollara networking — no NVIDIA dependency
Apache 2.0 license: Most permissive open-model license; commercial and competitive use unrestricted
Order-of-magnitude smaller active footprint: Inference cost scales with active parameters, so ZAYA1-8B runs much cheaper than dense models in the same quality tier
Available immediately: Weights on Hugging Face, serverless deployment via Zyphra Cloud, no waitlist

Limitations & Considerations

Custom runtime requirement: Self-hosted deployment requires Zyphra's vLLM fork, not upstream vLLM
Specialized strength profile: Math and reasoning lead; less strong on creative writing, broad multilingual coverage, and very long context
Smaller than frontier closed models on hardest tasks: Competitive with Claude Sonnet 4.5 on broader reasoning but not Claude Opus 4.7 or GPT-5.5 on the hardest tasks
New lab: Zyphra is a smaller AI startup; long-term operational and support track record is still being established
AMD-only training does not equal AMD-only inference: ZAYA1-8B inference runs on standard NVIDIA, AMD, or CPU hardware; the AMD-specific story is about the training pipeline only

Best Use Cases

Task	Why ZAYA1-8B
Math and competition-grade reasoning	AIME and HMMT scores match DeepSeek-R1 at much smaller active-parameter footprint
Cost-sensitive reasoning deployments	760 million active parameters per token keeps inference cost low at production scale
Open-source research and competitive products	Apache 2.0 license has no restrictions on competitive AI training or commercial deployment
AMD-stack deployments	The AMD Instinct training pedigree signals strong compatibility with non-NVIDIA inference stacks too
Self-hosted privacy-sensitive workflows	Open weights plus permissive license allow on-premise deployment without data egress

When to choose alternatives:

Maximum reasoning ceiling → Claude Opus 4.7, GPT-5.5, or DeepSeek V4-Pro for the hardest reasoning tasks
Multilingual breadth → Qwen3.5 (over 100 languages) or Gemma 4 (over 35 languages) for multilingual workloads
Standard inference runtime → Models that run on stock vLLM (Llama, Qwen, Gemma) for plug-and-play deployment
Smallest footprint at all costs → Phi-4 Mini or Gemma 4 E2B for the absolute smallest edge devices

Getting Started

Download weights from huggingface.co/Zyphra under Apache 2.0
Clone Zyphra's vLLM fork from GitHub for local deployment, or use Zyphra Cloud for serverless API access
Run a math-benchmark probe (a small subset of AIME or HMMT problems) before committing to broader integration — the model's strength profile is reasoning-heavy
For self-hosted production: budget for the runtime swap to Zyphra's vLLM fork; plan a rollback path if upstream-vLLM features you depend on are not yet patched in
For competitive workloads where Apache 2.0 matters, ZAYA1-8B is a stronger license fit than Gemma 4 (which restricts competing-service training)