Name: Broadcom Tomahawk 5
Author: Broadcom

Learning Objectives

Understand why high-throughput Ethernet switching is critical to AI training clusters
Identify Tomahawk 5's specs and how it compares to NVIDIA InfiniBand alternatives
Evaluate when to choose Ethernet-based AI fabrics vs. InfiniBand

What Is Broadcom Tomahawk 5?

Tomahawk 5 is the fifth-generation flagship Ethernet switch ASIC from Broadcom (BCM78900) — the silicon at the heart of the Ethernet switches connecting GPUs to GPUs in AI training clusters. It delivers 51.2 Tbps of total switching throughput on a single chip, doubling the prior generation, and is the first Broadcom switch ASIC to support 800 Gigabit Ethernet (800GbE).

For AI training, network throughput is often the bottleneck — large model training requires constant gradient and weight synchronization across hundreds or thousands of GPUs. Tomahawk 5's 51.2 Tbps throughput at low (~1 microsecond) latency, with shared-buffer architecture optimized for RoCEv2 (RDMA over Converged Ethernet) and other modern AI protocols, makes Ethernet a credible alternative to NVIDIA InfiniBand for the back-end AI fabric.

💡Key Concept

Why this matters for AI: A frontier-AI training run involves 10,000+ GPUs synchronized through a network. Every microsecond of network latency multiplied by every gradient step across every GPU compounds into hours of training time. Lower-latency, higher-throughput switches translate directly to faster training and lower cost per token. Tomahawk 5 is the silicon that makes Ethernet-based AI fabrics competitive with InfiniBand at the largest scales.
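
A back-of-the-envelope sketch of how added latency compounds over a run. The step count, collectives per step, and serialized hops per collective are illustrative assumptions, not measurements from any real cluster.

```python
def added_training_hours(extra_latency_us, steps, collectives_per_step,
                         serial_hops_per_collective):
    """Extra wall-clock hours caused by extra latency on every serialized hop.

    Assumes the latency is fully exposed (not hidden behind compute), which is
    a worst-case simplification.
    """
    extra_us = (extra_latency_us * steps * collectives_per_step
                * serial_hops_per_collective)
    return extra_us / 1e6 / 3600  # microseconds -> seconds -> hours


# Hypothetical run: 500k optimizer steps, 200 collectives per step,
# 100 latency-bound hops per collective, 1 extra microsecond per hop.
hours = added_training_hours(1.0, steps=500_000, collectives_per_step=200,
                             serial_hops_per_collective=100)
print(f"{hours:.1f} extra hours")  # 2.8 extra hours
```

In practice, frameworks overlap communication with compute, so the exposed cost is usually smaller than this upper bound.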

✅Tip

Visit Broadcom Tomahawk 5: broadcom.com/products/ethernet-connectivity/switching/strataxgs/bcm78900-series — silicon sold to switch manufacturers; deployed switches available from Arista, FS, NADDOD, and others

Pricing & Access

Tomahawk 5 is silicon sold to switch OEMs, not directly to data-center operators. End customers buy finished switches that incorporate the silicon.

Item	Price	Features
Tomahawk 5 ASIC	Sold to switch OEMs	Not publicly priced; multi-year design wins; powers Arista, FS, NADDOD, others
64-port 800GbE switches	~$80,000 to $200,000 per switch	Configurable port modes (64x800GbE / 128x400GbE / 256x200GbE); 2U or 4U rack form factors; volume pricing varies by vendor
Optical modules	$1,000 to $5,000 per port	800G QSFP-DD or OSFP modules; significant share of total deployment cost; multi-mode and single-mode options
Tomahawk Ultra	Successor in late 2026	Even higher throughput; AI/HPC scale-up focus; complements Tomahawk 5 for different topologies

Networking can account for a significant share of a large AI training cluster's capital cost, so the pricing of Tomahawk 5 switches and their optics directly affects the total cost of training.
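
To see how the optics line item weighs against the switch itself, here is a minimal sketch that uses only the price ranges quoted in the pricing table. It assumes all 64 ports are populated with one module each on the switch side; cables and the far-end modules are ignored.

```python
PORTS = 64
SWITCH_PRICE = (80_000, 200_000)   # USD, low and high end from the table
OPTIC_PRICE = (1_000, 5_000)       # USD per port, low and high end


def populated_switch_cost(switch_price, optic_price, ports=PORTS):
    """Return (total cost, optics share of total) for one fully populated switch."""
    optics = ports * optic_price
    total = switch_price + optics
    return total, optics / total


low_total, low_share = populated_switch_cost(SWITCH_PRICE[0], OPTIC_PRICE[0])
high_total, high_share = populated_switch_cost(SWITCH_PRICE[1], OPTIC_PRICE[1])

print(f"low end:  ${low_total:,} total, optics {low_share:.0%}")    # $144,000, 44%
print(f"high end: ${high_total:,} total, optics {high_share:.0%}")  # $520,000, 62%
```

Even at the low end of both ranges, optics make up a large fraction of the cost of a populated switch.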

Core Capabilities

51.2 Tbps Throughput

Industry-leading single-chip throughput. Configurable as 64x800GbE, 128x400GbE, or 256x200GbE — flexible deployment across leaf, spine, or aggregation roles. Built on TSMC 5nm process, with power consumption only ~10% higher than the 7nm Tomahawk 4 (around 500W per chip) despite doubling throughput.
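
The port modes and the power claim follow from simple arithmetic, sketched below. The 500 W and 10% figures are the approximate values quoted above, and the Tomahawk 4 throughput of 25.6 Tbps is inferred from "doubling the prior generation"; none of these are datasheet values.

```python
TOTAL_GBPS = 51_200  # 51.2 Tbps expressed in Gbps


def port_count(port_speed_gbps):
    """Number of ports of a given speed that fit in the total throughput."""
    if TOTAL_GBPS % port_speed_gbps:
        raise ValueError("port speed must divide total throughput evenly")
    return TOTAL_GBPS // port_speed_gbps


modes = {speed: port_count(speed) for speed in (800, 400, 200)}
print(modes)  # {800: 64, 400: 128, 200: 256}

# Rough power efficiency, using the approximate figures from the text.
th5_watts = 500.0
th4_watts = th5_watts / 1.10          # Tomahawk 5 draws ~10% more than Tomahawk 4
th5_w_per_tbps = th5_watts / 51.2
th4_w_per_tbps = th4_watts / 25.6
print(f"Tomahawk 5: {th5_w_per_tbps:.2f} W/Tbps")  # 9.77 W/Tbps
print(f"Tomahawk 4: {th4_w_per_tbps:.2f} W/Tbps")  # 17.76 W/Tbps
```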

Low Latency (~1 Microsecond)

Approximately 1 microsecond cut-through latency (64-byte packets) — competitive with InfiniBand for AI workloads. Critical for AllReduce, AllGather, and other collective operations that dominate large-model training.
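
One way to see why latency matters for collectives is the standard alpha-beta cost model for a ring AllReduce, sketched below. The ring algorithm is an assumption made for illustration (real collective libraries choose among several algorithms), and the cluster size, link speed, and per-step latency in the example are hypothetical.

```python
def ring_allreduce_seconds(num_gpus, message_bytes, link_gbps, step_latency_us):
    """Estimate ring AllReduce time with the alpha-beta model.

    A ring AllReduce takes 2 * (N - 1) steps; each step pays a fixed latency
    (alpha) and transfers one chunk of size message_bytes / N.
    """
    steps = 2 * (num_gpus - 1)
    chunk_bytes = message_bytes / num_gpus
    bytes_per_second = link_gbps * 1e9 / 8
    return steps * (step_latency_us * 1e-6 + chunk_bytes / bytes_per_second)


# Hypothetical: 1,024 GPUs on 400 Gbps links, 5 us end-to-end latency per step.
for size in (1e6, 1e9):  # a 1 MB message vs. a 1 GB message
    total = ring_allreduce_seconds(1024, size, 400, 5.0)
    latency_only = ring_allreduce_seconds(1024, 0, 400, 5.0)
    print(f"{size:.0e} bytes: {total * 1e3:.2f} ms, "
          f"latency share {latency_only / total:.0%}")
```

Under these assumptions, small messages are almost entirely latency-bound, while large messages are dominated by bandwidth.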

RoCEv2 Optimization

Broadcom describes Tomahawk 5's shared-buffer architecture as the most advanced in the industry, designed to minimize tail latency for RDMA over Converged Ethernet (RoCEv2). RDMA lets GPUs access remote GPU memory without involving the CPU, which is the foundation of efficient distributed AI training. The shared buffer is intended to keep RDMA performance consistent under congestion, where switches with less flexible buffering can stall.

800GbE Support

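First Broadcom switch ASIC to support 800 Gigabit Ethernet ports. 800GbE is becoming the standard switch port speed for AI back-end fabrics: cluster designs built around NVIDIA H100, H200, and B200 GPUs typically give each GPU a 400G or 800G NIC, and Tomahawk 5's configurable port modes can serve either speed. Tomahawk 5 deployments scale from 64-GPU pods to clusters of more than 10,000 GPUs.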
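
How far a given switch radix scales can be estimated with the textbook formula for a non-blocking folded-Clos (fat-tree) fabric. This is a sketch that assumes one switch port per GPU NIC and no oversubscription; production fabrics often deviate from both assumptions.

```python
def max_endpoints(radix, tiers):
    """Maximum endpoints in a non-blocking folded-Clos fabric.

    With radix-k switches: 1 tier -> k, 2 tiers -> k^2 / 2, 3 tiers -> k^3 / 4.
    All three cases reduce to 2 * (k / 2) ** tiers.
    """
    if tiers < 1 or radix < 2 or radix % 2:
        raise ValueError("need tiers >= 1 and an even radix >= 2")
    return 2 * (radix // 2) ** tiers


for radix, speed in ((64, "800GbE"), (128, "400GbE"), (256, "200GbE")):
    sizes = [max_endpoints(radix, t) for t in (1, 2, 3)]
    print(f"{radix} x {speed}: {sizes}")
# 64 x 800GbE: [64, 2048, 65536]
# 128 x 400GbE: [128, 8192, 524288]
# 256 x 200GbE: [256, 32768, 4194304]
```

Under these assumptions, a single switch in 64x800GbE mode covers a 64-GPU pod, while passing 10,000 endpoints at 800GbE requires a third tier.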

Telemetry and Congestion Control

Hardware-level telemetry feeds congestion-control algorithms. Visibility into queue depths, packet timing, and ECN (Explicit Congestion Notification) signals lets operators tune AI fabrics for sustained throughput rather than peak-only performance.
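
As an illustration of how queue depth feeds congestion control, the sketch below implements the generic RED-style ECN marking curve commonly used with RoCEv2 congestion-control schemes such as DCQCN. It is not Broadcom's API or configuration syntax, and the threshold values are hypothetical.

```python
def ecn_mark_probability(queue_kb, k_min_kb, k_max_kb, p_max):
    """Probability that a packet is ECN-marked at a given queue depth.

    Below k_min nothing is marked, above k_max everything is marked, and in
    between the probability rises linearly from 0 to p_max.
    """
    if queue_kb <= k_min_kb:
        return 0.0
    if queue_kb > k_max_kb:
        return 1.0
    return p_max * (queue_kb - k_min_kb) / (k_max_kb - k_min_kb)


# Hypothetical thresholds: start marking at 100 KB, mark everything past 400 KB.
for depth in (50, 250, 400, 500):
    p = ecn_mark_probability(depth, k_min_kb=100, k_max_kb=400, p_max=0.2)
    print(f"queue {depth} KB -> mark probability {p:.2f}")
```

Lower thresholds make senders back off earlier, trading some peak throughput for shorter queues and lower tail latency; this is the kind of tuning that queue-depth telemetry informs.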

Strengths

51.2 Tbps single-chip throughput: Industry-leading — doubles prior generation
Mature ecosystem: Multiple switch OEMs (Arista, FS, NADDOD, and others) ship products built on Tomahawk 5, so buyers have vendor choice
800GbE first-mover: Enables next-generation AI cluster deployments without waiting for the next switch silicon generation
RoCEv2-optimized: Best-in-class shared-buffer architecture for RDMA workloads
Power efficiency: Only ~10 percent power increase over Tomahawk 4 despite doubled throughput
Ethernet ecosystem: Open standards, multi-vendor optics + transceivers, broad interoperability

Limitations & Considerations

Switch silicon, not solutions: Buyers need switch OEMs (Arista, FS, NADDOD) to ship finished products — Broadcom doesn't sell direct
Optics cost: 800GbE optical modules can cost $1,000-$5,000 per port; cable + optics costs frequently exceed switch silicon costs
NVIDIA InfiniBand comparison: For tightly coupled training workloads at the largest scale, NVIDIA Quantum InfiniBand still offers some workload-specific advantages — Ethernet is closing the gap, not yet universally winning
Software stack maturity: RoCEv2 software stacks have matured rapidly but still require expert tuning for production AI deployments
Tomahawk Ultra coming: Late 2026 successor will offer higher throughput and tighter AI/HPC scale-up integration — long-deployment buyers should track the roadmap

Best Use Cases

Use Case	Why Tomahawk 5 Fits	Caveat
AI training clusters (1000+ GPUs)	51.2 Tbps + RoCEv2 + 800GbE built for this scale	Total cluster cost includes optics + cables + tuning
Mixed AI inference + training fabrics	Configurable port modes (800/400/200GbE)	Need network design expertise to tune properly
Hyperscale data center fabrics	Multi-vendor OEM support means competitive pricing	Switch software stack maturity matters as much as silicon
Greenfield AI data centers	Tomahawk 5 + 800GbE optics is a current-generation reference design	Tomahawk Ultra in late 2026 will succeed it
Migration from InfiniBand	RoCEv2 + RDMA performance approaches IB	Software porting effort can be substantial

When to choose alternatives:

Tightly coupled HPC + AI training at the very largest scale → NVIDIA Quantum InfiniBand still offers workload-specific benefits
Smaller deployments (under 100 GPUs) → lower-throughput Tomahawk 4 / 3 silicon may be cost-effective
Specialized AI fabrics → some platforms (Google TPU pods, Cerebras wafer-scale systems) use proprietary interconnects rather than commodity Ethernet

Key Takeaways

Broadcom Tomahawk 5 (BCM78900) is the dominant 51.2 Tbps Ethernet switch silicon for AI data center back-end fabrics
Supports 800GbE — the new standard for AI cluster networking — across configurable port modes (64x800GbE, 128x400GbE, 256x200GbE)
RoCEv2-optimized shared-buffer architecture delivers RDMA performance approaching NVIDIA InfiniBand at meaningfully lower cost
Sold as silicon to switch OEMs (Arista, FS, NADDOD, others); end customers buy finished switches typically priced $80,000-$200,000 each
Tomahawk Ultra successor in late 2026 will offer higher throughput; long-deployment buyers should track the roadmap
