Learning Objectives
- Understand why high-throughput Ethernet switching is critical to AI training clusters
- Identify Tomahawk 5's specs and how it compares to NVIDIA InfiniBand alternatives
- Evaluate when to choose Ethernet-based AI fabrics vs. InfiniBand
What Is Broadcom Tomahawk 5?
Tomahawk 5 (BCM78900) is Broadcom's fifth-generation flagship Ethernet switch ASIC, the silicon inside the Ethernet switches that interconnect GPUs in AI training clusters. It delivers 51.2 Tbps of total switching throughput on a single chip, double the prior generation, and is the first Broadcom switch ASIC to support 800 Gigabit Ethernet (800GbE).
For AI training, the network is often the bottleneck: large-model training requires constant gradient and weight synchronization across hundreds or thousands of GPUs. Tomahawk 5 pairs 51.2 Tbps of throughput with roughly 1 microsecond of latency and a shared-buffer architecture optimized for RoCEv2 (RDMA over Converged Ethernet), which makes Ethernet a credible alternative to NVIDIA InfiniBand for the back-end AI fabric.
💡Key Concept
Why this matters for AI: A frontier-AI training run involves 10,000+ GPUs synchronized through a network. Any communication time that is not hidden behind computation is paid on every training step, and a large run takes tens to hundreds of thousands of steps, so even a few milliseconds of per-step network overhead accumulate into hours of wall-clock time across thousands of GPUs. Lower-latency, higher-throughput switches translate directly to faster training and lower cost per token. Tomahawk 5 is the silicon that makes Ethernet-based AI fabrics competitive with InfiniBand at the largest scales.
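The accumulation described above is simple arithmetic. A minimal sketch, assuming a hypothetical 50 ms of exposed (non-overlapped) communication per step, 200,000 steps, and 10,000 GPUs; none of these figures come from a real training run:

```python
def exposed_comm_overhead(exposed_ms_per_step: float, steps: int, num_gpus: int) -> dict:
    """Wall-clock hours and GPU-hours spent on communication not overlapped with compute."""
    wall_seconds = exposed_ms_per_step / 1000.0 * steps
    return {
        "wall_hours": wall_seconds / 3600.0,
        # In a synchronous job every GPU waits on the same collective.
        "gpu_hours": wall_seconds * num_gpus / 3600.0,
    }


# Hypothetical inputs, not measurements: 50 ms exposed per step, 200k steps, 10k GPUs.
overhead = exposed_comm_overhead(50.0, 200_000, 10_000)
print(f"{overhead['wall_hours']:.1f} wall-clock hours, {overhead['gpu_hours']:,.0f} GPU-hours")
```

With those inputs the overhead comes to about 2.8 hours of wall-clock time, or roughly 27,800 GPU-hours.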
✅Tip
Visit Broadcom Tomahawk 5: broadcom.com/products/ethernet-connectivity/switching/strataxgs/bcm78900-series. Broadcom sells the silicon to switch manufacturers; finished switches are available from Arista, FS, NADDOD, and others.
Pricing & Access
Tomahawk 5 is silicon sold to switch OEMs, not directly to data-center operators. End customers buy finished switches that incorporate the silicon.
Tomahawk 5 silicon
- Not publicly priced
- Multi-year design wins
- Powers switches from Arista, FS, NADDOD, and others
Finished switches
- Configurable port modes (64x800GbE / 128x400GbE / 256x200GbE)
- 2U or 4U rack form factors
- Volume pricing varies by vendor
800GbE optics
- 800G QSFP-DD or OSFP modules
- Significant share of total deployment cost
- Multi-mode and single-mode options
Tomahawk Ultra (successor)
- Even higher throughput
- AI/HPC scale-up focus
- Complements Tomahawk 5 for different topologies
In large AI training clusters, networking (switches, optics, and cabling) is a major share of capital cost, so Tomahawk 5 economics directly affect the total cost of training.
Core Capabilities
51.2 Tbps Throughput
Industry-leading single-chip throughput. Configurable as 64x800GbE, 128x400GbE, or 256x200GbE — flexible deployment across leaf, spine, or aggregation roles. Built on TSMC 5nm process, with power consumption only ~10% higher than the 7nm Tomahawk 4 (around 500W per chip) despite doubling throughput.
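The three port modes are the same 51.2 Tbps divided different ways. A quick sketch of that arithmetic, plus energy per bit; the 500 W figure is taken from the text and treated here as an assumption about Tomahawk 5's full-load power:

```python
CHIP_TBPS = 51.2  # Tomahawk 5 aggregate switching capacity, from the text


def ports_for_speed(port_gbps: int, chip_tbps: float = CHIP_TBPS) -> int:
    """How many ports of a given speed exactly fill the chip's capacity."""
    return round(chip_tbps * 1000 / port_gbps)


def picojoules_per_bit(watts: float, chip_tbps: float = CHIP_TBPS) -> float:
    """Energy per switched bit at full load.

    1 W / 1 Tb/s = 1e-12 J/bit = 1 pJ/bit, so watts divided by Tbps gives pJ/bit.
    """
    return watts / chip_tbps


for speed in (800, 400, 200):
    print(f"{ports_for_speed(speed)} x {speed}GbE")
# Assumption: ~500 W per chip, reading the figure in the text as Tomahawk 5's power.
print(f"{picojoules_per_bit(500):.2f} pJ/bit")
```

This prints 64 x 800GbE, 128 x 400GbE, and 256 x 200GbE, followed by about 9.77 pJ/bit under the 500 W assumption.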
Low Latency (~1 Microsecond)
Approximately 1 microsecond of cut-through latency for 64-byte packets, low enough to be competitive with InfiniBand for many AI workloads. Latency matters most for AllReduce, AllGather, and the other collective operations that dominate communication in large-model training.
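Why latency matters can be seen in the standard alpha-beta cost model for ring AllReduce: 2(p-1) sequential steps, each paying a fixed latency alpha plus the time to move n/p bytes. This is a textbook model, not how any particular collective library schedules traffic (NCCL, for example, also uses tree algorithms), and alpha here stands for end-to-end per-step latency including NICs and software, not only the switch hop. The GPU count, link speed, and alpha below are hypothetical:

```python
def ring_allreduce_seconds(num_gpus: int, message_bytes: float,
                           alpha_s: float, link_gbps: float) -> float:
    """Alpha-beta estimate for ring AllReduce (reduce-scatter followed by all-gather).

    The ring runs 2*(p-1) steps; each pays latency alpha and moves message_bytes/p.
    """
    p = num_gpus
    bytes_per_s = link_gbps * 1e9 / 8
    latency_term = 2 * (p - 1) * alpha_s
    bandwidth_term = 2 * (p - 1) / p * message_bytes / bytes_per_s
    return latency_term + bandwidth_term


# Hypothetical setup: 1,024 GPUs, 400 Gb/s per GPU, 1 us per-step latency.
for size in (1e6, 1e9):  # 1 MB and 1 GB
    t = ring_allreduce_seconds(1024, size, 1e-6, 400)
    print(f"{size:.0e} bytes: {t * 1e3:.3f} ms")
```

For 1 MB messages the latency term (about 2 ms) dwarfs the bandwidth term (about 40 µs); for 1 GB messages the bandwidth term (about 40 ms) dominates. Small, frequent collectives are where per-hop latency shows up.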
RoCEv2 Optimization
Broadcom positions Tomahawk 5's shared-buffer architecture as the most advanced in the industry, designed to minimize tail latency for RDMA over Converged Ethernet (RoCEv2). RDMA lets GPUs access remote GPU memory without involving the CPU, the foundation of efficient distributed AI training. Because all ports draw from one shared packet buffer, a port hit by a traffic burst can borrow capacity that idle ports are not using, so RDMA performance stays more consistent under congestion than on switches with smaller or statically partitioned buffers.
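The benefit of a shared buffer during incast (many senders converging on one egress port) can be illustrated with a toy model. It compares a statically partitioned buffer against the classic dynamic-threshold policy of Choudhury and Hahne, in which a queue may grow only while it is below alpha times the free buffer. Broadcom's actual admission algorithm is not public; the buffer size, port count, and alpha values are hypothetical, and the model ignores draining:

```python
def absorbed_burst(burst_cells: int, buffer_cells: int, num_ports: int,
                   shared: bool, alpha: float = 1.0) -> int:
    """Cells of a one-port burst admitted before drops start (toy model, no draining)."""
    queue = 0
    for _ in range(burst_cells):
        if shared:
            # Dynamic threshold: admit while queue < alpha * free buffer.
            # Only the bursting port holds cells here, so free = total - queue.
            limit = alpha * (buffer_cells - queue)
        else:
            # Static partitioning: every port owns an equal, fixed slice.
            limit = buffer_cells // num_ports
        if queue < limit:
            queue += 1
    return queue


BURST, BUFFER, PORTS = 2_000, 1_000, 64  # hypothetical sizes in cells
print("static     :", absorbed_burst(BURST, BUFFER, PORTS, shared=False))
print("shared a=1 :", absorbed_burst(BURST, BUFFER, PORTS, shared=True, alpha=1.0))
print("shared a=8 :", absorbed_burst(BURST, BUFFER, PORTS, shared=True, alpha=8.0))
```

With a 1,000-cell buffer and 64 ports, the static slice admits 15 cells of the burst, while the shared buffer admits 500 (alpha = 1) or 889 (alpha = 8). A higher alpha lets one hot port use more of the pool at the cost of leaving less headroom for the others.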
800GbE Support
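First Broadcom switch ASIC to support 800 Gigabit Ethernet ports. 800GbE is becoming the standard port speed for AI back-end fabrics. H100-class servers typically pair each GPU with a 400 Gb/s NIC, so an 800GbE switch port is often broken out into two 400G links, while newer 800G NICs use a full port per GPU. Tomahawk 5 deployments scale from 64-GPU pods to clusters of 10,000 or more GPUs.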
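How far a single switch model scales depends on topology. A sketch using the standard folded-Clos (fat-tree) capacity formula, 2 x (k/2)^t hosts for t tiers of radix-k switches at full bisection bandwidth; the GPU count is hypothetical, and real designs often oversubscribe or use rail-optimized layouts instead:

```python
import math


def max_hosts_nonblocking(radix: int, tiers: int) -> int:
    """Max end hosts in a full-bisection folded Clos built from identical radix-k switches."""
    return 2 * (radix // 2) ** tiers


def leaf_spine_switches(num_gpus: int, radix: int) -> tuple[int, int]:
    """(leaves, spines) for a non-blocking two-tier fabric: half of each leaf's ports face GPUs."""
    if num_gpus > max_hosts_nonblocking(radix, 2):
        raise ValueError("too many GPUs for two tiers; add a third tier")
    down = radix // 2
    leaves = math.ceil(num_gpus / down)
    spines = math.ceil(leaves * down / radix)  # every leaf uplink lands on a spine port
    return leaves, spines


for radix, mode in ((64, "64x800GbE"), (128, "128x400GbE")):
    print(mode, "two-tier:", max_hosts_nonblocking(radix, 2),
          "three-tier:", max_hosts_nonblocking(radix, 3))
print("1,024 GPUs at 800G:", leaf_spine_switches(1024, 64))
```

In 64x800GbE mode, two tiers top out at 2,048 GPUs and three tiers at 65,536; running the chip as 128x400GbE raises the two-tier limit to 8,192 GPUs at 400G each. A 10,000-GPU cluster therefore needs a third tier either way, while 1,024 GPUs at 800G fit in 32 leaves and 16 spines.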
Telemetry and Congestion Control
Hardware-level telemetry feeds congestion-control algorithms. Visibility into queue depths, packet timing, and ECN (Explicit Congestion Notification) signals lets operators tune AI fabrics for sustained throughput rather than peak-only performance.
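On the switch side, ECN marking for RoCEv2 is commonly configured as a RED/WRED-style curve, the scheme DCQCN was designed around: no marking below a low queue threshold Kmin, a marking probability that rises linearly to Pmax at Kmax, and marking of every packet above Kmax. The thresholds below are hypothetical tuning values, and the NIC-side rate reaction is not shown:

```python
def ecn_mark_probability(queue_kb: float, kmin_kb: float, kmax_kb: float, pmax: float) -> float:
    """Probability that a packet arriving at this queue depth gets an ECN mark."""
    if queue_kb <= kmin_kb:
        return 0.0  # queue short enough: no congestion signal
    if queue_kb >= kmax_kb:
        return 1.0  # queue too deep: mark everything
    # Linear ramp between the two thresholds.
    return pmax * (queue_kb - kmin_kb) / (kmax_kb - kmin_kb)


KMIN, KMAX, PMAX = 100.0, 400.0, 0.2  # hypothetical tuning values
for depth in (50, 150, 250, 350, 450):
    print(f"{depth:>3} KB -> {ecn_mark_probability(depth, KMIN, KMAX, PMAX):.3f}")
```

Queue-depth and mark-rate telemetry is what lets operators check whether Kmin, Kmax, and Pmax make senders slow down before buffers overflow.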
Strengths
- 51.2 Tbps single-chip throughput: Industry-leading — doubles prior generation
- Mature ecosystem: Multiple switch OEMs (Arista, FS, NADDOD, Cisco) ship products built on Tomahawk 5 — buyers have vendor choice
- 800GbE first-mover: Enables next-generation AI cluster deployments without waiting for the next switch silicon generation
- RoCEv2-optimized: Best-in-class shared-buffer architecture for RDMA workloads
- Power efficiency: Only ~10 percent power increase over Tomahawk 4 despite doubled throughput
- Ethernet ecosystem: Open standards, multi-vendor optics + transceivers, broad interoperability
Limitations & Considerations
- Switch silicon, not solutions: Buyers need switch OEMs (Arista, FS, NADDOD) to ship finished products — Broadcom doesn't sell direct
- Optics cost: 800GbE optical modules can cost $1,000-$5,000 per port; cable + optics costs frequently exceed switch silicon costs
- NVIDIA InfiniBand comparison: For tightly coupled training workloads at the largest scale, NVIDIA Quantum InfiniBand still offers some workload-specific advantages — Ethernet is closing the gap, not yet universally winning
- Software stack maturity: RoCEv2 software stacks have matured rapidly but still require expert tuning for production AI deployments
- Tomahawk Ultra coming: Late 2026 successor will offer higher throughput and tighter AI/HPC scale-up integration — long-deployment buyers should track the roadmap
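The optics point is easy to check with arithmetic. A sketch using the $1,000-$5,000 per-module range from the list above, applied to a hypothetical non-blocking two-tier fabric of 1,024 GPUs: 1,024 GPU-to-leaf links plus 1,024 leaf-to-spine links, with an optical module at each end of every link. Real designs often use cheaper copper cables for short in-rack runs, so treat this as a rough upper-end estimate:

```python
def optics_cost_range(links: int, low_per_module: float = 1_000.0,
                      high_per_module: float = 5_000.0) -> tuple[float, float]:
    """Low/high optics spend when every link needs one module at each end."""
    modules = 2 * links
    return modules * low_per_module, modules * high_per_module


# Hypothetical 1,024-GPU two-tier fabric: GPU-to-leaf plus leaf-to-spine links.
links = 1_024 + 1_024
low, high = optics_cost_range(links)
print(f"{2 * links:,} modules: ${low:,.0f} to ${high:,.0f}")
```

That is 4,096 modules costing roughly $4.1M to $20.5M, before cabling and the switches themselves.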
Best Use Cases
| Use Case | Why Tomahawk 5 Fits | Caveat |
|---|---|---|
| AI training clusters (1000+ GPUs) | 51.2 Tbps + RoCEv2 + 800GbE built for this scale | Total cluster cost includes optics + cables + tuning |
| Mixed AI inference + training fabrics | Configurable port modes (800/400/200GbE) | Need network design expertise to tune properly |
| Hyperscale data center fabrics | Multi-vendor OEM support means competitive pricing | Switch software stack maturity matters as much as silicon |
| Greenfield AI data centers | Tomahawk 5 + 800GbE optics is a current-generation reference design | Tomahawk Ultra in late 2026 will succeed it |
| Migration from InfiniBand | RoCEv2 RDMA performance approaches InfiniBand | Software porting effort can be substantial |
When to choose alternatives:
- Tightly coupled HPC + AI training at the very largest scale → NVIDIA Quantum InfiniBand still offers workload-specific benefits
- Smaller deployments (under 100 GPUs) → lower-throughput Tomahawk 4 / 3 silicon may be cost-effective
- Specialized AI fabrics → some platforms (Google TPU pods, Cerebras wafer-scale systems) use proprietary interconnects rather than commodity Ethernet
Key Takeaways
- Broadcom Tomahawk 5 (BCM78900) is a leading 51.2 Tbps Ethernet switch ASIC for AI data center back-end fabrics
- Supports 800GbE — the new standard for AI cluster networking — across configurable port modes (64x800GbE, 128x400GbE, 256x200GbE)
- RoCEv2-optimized shared-buffer architecture delivers RDMA performance approaching NVIDIA InfiniBand at meaningfully lower cost
- Sold as silicon to switch OEMs (Arista, FS, NADDOD, others); end customers buy finished switches typically priced $80,000-$200,000 each
- Tomahawk Ultra successor in late 2026 will offer higher throughput; long-deployment buyers should track the roadmap