5 min read · Updated May 4, 2026

AMD ROCm

By AMD

AMD ROCm is the company's open, permissively licensed AI compute software stack and its deliberate counterweight to NVIDIA's CUDA. The current release, ROCm 7.2.2, ships with vLLM 0.10.1 and native PyTorch on Linux, with a Windows public preview for Radeon GPUs. Meanwhile, AMD's pass rate on the vLLM CI suite jumped from 37 percent to 93 percent between November 2025 and January 2026.

Learning Objectives

  • Understand what ROCm is and how it relates to NVIDIA's CUDA in the AI compute software stack
  • Evaluate ROCm's 2026 maturity on the LLM-inference path versus the long tail of research code
  • Recognize the strategic importance of AMD's open-source positioning for hyperscalers and on-prem customers

What Is AMD ROCm?

ROCm (Radeon Open Compute) is AMD's open-source software stack for GPU compute — the layer between the operating system and frameworks like PyTorch, TensorFlow, and vLLM. It is AMD's deliberate counterweight to NVIDIA's CUDA, the proprietary software platform that is the single biggest reason most AI code runs on NVIDIA hardware today.

ROCm includes the HIP programming model (a C++ runtime API and kernel language that closely mirrors CUDA), GPU drivers, math and deep-learning libraries (rocBLAS, rocFFT, MIOpen), the compiler toolchain, and integrations with major frameworks. It runs on AMD Instinct datacenter accelerators and a growing list of Radeon consumer GPUs. That openness is the deliberate contrast with CUDA, which is proprietary and still reserves some features for NVIDIA datacenter SKUs.

💡Key Concept

The CUDA moat. CUDA has been NVIDIA's strongest competitive advantage for nearly two decades, since its 2007 launch. Years of accumulated CUDA code in research labs, MLPerf submissions, framework reference kernels, and proprietary enterprise stacks all assume NVIDIA hardware. Any challenger needs not just competitive silicon (which AMD has with Instinct) but a software stack that reduces the migration cost from CUDA to near zero. ROCm is AMD's bid to be that stack.

ROCm 7.2 — Current State (May 2026)

The current generally available release is ROCm 7.2.2. The 7.x line, released across 2025 and 2026, has been the maturation series: each minor release added concrete framework support that previously required custom builds.

| Release | Date | Headline addition |
|---|---|---|
| ROCm 7.0 | Q3 2025 | PyTorch 2.7 native support; rocSHMEM general availability |
| ROCm 7.1.1 | November 2025 | PyTorch 2.9 native; vLLM 0.10.1 bundled |
| ROCm 7.2.2 | Early 2026 | Radeon RX 7000 / 9000 (RDNA 3 / RDNA 4) consumer-GPU support added |

Tip

Documentation is at rocm.docs.amd.com; the open-source repositories are at github.com/ROCm.

The vLLM CI Pass-Rate Story

The single most-cited proof point for ROCm's 2026 maturation is the vLLM CI pass rate on AMD Instinct. vLLM is the dominant open-source LLM-inference engine, and its continuous-integration test suite is the de facto compatibility benchmark for any AI accelerator that wants to run modern open-weight models in production.

  • November 2025: AMD CI pass rate sat at roughly 37 percent of vLLM tests.
  • January 2026: That number jumped to 93 percent, after AMD shipped a dedicated vLLM CI pipeline on December 29, 2025.

For inference-focused workloads, the bulk of what hyperscalers run in 2026, that 56-percentage-point swing helps explain why Oracle, Microsoft Azure, and OpenAI moved from "evaluating" to "committed to" Instinct deployments within a single quarter.
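Working through the two pass rates quoted above shows why the change looks even larger from the failure side. This assumes both percentages are measured against a comparable test suite; if the suite changed size between November and January, the ratios are approximate.

```latex
\begin{aligned}
\Delta_{\text{pass}} &= 93\% - 37\% = 56 \text{ percentage points} \\
\frac{\text{pass}_{\text{Jan}}}{\text{pass}_{\text{Nov}}} &= \frac{93}{37} \approx 2.51 \\
\text{fail}_{\text{Nov}} &= 100\% - 37\% = 63\%, \qquad \text{fail}_{\text{Jan}} = 100\% - 93\% = 7\% \\
\frac{\text{fail}_{\text{Nov}}}{\text{fail}_{\text{Jan}}} &= \frac{63}{7} = 9
\end{aligned}
```

In other words, the pass rate roughly two-and-a-half-folded, but the share of failing tests fell ninefold, from nearly two in three to roughly one in fourteen.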

Framework Support

| Framework | ROCm support state (May 2026) |
|---|---|
| PyTorch | Native on Linux (Instinct); Windows + Linux public preview on Radeon RX 7000 / 9000 (RDNA 3 / RDNA 4) |
| vLLM | 0.10.1 bundled; dedicated CI pipeline; 93% pass rate as of January 2026 |
| TensorFlow | Native; less prioritized than PyTorch but maintained |
| JAX | Supported via ROCm builds; smaller community than on the CUDA side |
| llama.cpp | Wavefront-64 patches upstreamed July 2025; runs on Radeon consumer GPUs |
| DeepSpeed / Megatron | Functional; CUDA versions remain the reference implementations |

The PyTorch-on-Windows preview matters for a non-obvious reason: it lets developers with consumer Radeon GPUs (RX 7000 and RX 9000 series) run real AI workloads on a Windows machine without dual-booting Linux, closing a developer-experience gap that CUDA users on Windows never faced.
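A quick way to see the framework integration in practice: ROCm builds of PyTorch set `torch.version.hip` (and leave `torch.version.cuda` as `None`), and they keep addressing the AMD GPU through the `"cuda"` device type, so most CUDA-targeted PyTorch code runs without device changes. The sketch below classifies a PyTorch install; `backend_name` and `describe_local_torch` are hypothetical helper names, and the live check only does anything useful if PyTorch is installed.

```python
from typing import Optional


def backend_name(hip_version: Optional[str], cuda_version: Optional[str]) -> str:
    """Classify a PyTorch build from its version attributes.

    ROCm builds set torch.version.hip to a string and torch.version.cuda to None;
    CUDA builds do the opposite; CPU-only builds leave both as None.
    """
    if hip_version:
        return "rocm"
    if cuda_version:
        return "cuda"
    return "cpu"


def describe_local_torch() -> dict:
    """Report which GPU backend the installed PyTorch (if any) targets."""
    try:
        import torch  # optional dependency: only needed for the live check
    except ImportError:
        return {"torch": None, "backend": "not installed", "gpu_available": False}

    backend = backend_name(torch.version.hip, torch.version.cuda)
    # On ROCm builds the AMD GPU is still exposed through torch.cuda and the
    # "cuda" device string, which is why CUDA-targeted scripts usually run as-is.
    available = torch.cuda.is_available()
    return {
        "torch": torch.__version__,
        "backend": backend,
        "gpu_available": available,
        "device_name": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    print(describe_local_torch())
```

On a Radeon or Instinct machine with a ROCm wheel installed, the report should show `backend: rocm` while the model code itself still calls `.to("cuda")`.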

Pricing

ROCm (software): Free / open source
  • Apache 2.0 + MIT licensing
  • Source on GitHub
  • No commercial license fee

Hardware (Instinct): Enterprise quote
  • Datacenter accelerators
  • Sold via OEM channel
  • Hyperscaler deals

Hardware (Radeon): Retail
  • Consumer GPUs
  • RX 7000 / 9000 supported
  • For developer workstations

Enterprise support: Custom contract
  • 24x7 support, hot-fix SLAs
  • Available via AMD enterprise
  • Or via OEM (Dell, HPE, Lenovo)

ROCm itself is free and permissively licensed. The cost lives in the AMD silicon underneath it.

Strengths

  • Permissive open-source license — Apache 2.0 and MIT for the bulk of the stack; full source on GitHub. Hyperscalers and air-gapped enterprise customers can audit and customize.
  • Consumer-GPU support — Radeon RX 7000 and 9000 series GPUs run real AI workloads, including PyTorch and llama.cpp, so developers can prototype on workstation hardware; CUDA, by contrast, still reserves some features for NVIDIA datacenter SKUs.
  • vLLM inference path is mature — A 93 percent CI pass rate as of January 2026 suggests most modern open-weight LLMs run cleanly on Instinct via stock vLLM.
  • Cross-platform development — Native PyTorch on Linux, plus a Windows public preview for Radeon GPUs, narrows a long-standing developer-experience gap.
  • HIP CUDA-compatibility layer — Existing CUDA code can often be ported to AMD with minimal rewrites via the HIPify tooling, lowering the migration cost from NVIDIA.
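Much of HIPify's work is source-to-source renaming, because HIP's runtime API mirrors CUDA's name for name. The toy Python sketch below illustrates only that renaming idea: the mapping table is a small hand-picked subset of real CUDA runtime and HIP runtime names, and `toy_hipify` is a hypothetical helper, not the hipify-perl or hipify-clang tool, which also handle library APIs, kernel launches, and many cases this regex ignores.

```python
import re

# A small, hand-picked subset of CUDA runtime names and their HIP runtime
# equivalents. The real HIPify tools cover far more APIs (including library
# calls such as cuBLAS) and edge cases this toy ignores.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaError_t": "hipError_t",
    "cudaSuccess": "hipSuccess",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Match whole identifiers only; longest names first so a longer name is tried
# before any shorter name it starts with.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def toy_hipify(source: str) -> str:
    """Rename known CUDA runtime identifiers to their HIP equivalents."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(1)], source)


cuda_snippet = """#include <cuda_runtime.h>
float *d_x;
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
cudaDeviceSynchronize();
cudaFree(d_x);
"""

print(toy_hipify(cuda_snippet))
```

Names outside the table, such as `cudaMallocManaged`, pass through unchanged, which is why the real tools rely on much larger, maintained mapping tables rather than a short list like this one.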

Limitations and Considerations

  • Long tail of research code still favors CUDA — Custom kernels, vendor-specific libraries, and bleeding-edge research repos typically have CUDA as the reference; ROCm parity arrives weeks or months later.
  • Profiler and debugger tooling — rocprof and roctracer are functional, but Nsight Systems and Nsight Compute on the NVIDIA side remain the gold standard.
  • Smaller AMD community than NVIDIA's — Stack Overflow answers, internal hyperscaler tooling, and the long tail of GitHub examples skew CUDA-heavy.
  • Per-release breakage risk — ROCm minor releases occasionally introduce framework-version-pinning issues; production deployments often pin to a known-good ROCm + PyTorch + vLLM combination rather than tracking latest.
  • Training-side parity is behind inference — DeepSpeed and Megatron-LM run, but the CUDA reference implementations are what most papers benchmark against and what most hyperscaler training stacks are built on.
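The pinning practice described under per-release breakage risk can be sketched as a small preflight check. Everything here is an assumption for illustration: `PINNED` uses version numbers lifted from this lesson, not a tested or AMD-validated combination, and the helper names are hypothetical. Python packages are looked up with the standard `importlib.metadata`; the ROCm version is not a Python package, so the sketch expects you to supply it from your own inventory.

```python
from importlib import metadata
from typing import Dict, List, Optional

# Hypothetical pinned manifest, for illustration only. Substitute the
# ROCm + PyTorch + vLLM combination you have actually validated.
PINNED = {"rocm": "7.2.2", "torch": "2.9", "vllm": "0.10.1"}


def _release_parts(version: str) -> List[str]:
    """Drop any local suffix such as '+rocm7.1' and split on dots."""
    return version.split("+", 1)[0].split(".")


def matches_pin(installed: str, pinned: str) -> bool:
    """True if the installed version starts with every component of the pin.

    '2.9' matches '2.9.0+rocm7.1'; '7.2.2' does not match '7.2.1'.
    """
    want = _release_parts(pinned)
    return _release_parts(installed)[: len(want)] == want


def check_pins(installed: Dict[str, Optional[str]], pinned: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the stack matches."""
    problems = []
    for name, want in pinned.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not found (pinned {want})")
        elif not matches_pin(have, want):
            problems.append(f"{name}: found {have}, pinned {want}")
    return problems


def installed_python_versions(names: List[str]) -> Dict[str, Optional[str]]:
    """Look up installed Python distributions; None if a package is absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    installed = installed_python_versions(["torch", "vllm"])
    installed["rocm"] = "7.2.2"  # hypothetical: supply from your own inventory
    for problem in check_pins(installed, PINNED) or ["stack matches pins"]:
        print(problem)
```

Running a check like this in CI or at container start turns a silent version drift into an explicit failure before any model is loaded.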

Key Takeaways

  • ROCm is AMD's open-source AI compute software stack, the deliberate counterweight to NVIDIA's CUDA moat, with permissive licensing that CUDA does not match and expanding consumer-GPU support
  • The current release is ROCm 7.2.2, shipping with vLLM 0.10.1 bundled, native PyTorch on Linux, and a Windows public preview for Radeon RX 7000 and 9000 (RDNA 3 / RDNA 4) consumer GPUs
  • The vLLM CI pass rate jump from 37 percent (November 2025) to 93 percent (January 2026) is the most-cited proof point that ROCm has effectively closed the gap on the LLM-inference path
  • The long tail of research code, custom kernels, and training-side toolchains still favors CUDA — ROCm's 2026 win is inference, not yet the entire workload spectrum
