7 min read · Updated April 29, 2026

Tenstorrent Wormhole

By Tenstorrent

Tenstorrent Wormhole is the RISC-V AI accelerator from Tenstorrent, the chip-design startup led by Jim Keller. It pairs an open RISC-V architecture with an open-source software stack as a deliberate alternative to NVIDIA's closed CUDA ecosystem, and it scales through chiplets. Its successor, Blackhole, is already shipping.


Learning Objectives

  • Understand Tenstorrent's RISC-V + open-source-hardware strategy and how it differs from NVIDIA
  • Identify Wormhole and Blackhole specs, customer adoption, and pricing
  • Evaluate when Tenstorrent makes sense vs. NVIDIA, Intel Gaudi, or other AI accelerators

What Is Tenstorrent Wormhole?

Tenstorrent Wormhole is the AI accelerator chip from Tenstorrent, led by CEO Jim Keller, the veteran chip architect behind AMD Zen, Apple A-series, Tesla Full Self-Driving, and Intel's resurgence efforts. Tenstorrent's strategy deliberately inverts NVIDIA's. It uses an open ISA (RISC-V), an open-source compiler stack, an explicit-data-movement architecture instead of hidden hardware caches, and chiplet-based modular scaling.

Wormhole ships in two PCIe card configurations — Wormhole n150 (single processor, 72 Tensix cores, 5 RISC-V cores per Tensix) and Wormhole n300 (dual-processor variant). Successor Blackhole is now shipping with 16 dedicated big RISC-V cores that handle data orchestration without a separate host CPU — eliminating roundtrips that bottleneck conventional GPU clusters.

💡Key Concept

Jim Keller's design philosophy: "Whatever NVIDIA does, we'll do the opposite." In practice this means several choices. It uses an open ISA (RISC-V) instead of a proprietary architecture and an open-source software stack instead of CUDA's closed ecosystem. It relies on explicit data movement instead of cache hierarchies, and on chiplets instead of monolithic dies. The bet is that open architectures will out-evolve closed ones over the long run. In the short run, NVIDIA's ecosystem advantage is enormous. Still, there is room for a credible alternative, especially for customers who want IP they can license, customize, and own.

Tip

Visit Tenstorrent: tenstorrent.com — workstations from $12,000; chip IP licensing for enterprise customers

Pricing

Tenstorrent sells both finished hardware (PCIe cards, workstations) and IP licenses (RISC-V cores + Tensix architecture for customers building their own silicon).

Wormhole n150 PCIe card: Pre-order pricing
  • 72 Tensix cores per processor
  • 5 RISC-V cores per Tensix
  • Single-card AI dev kit
Wormhole n300 PCIe card: Higher-tier dual-processor variant
  • 128 Tensix cores total across two Wormhole processors
  • Higher memory and bandwidth
  • Production deployment focus
TT-LoudBox / TT-QuietBox workstations: From $12,000
  • Pre-built 4x n300 systems
  • Includes thermal solution
  • Most accessible Tenstorrent platform
Blackhole p100 card: Successor architecture
  • 16 big RISC-V cores for data orchestration
  • Eliminates host-CPU roundtrip
  • Currently shipping
Chip IP Licensing: Custom enterprise pricing
  • RISC-V core IP and Tensix architecture
  • Used by LG, BOS Semiconductor, Rapidus
  • Build custom silicon

The TT-LoudBox and TT-QuietBox workstations start at $12,000. They are the most accessible on-ramp for AI developers who want hands-on Tenstorrent experience.

Core Architecture

Tensix Cores + RISC-V

Each Tensix core contains 5 RISC-V cores alongside specialized matrix-math units — combining general-purpose programmability with tensor-acceleration hardware. Wormhole n150 packs 72 Tensix cores (so 360 RISC-V cores plus matrix engines per chip). Compute scales by adding more Tensix cores; memory and connectivity scale through chiplet packaging.
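The core-count arithmetic above fits in a short Python sketch. The `ChipConfig` class and its field names are hypothetical and exist only for illustration. The only inputs taken from this lesson are the 72 Tensix cores on Wormhole n150 and the 5 RISC-V cores per Tensix. The matrix engines are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipConfig:
    """Hypothetical description of one accelerator chip (illustration only)."""
    name: str
    tensix_cores: int
    riscv_per_tensix: int = 5  # per this lesson: 5 RISC-V cores in each Tensix core

    def total_riscv_cores(self) -> int:
        # Each Tensix core carries its own small RISC-V cores next to its matrix engine,
        # so the RISC-V count grows linearly with the Tensix count.
        return self.tensix_cores * self.riscv_per_tensix


n150 = ChipConfig(name="Wormhole n150", tensix_cores=72)
print(f"{n150.name}: {n150.tensix_cores} Tensix cores, "
      f"{n150.total_riscv_cores()} RISC-V cores")
# Wormhole n150: 72 Tensix cores, 360 RISC-V cores
```

Because compute scales by adding Tensix cores, the same multiplication applies to any Tensix count you plug in.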

Explicit Data Movement

GPUs rely on hardware cache hierarchies to hide memory latency. Tenstorrent chips instead require the compiler (or programmer) to move data explicitly between cores and memory tiers. This costs more effort from the compiler or programmer. In return, there are no wasted cache fills and no eviction surprises, and performance is more predictable. For mainstream PyTorch/TensorFlow workloads, the TT-Buda compiler plans this data movement automatically. TT-Metalium exposes it directly to programmers who write low-level kernels.
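To make the contrast concrete, here is a minimal Python sketch of a tiled matrix multiply written in the explicit-data-movement style. It assumes a hypothetical per-core local buffer (`LocalBuffer`) that holds a fixed number of tiles. This is not TT-Metalium code and uses no Tenstorrent API. The 2×2 tile size and two-slot buffer are arbitrary choices. The point is that the program writes every copy into local memory itself, so the number of transfers is known before the code runs.

```python
def get_tile(mat, ti, tj, t):
    """Slice the t x t tile at tile coordinates (ti, tj) out of a square matrix."""
    return [row[tj * t:(tj + 1) * t] for row in mat[ti * t:(ti + 1) * t]]


class LocalBuffer:
    """Hypothetical core-local SRAM: data is present only if the program copied it in."""

    def __init__(self, capacity_tiles):
        self.capacity = capacity_tiles
        self.slots = {}
        self.transfers = 0  # counts every explicit copy from main memory

    def copy_in(self, key, tile):
        if len(self.slots) >= self.capacity:
            raise MemoryError("local buffer full: release a slot first")
        self.slots[key] = tile
        self.transfers += 1

    def release(self, key):
        del self.slots[key]


def tiled_matmul(a, b, t, buf):
    """C = A @ B for square n x n matrices (n divisible by t), staging tiles explicitly."""
    n = len(a)
    nt = n // t
    c = [[0] * n for _ in range(n)]
    for ti in range(nt):
        for tj in range(nt):
            acc = [[0] * t for _ in range(t)]  # accumulator for one output tile
            for tk in range(nt):
                # Explicit moves: nothing is fetched implicitly, unlike a cache.
                buf.copy_in("a", get_tile(a, ti, tk, t))
                buf.copy_in("b", get_tile(b, tk, tj, t))
                at, bt = buf.slots["a"], buf.slots["b"]
                for i in range(t):
                    for j in range(t):
                        acc[i][j] += sum(at[i][k] * bt[k][j] for k in range(t))
                buf.release("a")
                buf.release("b")
            for i in range(t):
                c[ti * t + i][tj * t:(tj + 1) * t] = acc[i]
    return c


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
buf = LocalBuffer(capacity_tiles=2)
c = tiled_matmul(a, identity, t=2, buf=buf)
print(c == a, buf.transfers)  # True 16  (2 tiles per step x (4 // 2) ** 3 steps)
```

A cache would make those fetches implicitly and choose evictions on its own. Here the program makes both decisions, which is the source of the extra programming effort and of the predictability described above.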

Open-Source Software Stack

TT-Buda (high-level PyTorch/TensorFlow integration) and TT-Metalium (low-level, bare-metal kernel programming) are both open-source. Every software layer between the silicon and user code is therefore open and inspectable. CUDA, by contrast, keeps its compiler and flagship libraries such as cuBLAS and cuDNN closed-source.

Chiplet-Based Scaling

Tenstorrent's silicon is designed for chiplet-based packaging, which combines multiple smaller dies flexibly into larger systems. The approach gives up some absolute peak performance. In exchange, it gains manufacturing flexibility, lower yield costs, and more options for customer customization.
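The "lower yield costs" point follows from a standard first-order model of die yield, the Poisson model Y = e^(-A·D0). Here A is die area and D0 is defect density. The sketch below compares one large die against four smaller chiplets with the same total area. The 0.1 defects/cm² density and the 800 mm² / 200 mm² areas are hypothetical numbers chosen for illustration, not Tenstorrent figures. Packaging and die-to-die interconnect costs are ignored.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Probability that a die has zero defects under the simple Poisson yield model."""
    return math.exp(-area_cm2 * defects_per_cm2)


D0 = 0.1  # hypothetical defect density, defects per cm^2

monolithic = poisson_yield(8.0, D0)  # one 800 mm^2 die
chiplet = poisson_yield(2.0, D0)     # each of four 200 mm^2 chiplets

# One defect scraps the whole monolithic die but only the chiplet it lands on,
# so (with chiplets tested before packaging) the usable share of silicon is
# roughly the per-chiplet yield.
print(f"monolithic die yield: {monolithic:.1%}")  # monolithic die yield: 44.9%
print(f"per-chiplet yield:    {chiplet:.1%}")     # per-chiplet yield:    81.9%
```

Real yield models and defect densities vary by process node. The direction of the effect, where smaller dies waste less silicon per defect, is the part that carries over.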

Blackhole Successor (Shipping)

Blackhole is Wormhole's successor. Key new feature: 16 big RISC-V cores per chip dedicated to data orchestration. In conventional GPU clusters, the host CPU manages data movement between accelerators — adding latency. Blackhole eliminates this by handling its own data orchestration in-silicon.
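A back-of-the-envelope latency model shows why removing the host round trip matters. The sketch assumes that each step of a multi-accelerator job must wait for an orchestration decision before it can run, and that orchestration does not overlap with compute. All timing numbers are hypothetical placeholders, not measured Wormhole or Blackhole figures.

```python
def job_time_us(steps: int, compute_us: float, orchestration_us: float) -> float:
    """Total time when every step waits serially for its orchestration decision."""
    return steps * (compute_us + orchestration_us)


STEPS = 1000
COMPUTE_US = 50.0          # hypothetical compute time per step
HOST_ROUND_TRIP_US = 10.0  # hypothetical host-CPU round trip per step
ON_CHIP_US = 1.0           # hypothetical on-chip orchestration latency per step

host_managed = job_time_us(STEPS, COMPUTE_US, HOST_ROUND_TRIP_US)
self_orchestrated = job_time_us(STEPS, COMPUTE_US, ON_CHIP_US)
print(host_managed, self_orchestrated)                       # 60000.0 51000.0
print(f"speedup: {host_managed / self_orchestrated:.2f}x")  # speedup: 1.18x
```

In this model, the saving grows as per-step compute shrinks relative to the round trip. That is why orchestration overhead matters most for fine-grained distributed work.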

IP Licensing Business

Beyond selling finished chips, Tenstorrent licenses RISC-V CPU IP and Tensix architecture IP to other chipmakers. Disclosed customers include LG (powered on test silicon), BOS Semiconductor (automotive silicon), and Japanese foundry Rapidus (working with Tenstorrent on its 2nm pilot line). This dual model combines selling Tenstorrent's own chips with Arm-style IP licensing.

Strengths

  • Open-source through the stack: ISA (RISC-V), compiler (TT-Metalium, TT-Buda), and reference designs all open — counter-positioned to NVIDIA's closed ecosystem
  • Jim Keller credibility: CEO with a chip-design track record at AMD, Apple, Tesla, and Intel, which makes Tenstorrent arguably the most credible NVIDIA challenger from a silicon-design perspective
  • Workstation-tier pricing: Workstations starting at $12,000 make Tenstorrent the most accessible "alternative AI accelerator" for individual developers
  • Blackhole self-orchestration: 16 big RISC-V cores eliminate host-CPU roundtrip — meaningful for distributed training efficiency
  • Dual revenue model: Chip sales plus IP licensing diversifies Tenstorrent's exposure beyond commodity silicon
  • Chiplet roadmap: Modular scaling lets Tenstorrent serve a wider range of price-performance tiers than monolithic-die competitors

Limitations & Considerations

  • Software ecosystem young: PyTorch/TensorFlow integration via TT-Buda is solid but ecosystem maturity (libraries, debugging tools, optimization recipes) trails NVIDIA CUDA by years
  • Customer base small: LG, BOS Semiconductor, Rapidus, plus individual workstation buyers — far smaller than NVIDIA / AMD reference customer lists
  • Performance vs. NVIDIA flagship: Wormhole and Blackhole compete on price-performance and openness, not absolute peak performance — frontier-AI training labs continue to favor NVIDIA flagships
  • Tooling friction: Explicit-data-movement architecture is harder to optimize than cache-based GPUs; compiler quality matters enormously and is still maturing
  • Roadmap uncertainty: Successor generations after Blackhole are publicly fuzzy; enterprise customers must weigh long-term commitment against this uncertainty
  • No high-level managed services: Tenstorrent sells silicon and software stacks; no equivalent to AWS Bedrock or NVIDIA Inference Microservices

Best Use Cases

Use case | Why Tenstorrent fits | Caveat
--- | --- | ---
AI developers exploring NVIDIA alternatives | $12,000 workstations make hands-on access affordable | Software stack maturity matters for production work
Open-source AI infrastructure projects | Open ISA + compiler enables full inspection and customization | Trade-off: less commercial polish than CUDA
Custom AI silicon design | RISC-V + Tensix IP licensing supports custom chip projects | Multi-year project commitment with foundry partner
Cost-sensitive AI inference | Price-performance positioning vs. NVIDIA flagships | Match workload to current tooling capability
Research on novel AI architectures | Open stack + explicit data movement enables architectural research | Performance optimization requires expertise in TT-Metalium

When to choose alternatives:

  • Frontier AI training at the largest scale → NVIDIA H200 / B200 for ecosystem maturity
  • Cost-driven enterprise inference → Intel Gaudi 3 also offers value pricing with broader software-stack coverage
  • Mainstream production AI → CUDA ecosystem still dominates — pick NVIDIA unless the openness benefits outweigh the maturity gap
  • Highest absolute peak performance → NVIDIA Blackwell or AMD MI300X
  • Edge inference at global scale → Cloudflare Workers AI or hyperscaler edge services

Key Takeaways

  • Tenstorrent Wormhole is the RISC-V AI accelerator from Tenstorrent, the chip startup led by Jim Keller, and a deliberate inverse of NVIDIA's closed CUDA ecosystem
  • Architecture: Tensix cores combine RISC-V cores with matrix-math units; Wormhole n150 has 72 Tensix cores; Blackhole successor adds 16 big RISC-V cores for self-orchestration
  • TT-LoudBox / TT-QuietBox workstations starting at $12,000 are the most accessible on-ramp; chip IP is licensed to LG, BOS Semiconductor, Rapidus, and others
  • Open-source through the stack: ISA (RISC-V), compiler (TT-Metalium, TT-Buda), and reference designs — counter-positioned to NVIDIA's closed model
  • Best fit for developers exploring open AI hardware, custom-silicon teams licensing IP, and research on novel AI architectures; software-stack maturity still trails NVIDIA for mainstream production
