Learning Objectives
- Explain the energy, GPU, memory, and cooling constraints that limit AI data center scaling
- Describe the geopolitical dimensions of AI chip supply and export controls
- Summarize near-term and long-term infrastructure trajectories, including nuclear and space-based solutions
Why Infrastructure Matters for Developers
Most developers call an API and think nothing of what happens on the other side. But the physical infrastructure that makes those API calls possible is now the primary constraint on the rate of AI progress — more than algorithmic breakthroughs or research investment.
Understanding the infrastructure landscape helps explain: why frontier model inference costs what it does, why GPU availability affects development timelines, why AI companies are signing nuclear power deals, and where the bottlenecks are likely to ease (or tighten) in the coming years.
The Energy Problem
A single large-scale training run for a frontier model can consume roughly as much electricity as a small city uses in a year. A hyperscaler's AI data center cluster, the kind needed to train GPT-5 or Gemini 3, draws 1-2 gigawatts of power continuously.
Context: 1 GW is approximately the output of one large nuclear reactor. Global data center power demand is projected to reach 96 GW by 2026, nearly double the 2023 level. US data centers now consume approximately 6% of total US electricity, and AI drove the majority of that growth.
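To make the gigawatt figures easier to compare with annual electricity statistics, the conversion from continuous power to yearly energy can be sketched as below. This is a minimal illustration: the function name and the `utilization` parameter are hypothetical, and the calculation assumes a constant draw for all 8,760 hours of a non-leap year.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_energy_twh(power_gw: float, utilization: float = 1.0) -> float:
    """Energy in TWh used in one year by a load drawing `power_gw` gigawatts.

    `utilization` is a hypothetical knob for loads that do not run flat out;
    1.0 means the full draw is sustained all year.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    gwh = power_gw * utilization * HOURS_PER_YEAR
    return gwh / 1000.0  # 1 TWh = 1,000 GWh


if __name__ == "__main__":
    # 1-2 GW is the cluster range quoted above; 96 GW is the global figure.
    for gw in (1.0, 2.0, 96.0):
        print(f"{gw:>5.1f} GW continuous = {annual_energy_twh(gw):.2f} TWh/year")
```

Under these assumptions a 1 GW cluster running continuously uses about 8.76 TWh per year.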
Power purchase agreements (PPAs): The hyperscalers (Google, Microsoft, Amazon) have all signed nuclear power purchase agreements in the last two years. Microsoft signed a deal with Constellation Energy to restart Unit 1 of the Three Mile Island nuclear plant in Pennsylvania, specifically for AI compute needs. Google has agreements with multiple nuclear operators. Amazon has partnered with Dominion Energy for nuclear capacity.
These aren't ideological choices — they're engineering ones. AI data centers need 24/7 reliable power that doesn't depend on weather. Solar and wind require storage. Nuclear runs continuously.
The Stargate Project: The largest AI infrastructure initiative to date, a joint venture between OpenAI, SoftBank, Oracle, and others targeting $400 billion+ in total investment. The flagship Abilene, Texas facility is operational (~1 GW by mid-2026), with 5 additional sites announced (7 GW total capacity). However, the Texas expansion was scrapped due to financing disputes — illustrating the execution challenges of mega-scale data center projects. OpenAI is also developing a custom "Titan" chip with Broadcom (TSMC 3nm, expected H2 2026) to reduce NVIDIA dependency.
Small Modular Reactors (SMRs): SMRs are factory-manufactured nuclear reactors at 1/10 the scale of traditional plants — designed to be sited adjacent to data centers. Progress has accelerated: NuScale signed a 6 GW deal with the Tennessee Valley Authority (TVA). Oklo broke ground at Idaho National Laboratory and signed a 12 GW deal with Switch. X-energy is under an 18-month NRC review for design certification. First commercial SMR-powered data centers are expected 2027-2030. Microsoft, Google, and Amazon all have SMR agreements signed.
The GPU Supply Chain
NVIDIA controls approximately 80% of AI training compute capacity, primarily through the H100, H200, and GB200 Blackwell GPU families. This concentration creates supply chain risks that have already affected company timelines.
Lead times: Large GPU orders (10,000+ GPUs) have required 6-12+ months of lead time at peak demand. Major AI companies have paid significant premiums for priority allocation.
CUDA moat: NVIDIA's competitive advantage is not only hardware — it's the CUDA programming ecosystem built over 20 years. The vast majority of AI frameworks, optimization tools, and model architectures are CUDA-optimized. Switching to AMD hardware requires re-optimization work that many teams have not done.
AMD MI300X / MI350: AMD has made the most credible challenge to NVIDIA dominance. The MI300X has 192GB of HBM3 memory, significantly more than the H100's 80GB, which makes it advantageous for very large model inference. The MI350 series is positioned against NVIDIA's Blackwell generation. AMD's ROCm software stack has improved dramatically, though it still trails CUDA in ecosystem depth.
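The memory advantage can be made concrete with a rough weights-only estimate. The sketch below is illustrative, not a sizing tool: the function names are hypothetical, the 16-bit (2 bytes per parameter) default and the 90% usable-memory fraction are assumptions, and it ignores the KV cache and activations, which add real overhead during inference.

```python
import math


def weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """GB needed for model weights alone (1e9 params * bytes / 1e9 bytes per GB)."""
    return params_billions * bytes_per_param


def min_gpus_for_weights(
    params_billions: float,
    gpu_memory_gb: float,
    bytes_per_param: float = 2.0,
    usable_fraction: float = 0.9,  # ASSUMPTION: ~10% reserved for runtime overhead
) -> int:
    """Smallest GPU count whose combined usable memory holds the weights."""
    usable_gb = gpu_memory_gb * usable_fraction
    return math.ceil(weight_memory_gb(params_billions, bytes_per_param) / usable_gb)


if __name__ == "__main__":
    # 80 GB (H100) and 192 GB (MI300X) are the capacities quoted in the text.
    for name, mem in (("H100 80GB", 80.0), ("MI300X 192GB", 192.0)):
        n = min_gpus_for_weights(70.0, mem)
        print(f"70B params at 16-bit on {name}: at least {n} GPU(s)")
```

Under these assumptions, a 70-billion-parameter model at 16-bit precision (140 GB of weights) needs at least two H100s but fits on a single MI300X.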
Custom ASICs: The hyperscalers are reducing NVIDIA dependency for their own workloads. Google's TPUs handle Gemini training and inference. Amazon's Trainium3 runs a growing percentage of AWS customer inference. Microsoft's Maia 200 handles Copilot workloads. This internal substitution tempers hyperscaler demand for NVIDIA GPUs, but it does little for the broader AI ecosystem, which still depends on NVIDIA hardware.
The Memory Bottleneck
Even with sufficient GPUs, a supply chain constraint on High Bandwidth Memory (HBM) limits GPU production:
The HBM situation: HBM is specialized DRAM built as vertical stacks of memory dies and mounted next to the GPU die on the same package. Only three companies produce it: SK Hynix (~50% market share), Samsung (~30%), and Micron (~20%). The manufacturing process is extremely complex and yields are low.
Why it matters: NVIDIA's H100 requires 5 HBM3 stacks per GPU. At peak demand, HBM supply has constrained NVIDIA's ability to produce H100s even when the GPU die was available. The HBM supply chain is a chokepoint for the entire AI hardware industry.
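The chokepoint logic is a simple minimum over components: finished GPUs are capped by whichever input runs out first. A minimal sketch follows, using the 5-stacks-per-GPU figure from the text; the function names and the inventory numbers in the example are hypothetical and chosen only to illustrate the effect.

```python
def buildable_gpus(gpu_dies: int, hbm_stacks: int, stacks_per_gpu: int = 5) -> int:
    """Finished GPUs possible when each GPU needs one die and `stacks_per_gpu` HBM stacks."""
    if stacks_per_gpu <= 0:
        raise ValueError("stacks_per_gpu must be positive")
    return min(gpu_dies, hbm_stacks // stacks_per_gpu)


def limiting_component(gpu_dies: int, hbm_stacks: int, stacks_per_gpu: int = 5) -> str:
    """Name the input that caps output: 'hbm', 'die', or 'balanced'."""
    by_hbm = hbm_stacks // stacks_per_gpu
    if by_hbm < gpu_dies:
        return "hbm"
    if by_hbm > gpu_dies:
        return "die"
    return "balanced"


if __name__ == "__main__":
    # Hypothetical inventory: plenty of dies, not enough memory stacks.
    dies, stacks = 100_000, 400_000
    print(buildable_gpus(dies, stacks), "GPUs, limited by", limiting_component(dies, stacks))
```

In this made-up scenario, 20,000 good GPU dies sit idle because the available HBM only covers 80,000 packages.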
HBM4 entering production: HBM3e is the current standard (used in the H200; the MI300X uses the earlier HBM3). SK Hynix began HBM4 production in February 2026; Samsung is ramping Q2 2026. Both are delivering samples to NVIDIA for the Vera Rubin architecture. Each generation roughly doubles bandwidth and capacity, and HBM4 enables the next generation of GPU architectures.
Cooling Systems
Modern AI chips run extremely hot. The H100 SXM module dissipates approximately 700W per GPU; a single server with 8 H100s dissipates 5,600W from the GPUs alone, several times the average power draw of a typical home.
Air cooling limits: Traditional data center air cooling can handle roughly 20-30kW per rack. Modern AI racks push 50-100kW per rack, requiring fundamentally different cooling approaches.
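The gap between air-cooling limits and AI rack density can be sketched with simple arithmetic. The example below uses the 700W-per-GPU figure and the rack limits quoted above; the function names are hypothetical, and the 1.5x overhead factor for CPUs, networking, and fans is an assumption rather than a measured value.

```python
GPU_WATTS = 700        # H100 SXM figure from the text
GPUS_PER_SERVER = 8
OVERHEAD_FACTOR = 1.5  # ASSUMPTION: CPUs, NICs, fans, etc. add ~50% on top of GPU power


def server_kw(
    gpu_watts: float = GPU_WATTS,
    gpus: int = GPUS_PER_SERVER,
    overhead: float = OVERHEAD_FACTOR,
) -> float:
    """Estimated power draw of one GPU server in kW."""
    return gpu_watts * gpus * overhead / 1000.0


def servers_per_rack(rack_limit_kw: float) -> int:
    """Whole servers that fit under a rack's cooling limit."""
    return int(rack_limit_kw // server_kw())


if __name__ == "__main__":
    print(f"One 8-GPU server: ~{server_kw():.1f} kW")
    for label, limit in (("air, 30 kW", 30.0), ("liquid, 100 kW", 100.0)):
        print(f"{label}: {servers_per_rack(limit)} servers per rack")
```

Under these assumptions an air-cooled rack holds about three 8-GPU servers, while a 100 kW liquid-cooled rack holds eleven, which is why dense deployments move away from air.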
Direct Liquid Cooling (DLC): Pipes coolant directly to chip heat spreaders. Removes heat 3-5x more efficiently than air cooling. Now standard for dense AI deployments.
Immersion cooling: Entire server boards submerged in dielectric (non-conductive) fluid. Removes heat directly from all components. Enables extremely high density — 300+ kW per rack is achievable. Being deployed in the most cutting-edge AI data centers.
The retrofit problem: Existing data centers built for air-cooled IT equipment cannot simply add liquid cooling without significant infrastructure changes. The need to retrofit or replace existing data center inventory is a significant capital cost for hyperscalers.
Geopolitics: Export Controls
AI chip capability has become a geopolitical resource, and governments are acting accordingly.
US Bureau of Industry and Security (BIS) export controls: The US government has restricted the export of advanced AI chips (NVIDIA H100 and above, AMD MI300X and above) to China, Russia, and a growing list of designated countries. The restrictions apply not just to direct sales but to cloud access from those countries.
This has several effects:
- Chinese AI companies cannot directly purchase the most advanced NVIDIA training hardware
- It has accelerated Chinese domestic chip development (Biren, Cambricon, Huawei Ascend)
- It has created a gray market for smuggled chips in some regions
- It has bifurcated the global AI ecosystem into US-aligned and China-domestic capability tracks
For developers: this affects which models are accessible from which countries, and which data center regions certain AI services can be hosted in.
The Future of AI Infrastructure (2026-2035)
Near-term (2026-2028):
- Nuclear SMR deals accelerate; first commercial SMR deployments adjacent to data centers
- HBM4 production ramps; memory bottleneck eases somewhat
- Liquid cooling becomes standard for new AI data center construction
- NVIDIA maintains dominance but AMD gains share in inference
Long-term (2028-2035):
- Space-based solar power: the Japan Aerospace Exploration Agency (JAXA) and the European Space Agency (ESA) have active programs. Theoretical advantage: continuous 24/7 solar power from orbit, beamed to Earth as microwaves. Demonstration projects are targeting 2030-2035, with commercial viability in the 2035-2040 range.
- Quantum computing: theoretical advantages for certain ML workloads (optimization, simulation of quantum systems) remain genuinely uncertain. Current quantum computers are not useful for ML training. A 2030 timeline for practically useful quantum ML is aggressive but held by some researchers.
Cost trajectory: The most reliable prediction is continued cost reduction. Inference costs have fallen roughly 10x per year for equivalent capability. The frontier-class AI that costs dollars per query today will likely cost fractions of a cent within 3-5 years, following the semiconductor industry's historical learning curve.
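Because the decline compounds, small differences in the annual rate produce very different outcomes. The sketch below is a back-of-the-envelope illustration with hypothetical function names; it assumes a constant yearly reduction factor, which real prices will not follow exactly.

```python
def cost_after(initial_cost: float, annual_factor: float, years: float) -> float:
    """Cost after `years` if it is divided by `annual_factor` every year."""
    if annual_factor <= 0:
        raise ValueError("annual_factor must be positive")
    return initial_cost / (annual_factor ** years)


def implied_annual_factor(total_factor: float, years: float) -> float:
    """Constant yearly reduction factor that yields `total_factor` over `years`."""
    if total_factor <= 0 or years <= 0:
        raise ValueError("total_factor and years must be positive")
    return total_factor ** (1.0 / years)


if __name__ == "__main__":
    # Starting from a $1.00 query and the 10x-per-year rate quoted above.
    for years in (1, 2, 3):
        print(f"after {years} yr at 10x/yr: ${cost_after(1.00, 10.0, years):.4f}")
    # What yearly rate does a 100x drop over 5 years imply?
    print(f"100x over 5 yr = {implied_annual_factor(100.0, 5.0):.2f}x per year")
```

At a sustained 10x per year, a $1.00 query reaches a tenth of a cent in three years. A 100x reduction over five years needs only about 2.5x per year, so it is the far more conservative projection.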
Key Takeaways
- Energy is the hard physical constraint on AI progress — data centers consuming gigawatts have driven hyperscalers to nuclear power agreements and SMR investments
- NVIDIA's GPU supply dominance (with the CUDA ecosystem as the software moat) and the HBM memory bottleneck are the two primary hardware supply chain constraints
- US export controls on advanced AI chips have bifurcated the global AI hardware ecosystem and accelerated Chinese domestic chip investment
- Infrastructure costs follow a steep decline curve: the most reliable long-term prediction is that frontier AI inference costs will be at least 100x lower within 5 years, a conservative figure given that recent declines have been closer to 10x per year, enabling applications currently too expensive to be practical