9.8 — Cloud Hyperscalers | AI Pro Playbook

Learning Objectives

Compare AWS, Azure, and Google Cloud as AI development platforms for application builders
Explain the key AI services offered by each hyperscaler and when to use them
Apply a decision framework for choosing between cloud providers for AI application development

Why Hyperscalers Matter for AI Developers

For most developers building AI applications, the API layer (Claude, GPT, Gemini) is the AI — and the cloud is just where the application runs. But for teams building at scale, fine-tuning models on proprietary data, or using pre-built AI services (computer vision, transcription, document processing), the choice of cloud provider significantly shapes what AI capabilities are accessible and at what cost.

💡Key Concept

Build vs. API decision: The fundamental question when choosing a cloud AI strategy: (1) Call foundation model APIs directly (fast, simple, expensive at scale, no infrastructure management); (2) Use cloud AI services that wrap foundation models with enterprise features (compliance, fine-tuning, integration); (3) Train custom models on proprietary data (maximum customization, requires ML expertise, highest cost and complexity). Most developers start with (1) and move to (2) as requirements mature.

The big three cloud providers hold approximately 65% of global cloud spending combined, and most AI deployment happens on their infrastructure. Their market shares as of early 2026: AWS ~32%, Azure ~23%, Google Cloud ~12%.

AWS — The Broadest Ecosystem

AWS's AI strategy centers on two products for most developers:

Amazon Bedrock

Bedrock provides managed access to foundation models from multiple providers — no GPU management, no infrastructure, pay per token:

Anthropic Claude (Haiku 4.5, Sonnet 4.6, Opus 4.7)
Meta Llama 4 (Scout, Maverick)
Mistral (Mistral 7 billion, Mixtral 8x7 billion, Mistral Large)
Amazon Nova 2 family (Micro, Lite, Pro, Premier) — Amazon's own model family, updated 2025-2026
Cohere Command A (successor to Command R+)
18 new models added to Bedrock in 2025-2026, for a total of approximately 200 models

Agents for Amazon Bedrock: Build RAG-enabled agents that query knowledge bases (S3-backed vector stores), call Lambda functions as tools, and take multi-step actions. The agent framework is managed — no orchestration infrastructure to build.

Key enterprise advantages: AWS GovCloud for government workloads; the most extensive compliance certification portfolio (FedRAMP High, HIPAA, SOC 2, PCI DSS, ISO 27001) of any cloud provider; already familiar infrastructure for teams with existing AWS workloads.

Amazon SageMaker AI

SageMaker was renamed SageMaker AI in 2025, with a new Unified Studio (GA March 2025) that provides a single interface for data engineering, ML model development, and generative AI. For teams that need to go beyond API calls — fine-tuning models on proprietary data, building custom ML pipelines, or running inference at cost-effective scale:

Data labeling (SageMaker Ground Truth)
Model training with managed GPU clusters, auto-scaling, distributed training
Hyperparameter tuning (automated search for best model configuration)
Model deployment with auto-scaling endpoints
Model monitoring for drift and bias in production

SageMaker is the right choice when you're building custom models, not just calling APIs.

AWS Trainium3 and Inferentia2

AWS's custom AI chips reduce cost for supported workloads:

Trainium3 (GA, 3nm process): delivers 4.4x the performance of Trainium2; designed for training; 40% more cost-efficient than NVIDIA H100 instances for supported models
Inferentia2 (EC2 Inf2 instances): designed for inference; up to 60% lower cost than GPU instances for supported models

Not all models are optimized for Trainium/Inferentia — support is strongest for popular models (Llama, BERT variants, Stable Diffusion). For teams running high-volume inference with supported models, the cost difference is significant.

Pre-Built AI Services

AWS offers 25+ pre-built AI services that are consumed as APIs without any ML knowledge:

Amazon Rekognition: Image and video analysis — face detection, object recognition, content moderation, text extraction from images
Amazon Transcribe: Speech-to-text; 100+ languages; speaker diarization; custom vocabulary for domain-specific terms
Amazon Comprehend: NLP — sentiment analysis, entity recognition, key phrase extraction, topic modeling
Amazon Textract: Document processing — extract structured data from forms, tables, and PDFs beyond what OCR provides
Amazon Polly: Text-to-speech; 60+ voices; SSML support for pronunciation control

For teams building applications with specific AI capabilities (document processing, image analysis, transcription) without wanting to run their own models, these services provide production-ready AI in hours.

Microsoft Azure — The OpenAI Partner

Azure's AI strategy is anchored by its years-long commercial partnership with OpenAI — originally exclusive, restructured by the April 27, 2026 amendment as a primary-but-not-exclusive relationship. Azure OpenAI Service still provides the most established enterprise access to OpenAI's models with Azure's security and compliance infrastructure, and OpenAI is contractually committed to spending $250 billion on Azure services.

Azure OpenAI Service

Enterprise deployment of OpenAI models:

GPT-5 through GPT-5.4 series for application integration (including GPT-5.4 Pro and mini)
DALL-E / GPT Image 1.5 for image generation
Whisper for transcription
Embeddings models for vector search and RAG

The difference from using OpenAI's API directly:

Data privacy: Prompts and completions are not used to train Microsoft or OpenAI models
Data residency: Process data in specific Azure regions for compliance
Enterprise SLAs: Uptime guarantees and support tiers not available on OpenAI's consumer API
Azure networking: Keep traffic within your existing Azure private network infrastructure
Compliance: Azure's certification portfolio covers requirements OpenAI's consumer API cannot certify against

For organizations that need OpenAI model capability but have enterprise data governance requirements, Azure OpenAI Service is frequently the correct answer.

Microsoft Foundry (Renamed from Azure AI Foundry)

Microsoft Foundry (effective January 1, 2026, renamed from Azure AI Foundry / Azure AI Studio) is the central platform for building, deploying, and managing AI applications. Azure is growing 39% year-over-year — roughly 2x AWS's growth rate:

Model catalog: 11,000+ models including OpenAI, Llama, Mistral, Cohere, open-source — all with enterprise deployment options
Fine-tuning: Customize models on proprietary data within Azure's infrastructure
Prompt Flow: Visual workflow builder for LLM chains — design, test, and deploy agentic workflows without writing orchestration code
Content Safety: Automated content filtering with audit logs
Evaluation: Built-in model evaluation tools for quality, coherence, and relevance

M365 Copilot Integration

Uniquely Microsoft: models and agents deployed in Azure AI Foundry can be surfaced inside Microsoft 365 applications. AI built by your team appears in Teams, Word, and Outlook — reaching employees in the tools they use daily.

Google Cloud — First-Party Gemini

Google Cloud's differentiation is first-party: Gemini models are developed by Google DeepMind and hosted on Google's infrastructure, without the licensing complexity of reselling another company's models.

Vertex AI

Google's unified AI platform:

Gemini API on Vertex: Enterprise access to Gemini 3.1 Pro, Flash, and Nano with data governance and regional deployment options not available through the public Gemini API or AI Studio
Model Garden: 150+ models including Gemini, Llama, Mistral, Anthropic Claude — evaluated and deployable on Vertex
Vertex AI Agent Builder and Agent Garden: Build conversational agents, multi-agent workflows, and discover pre-built agent templates using Gemini models, with RAG over Google Cloud Storage or BigQuery
AutoML: Low-code custom model training for structured data, image classification, and NLP tasks

Google TPUs (Ironwood v7)

Google's custom AI chips (Tensor Processing Units) now include the Ironwood v7 generation (GA 2026), achieving 42.5 Exaflops at 9,216 chips — a massive increase from the Trillium v6 generation. Anthropic has committed to over 1 million Ironwood chips, and Google Cloud grew 28% YoY to $12.8 billion quarterly revenue:

Available on Google Cloud as TPU pods
Particularly cost-effective for Google-native model training (Gemini derivatives, JAX-based research)
Improving PyTorch support through XLA/PJRT compiler, but NVIDIA CUDA ecosystem remains dominant for PyTorch workflows

BigQuery ML

Run ML models directly within BigQuery SQL — the analytics database. Execute predictions, train simple models, and call external models (including Vertex AI models) from standard SQL queries:

SELECT predicted_label, confidence
FROM ML.PREDICT(MODEL `project.dataset.my_model`,
  (SELECT * FROM `project.dataset.input_data`))

For data teams that want AI capabilities without leaving their analytics environment, BigQuery ML enables a wide range of predictions from SQL.

Choosing Between the Three

The honest answer: for most new AI applications, the choice of cloud provider is secondary to choosing the right model and architecture. All three providers offer capable managed model hosting and comparable infrastructure.

The practical decision drivers:

Go AWS: Your existing infrastructure is on AWS; you need the widest compliance coverage; you want the most mature managed services ecosystem; you need custom chip discounts via Trainium/Inferentia.

Go Azure: You need OpenAI models with enterprise data governance; your organization is on M365 and wants AI to surface in Teams/Office; your compliance requirement specifically needs Azure certification.

Go Google Cloud: You want first-party Gemini access with enterprise governance; you have existing BigQuery/Workspace infrastructure; your ML team works in JAX and wants TPU access.

Key Takeaways

AWS leads on breadth: Bedrock (~200 models), SageMaker AI (Unified Studio), Trainium3 GA (3nm, 4.4x performance), the widest compliance portfolio, and pre-built AI services for vision, transcription, and NLP
Azure differentiates via the OpenAI partnership and growth (39% YoY, 2x AWS): Microsoft Foundry with 11,000+ models, GPT-5 through 5.4, plus M365 integration that surfaces AI inside everyday productivity tools
Google Cloud's strength is first-party Gemini 3.1 Pro access, TPU Ironwood v7 (42.5 Exaflops), and the Google data ecosystem (BigQuery, Workspace, Agent Garden)
For most teams starting out, the hyperscaler choice follows existing infrastructure — start where you already are, and move if you hit specific limitations

Cloud Hyperscalers

Audio & video lessons are paid features