Updated May 6, 2026

Cloudflare Workers AI

By Cloudflare

Cloudflare Workers AI is a serverless GPU inference platform that runs open-source AI models across 300+ global edge locations. It offers pay-per-Neuron pricing with a free daily allocation, near-zero cold starts, and tight integration with the Cloudflare developer platform. A Stripe Projects open beta launched in May 2026 lets AI agents, once a human grants initial permission, create Cloudflare accounts, buy domains, and deploy Workers without anyone clicking through signup and checkout forms.

Learning Objectives

  • Understand serverless edge AI inference and where Workers AI fits the AI compute stack
  • Identify the Neuron-based pricing model and how it scales with usage
  • Evaluate when Workers AI beats AWS Bedrock, OpenAI API, or self-hosted GPU inference

What Is Cloudflare Workers AI?

Workers AI is Cloudflare's serverless GPU inference platform. It runs open-source AI models at hundreds of global edge locations with no GPU rental, no cold-start delays, and usage-based pricing metered in Neurons. It is the AI inference layer of the Cloudflare developer platform, alongside Workers (compute), R2 (object storage), D1 (SQL), KV (key-value), and Durable Objects.

Where AWS Bedrock and Azure OpenAI serve models, including closed flagship models, from a set of centralized cloud regions, Workers AI runs open-source models (Llama, Mistral, embeddings, image classification, speech-to-text) on Cloudflare-managed GPUs across its global edge network of 300+ locations. As a result, network latency depends on how close the user is to a Cloudflare data center (typically under 100ms for most users worldwide) rather than on how close they are to a single AWS region. The model's own execution time comes on top of that network latency.

💡Key Concept

Why edge AI inference: Most AI workloads don't need a frontier flagship model. They need fast, cheap, predictable inference on smaller models (sentiment, classification, embeddings, simple chat, transcription). Edge inference moves the compute close to the user, cutting network latency from hundreds of milliseconds (a distant centralized region) to tens of milliseconds (a nearby edge location). For real-time AI features inside a web or mobile app, this is the difference between snappy and sluggish.

Tip

Visit Workers AI: developers.cloudflare.com/workers-ai — free 10,000 Neurons daily; Workers Paid plan unlocks higher limits

Pricing

Workers AI pricing is Neuron-based — Neurons measure GPU compute consumed per request, with each model type having its own Neuron rate. Per-model prices are also published in tokens or images for easier estimation.

Free: 10,000 Neurons per day
  • All available models
  • Rate-limited by daily Neuron cap
  • No credit card required
Workers Paid: $5/month base + $0.011 per 1,000 Neurons over the free daily allocation
  • Same model catalog as Free
  • Higher concurrency limits
  • Single billing for all Cloudflare developer products
Per-Model Token Pricing: for example, $0.003 per million input tokens (small models)
  • Up to $2.51 per million images for vision models
  • Granular per-model rates published in docs
  • Predictable cost scaling
Workers Enterprise: custom contract
  • Higher rate limits
  • SLA guarantees
  • Dedicated support

The free tier is genuinely usable for prototyping and small projects — 10,000 Neurons per day covers thousands of small-model inferences. Cost discipline matters at scale: for high-volume production traffic, compare Neuron pricing against self-hosted GPU economics.
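To model a workload against these tiers, the arithmetic is simple enough to sketch. The snippet below is a back-of-envelope estimator that assumes only the rates quoted in this section (10,000 free Neurons per day, $0.011 per 1,000 Neurons after that, $5/month Workers Paid base) and treats the free allocation as resetting each day. The function names are illustrative, not part of any Cloudflare SDK, and the rates should be checked against the current pricing page.

```typescript
// Cost estimator sketch using the rates quoted above. Rates are assumptions
// copied from this lesson; verify them against Cloudflare's pricing page.
const FREE_NEURONS_PER_DAY = 10_000;
const USD_PER_1000_NEURONS = 0.011;
const WORKERS_PAID_BASE_USD = 5;

// Only usage above the daily free allocation is billable. This sketch treats
// the allocation as resetting each day, so unused Neurons do not carry over.
function billableNeurons(neuronsUsedToday: number): number {
  return Math.max(0, neuronsUsedToday - FREE_NEURONS_PER_DAY);
}

// Estimated monthly bill in USD from a list of per-day Neuron totals.
function estimateMonthlyCostUsd(dailyNeurons: number[]): number {
  const overage = dailyNeurons.reduce((sum, n) => sum + billableNeurons(n), 0);
  return WORKERS_PAID_BASE_USD + (overage / 1000) * USD_PER_1000_NEURONS;
}

// Example: 30 days at roughly 1.01 million Neurons per day.
const month = new Array<number>(30).fill(1_010_000);
console.log(estimateMonthlyCostUsd(month).toFixed(2));
```

The example prints 335.00: the $5 base plus 30 million billable Neurons at $0.011 per 1,000. Computing overage per day rather than on the monthly total matters, because a spiky workload can exceed the free allocation on some days while leaving it unused on others.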

Core Features

Serverless GPU Execution

Pay only for the inference you run — no idle GPU rental, no instance management, no cold-start penalty. Cloudflare manages the GPU pool and routes requests to the nearest available compute. Models warm in milliseconds rather than minutes.

Global Edge Network (300+ locations)

Models run on Cloudflare's global edge network, the same infrastructure that serves its CDN traffic. Network latency depends on geographic distance to the nearest Cloudflare data center and is typically under 100ms for users around the world. Compare that with centralized AWS/Azure regions, where users far from the region add 100-300ms of round-trip latency.
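The distance argument can be made concrete with a physical lower bound. The sketch below assumes light in optical fiber travels at roughly two thirds of its vacuum speed, about 200 km per millisecond; the distances are hypothetical round numbers, and real routes are longer than straight lines, so measured round trips sit above these floors.

```typescript
// Distance-only lower bound on network round-trip time (RTT).
// Assumption: light in optical fiber covers roughly 200 km per millisecond.
// Real cable paths and routing/queuing delays push measured RTTs higher.
const FIBER_KM_PER_MS = 200;

function minRoundTripMs(oneWayDistanceKm: number): number {
  // The request goes out and the response comes back: distance covered twice.
  return (2 * oneWayDistanceKm) / FIBER_KM_PER_MS;
}

// Hypothetical distances for illustration only.
const farRegionKm = 10_000; // user on another continent from a single region
const nearbyEdgeKm = 500;   // user near a Cloudflare data center

console.log(`far region: >= ${minRoundTripMs(farRegionKm)} ms`);
console.log(`nearby edge: >= ${minRoundTripMs(nearbyEdgeKm)} ms`);
```

This prints floors of 100 ms and 5 ms. The model's own execution time adds to both figures, so edge placement shrinks only the network share of total response time, and only when the nearby location can serve the requested model.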

Open-Source Model Catalog

Hosted models include Llama variants (Meta), Mistral, embedding models (BGE, MiniLM), Whisper (speech-to-text), Stable Diffusion XL, image classification (ResNet, EfficientNet), and dozens more. Cloudflare keeps the catalog updated with new open-source releases.

Tight Workers Integration

Call Workers AI from a Workers script with env.AI.run('model-name', { prompt: '...' }). Once the AI binding is declared in the Worker's configuration, there is no API key juggling, no auth headers, and no separate SDK. The integration is designed for developers building AI features inside web apps on the Cloudflare developer platform.
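A minimal Worker built around that call might look like the following. It assumes the binding is declared in wrangler.toml under the name AI, and it uses @cf/meta/llama-3.1-8b-instruct as an example model ID. The small AiBinding interface is a hypothetical stand-in for the `Ai` type from @cloudflare/workers-types, included only so the sketch is self-contained.

```typescript
// Minimal Worker sketch that calls Workers AI through the `AI` binding.
// Assumes wrangler.toml declares the binding:
//   [ai]
//   binding = "AI"
// Real projects type env.AI with `Ai` from @cloudflare/workers-types;
// this hypothetical interface stands in for it.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: AiBinding;
}

// Example model ID from the Workers AI catalog; confirm current IDs in the docs.
const MODEL = "@cf/meta/llama-3.1-8b-instruct";

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Read the prompt from ?q=..., with a fallback for quick manual testing.
    const prompt = new URL(request.url).searchParams.get("q") ?? "Say hello in one sentence.";

    // Text-generation models return an object with a `response` string.
    const result = (await env.AI.run(MODEL, { prompt })) as { response?: string };

    return new Response(JSON.stringify({ answer: result.response ?? "" }), {
      headers: { "content-type": "application/json" },
    });
  },
};

export default worker;
```

The cast on the result is deliberate: output shapes differ by model type (text generation, embeddings, transcription), so each call site narrows the result to the fields it expects.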

AI Gateway

Sits in front of any model endpoint (Workers AI, OpenAI, Anthropic) and adds caching, rate limiting, retries, analytics, and a unified observability surface. Useful for production AI applications mixing multiple model providers.
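For third-party providers, putting AI Gateway in front of a model is mostly a matter of changing the base URL. The sketch below assumes the gateway URL pattern https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}/{path} from Cloudflare's AI Gateway docs; the provider slugs, account ID, gateway ID, API key, and model name are placeholders to verify against current documentation, and the helper names are hypothetical.

```typescript
// Sketch: routing provider requests through Cloudflare AI Gateway by
// swapping the base URL. Pattern assumed from the AI Gateway docs:
//   https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}/{path}
type Provider = "openai" | "anthropic" | "workers-ai";

function gatewayUrl(accountId: string, gatewayId: string, provider: Provider, path: string): string {
  const base = "https://gateway.ai.cloudflare.com/v1";
  // Strip leading slashes so "chat/completions" and "/chat/completions" behave the same.
  const cleanPath = path.replace(/^\/+/, "");
  return `${base}/${encodeURIComponent(accountId)}/${encodeURIComponent(gatewayId)}/${provider}/${cleanPath}`;
}

// Hypothetical helper: an OpenAI-style chat request sent via the gateway.
// All arguments are caller-supplied placeholders.
async function chatViaGateway(
  apiKey: string,
  accountId: string,
  gatewayId: string,
  model: string,
  prompt: string,
): Promise<unknown> {
  const res = await fetch(gatewayUrl(accountId, gatewayId, "openai", "chat/completions"), {
    method: "POST",
    headers: { authorization: `Bearer ${apiKey}`, "content-type": "application/json" },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  });
  if (!res.ok) throw new Error(`gateway request failed: ${res.status}`);
  return res.json();
}
```

Because only the URL changes, the request body stays in the provider's native format, and the caching, rate limiting, and analytics configured on the gateway apply without touching application logic.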

Vectorize (Vector Database)

Cloudflare's vector database for embeddings — pairs naturally with Workers AI embedding models for RAG (Retrieval-Augmented Generation) workloads. Vectors and inference run in the same network, minimizing round-trips.
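A minimal RAG flow on this stack embeds the question, queries the vector index, and passes the retrieved passages to a text model. The sketch below makes several assumptions: the model IDs are examples from the catalog, the VectorIndex interface is a trimmed, hypothetical stand-in for the Vectorize binding (the real query options include more, such as whether to return metadata), and passages are assumed to have been stored under metadata.text when the vectors were inserted.

```typescript
// RAG sketch: Workers AI embeddings + a Vectorize-style index + a text model.
// Interfaces are hypothetical, trimmed stand-ins for the real bindings.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}
interface VectorMatch {
  id: string;
  score: number;
  metadata?: Record<string, unknown>;
}
interface VectorIndex {
  query(vector: number[], options: { topK: number }): Promise<{ matches: VectorMatch[] }>;
}
interface Env {
  AI: AiBinding;
  VECTORIZE: VectorIndex;
}

// Example model IDs; confirm current names in the Workers AI catalog.
const EMBEDDING_MODEL = "@cf/baai/bge-base-en-v1.5";
const CHAT_MODEL = "@cf/meta/llama-3.1-8b-instruct";

async function embed(env: Env, text: string): Promise<number[]> {
  // Embedding models take a batch of strings and return one vector per string.
  const out = (await env.AI.run(EMBEDDING_MODEL, { text: [text] })) as { data: number[][] };
  return out.data[0];
}

async function answer(env: Env, question: string, topK = 3): Promise<string> {
  const vector = await embed(env, question);
  const { matches } = await env.VECTORIZE.query(vector, { topK });

  // Assumed convention: each stored vector carries its passage in metadata.text.
  const context = matches
    .map((m) => String(m.metadata?.text ?? ""))
    .filter((t) => t.length > 0)
    .join("\n---\n");

  const prompt = `Answer using only this context:\n${context}\n\nQuestion: ${question}`;
  const out = (await env.AI.run(CHAT_MODEL, { prompt })) as { response?: string };
  return out.response ?? "";
}
```

The same embedding model must be used at indexing time and at query time; vectors from different models live in different spaces and their similarity scores are not comparable.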

Stripe Projects — Agentic Onboarding (May 2026 Open Beta)

In May 2026, Cloudflare and Stripe opened a beta of a Stripe Projects integration that gives AI agents end-to-end self-service onboarding to the Cloudflare developer platform: an agent can create a Cloudflare account, buy a domain, and deploy a Worker without a human clicking through forms. Stripe acts as the orchestrator. It handles KYC, issues the agent a scoped payment token instead of a real card number, and enforces a default spending cap of $100 per month per provider. A human still grants the initial permission and accepts the terms of service, but for short-running agentic deployments this is among the cleanest "agent buys its own stack" patterns a major cloud provider has shipped to date. The open beta is gated behind Stripe Projects, and standard Workers Paid pricing applies once the agent's usage exceeds the free allocation.

Strengths

  • Generous free tier: 10,000 Neurons per day with no credit card — meaningful for prototyping
  • Global edge inference: Sub-100ms network latency for most users worldwide, with model execution time on top
  • Predictable pricing: Neuron-based metering converts cleanly to per-token or per-image rates
  • Open-source model focus: Llama, Mistral, and other open models whose license terms are published up front (terms vary by model, so review each one)
  • Cloudflare ecosystem fit: Tight integration with Workers, R2, D1, Vectorize, AI Gateway — single platform for full-stack AI apps
  • No GPU operations: Cloudflare manages the GPU pool; you write code

Limitations & Considerations

  • No frontier closed models: GPT-5, Claude Opus, Gemini Ultra are not on Workers AI — for those, use AI Gateway in front of OpenAI/Anthropic/Google APIs
  • Smaller open-source models: Catalog focuses on production-ready open models, not the absolute largest variants — fine for most workloads, limiting for some
  • Cost at extreme scale: For very high-volume inference (millions of requests per minute), self-hosted GPU economics may beat Neuron pricing — model your specific workload
  • Lock-in risk: AI Gateway, Workers AI, Vectorize are tightly coupled — moving off Cloudflare means rewriting the AI stack
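  • Cloudflare dependency: Outages on Cloudflare's edge network affect Workers AI alongside the rest of the developer platform. AI Gateway fallbacks to other providers help when a single model endpoint fails, but because the gateway also runs on Cloudflare, protecting against a network-wide outage requires a failover path outside Cloudflare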
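The "cost at extreme scale" point comes down to a break-even calculation between metered Neurons and GPUs billed by the hour. The sketch below uses the Neuron rates quoted in this lesson; every other input (Neurons per request, GPU hourly price, number of GPUs) is hypothetical and should come from your own measurements and quotes. The function names are illustrative.

```typescript
// Back-of-envelope break-even: Workers AI Neuron pricing vs. a fixed
// fleet of self-hosted GPUs. Neuron rates are from this lesson; all other
// inputs are hypothetical and must come from your own workload data.
const USD_PER_1000_NEURONS = 0.011;
const FREE_NEURONS_PER_DAY = 10_000;

interface Workload {
  requestsPerDay: number;
  neuronsPerRequest: number; // measure this from real usage
}

function workersAiDailyCostUsd(w: Workload): number {
  const neurons = w.requestsPerDay * w.neuronsPerRequest;
  return (Math.max(0, neurons - FREE_NEURONS_PER_DAY) / 1000) * USD_PER_1000_NEURONS;
}

function selfHostedDailyCostUsd(gpuHourlyUsd: number, gpusNeeded: number): number {
  // Dedicated GPUs bill for all 24 hours whether or not they are busy.
  return gpuHourlyUsd * 24 * gpusNeeded;
}

// Requests per day above which the self-hosted fleet becomes cheaper.
function breakEvenRequestsPerDay(neuronsPerRequest: number, gpuHourlyUsd: number, gpusNeeded: number): number {
  const fleetCost = selfHostedDailyCostUsd(gpuHourlyUsd, gpusNeeded);
  // Neurons that the same daily spend would buy on Workers AI, plus the free allocation.
  const neuronsAffordable = (fleetCost / USD_PER_1000_NEURONS) * 1000 + FREE_NEURONS_PER_DAY;
  return neuronsAffordable / neuronsPerRequest;
}
```

Two caveats apply: gpusNeeded itself depends on peak throughput, and the self-hosted figure omits engineering and operations time, so the real break-even point sits higher than this calculation suggests.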

Best Use Cases

Use Case | Why Workers AI Fits | Caveat
Real-time AI features in web apps | Sub-100ms global network latency + Workers integration | Smaller open-source models, not flagship closed models
Embedding generation + RAG | Workers AI + Vectorize + AI Gateway in one platform | For frontier-quality embeddings, OpenAI/Voyage may rank higher
Speech-to-text at scale | Whisper hosted on the edge, low usage-based pricing | Compare against Deepgram or AssemblyAI for production accuracy
Image classification / moderation | EfficientNet, ResNet, vision models on edge GPUs | For custom-trained models, host on dedicated GPU clouds
Prototyping AI features | Free tier with no credit card lowers experimentation cost | Move to the Paid plan for production traffic
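For the speech-to-text row, a transcription endpoint can follow the same binding pattern as text generation. This sketch assumes the model ID @cf/openai/whisper, an input of the audio as an array of byte values, and an output object with a text field; check these against the model's page in the Workers AI catalog. The AiBinding interface and the transcribe handler are hypothetical stand-ins so the example is self-contained.

```typescript
// Speech-to-text sketch: forward uploaded audio to Whisper on Workers AI.
// Interface is a hypothetical stand-in for the `Ai` binding type.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}
interface Env {
  AI: AiBinding;
}

// Assumed model ID; confirm in the Workers AI catalog.
const WHISPER_MODEL = "@cf/openai/whisper";

async function transcribe(request: Request, env: Env): Promise<Response> {
  if (request.method !== "POST") {
    return new Response("POST raw audio bytes to this endpoint", { status: 405 });
  }
  // Read the upload and pass it as a plain array of byte values,
  // the input shape this sketch assumes for the Whisper model.
  const bytes = new Uint8Array(await request.arrayBuffer());
  const result = (await env.AI.run(WHISPER_MODEL, { audio: Array.from(bytes) })) as { text?: string };

  return new Response(JSON.stringify({ text: result.text ?? "" }), {
    headers: { "content-type": "application/json" },
  });
}
```

Converting the upload to a plain number array keeps the request JSON-serializable; for long recordings, check the model's documented input size limits before relying on this approach.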

When to choose alternatives:

  • Frontier model quality needed → OpenAI API, Anthropic API, or Google Gemini API (use AI Gateway in front of them)
  • Massive training workloads → dedicated GPU cloud (Lambda Cloud, CoreWeave, hyperscaler AI services)
  • Custom-trained model hosting → Modal, Replicate, or AWS SageMaker for arbitrary container deployment
  • Largest-scale production inference → self-hosted GPU on Lambda, CoreWeave, or hyperscaler bare metal

Key Takeaways

  • Cloudflare Workers AI is serverless GPU inference for open-source AI models, running at 300+ global edge locations with typical network latency under 100ms
  • Pricing is Neuron-based: 10,000 Neurons per day free, $0.011 per 1,000 Neurons after, with per-model rates also published in tokens and images
  • Best fit for production AI features inside web/mobile apps where latency, cost predictability, and tight Workers integration matter more than absolute model frontier quality
  • Pair with AI Gateway to mix Workers AI with OpenAI / Anthropic / Google for hybrid open + closed model architectures
  • For frontier-quality flagship models, training workloads, or extreme-scale production inference, alternatives (OpenAI API, Lambda Cloud, dedicated GPU hosting) often serve better
  • The May 2026 Cloudflare + Stripe Projects beta lets AI agents self-serve account creation, domain purchase, and Worker deployment, with Stripe orchestrating KYC, scoped payment tokens, and a default spending cap of $100 per month per provider
