Free to read. Sign up to save your progress and take knowledge-check quizzes.

Sign up free
5 min read·Updated May 30, 2026

Groq Cloud

Groq logoBy Groq

Groq Cloud (GroqCloud) is an ultra-fast AI inference platform powered by custom Language Processing Unit (LPU) chips — purpose-built for real-time AI applications. Following NVIDIA's $20 billion December 2025 deal that absorbed founder Jonathan Ross and senior chip leadership, Groq has refocused on its inference neocloud — the on-demand cloud platform sitting on top of the existing LPU fleet — and is raising roughly $650 million to fund the pivot under interim CEO Adam Winter and interim CFO Matt Eng.

Listen to this lesson

Free preview · first 0:30
0:00 / 0:30

Audio & video lessons are paid features

Plus unlocks audio streaming. Pro adds downloadable audio, video, certificates, and more.

Plus adds:
  • Audio streaming
  • Downloadable PDFs
  • All AI Playbooks
  • Personalized content
Pro also adds:
  • Certificates of completion
  • Audio MP3 downloads
  • Video lessonssoon
  • & More…soon

Watch this lesson

Video coming soon

Learning Objectives

  • Understand what Groq Cloud is and how its LPU architecture differs from GPUs
  • Evaluate Groq's pricing, supported models, and use cases
  • Assess the impact of NVIDIA's 2025 licensing deal on Groq's future

What Is Groq Cloud?

Groq Cloud (also called GroqCloud) is an AI inference platform that runs large language models on custom-designed Language Processing Units (LPUs) instead of traditional GPUs. The result: dramatically faster token generation — often 10 to 30 times faster than GPU-based alternatives for single-stream inference.

Founded in 2016 by Jonathan Ross (the architect behind Google's original TPU), Groq designed its chips from scratch to solve one problem: making AI inference as fast as possible. While GPUs excel at training models, Groq's LPU architecture is purpose-built for running them.

💡Key Concept

Language Processing Unit (LPU): Groq's custom chip uses SRAM (not HBM memory like GPUs) and deterministic execution — the compiler pre-computes the entire execution graph down to the clock cycle, eliminating the unpredictable memory bottlenecks that slow down GPU inference.

The NVIDIA Deal and the Inference-Cloud Pivot

In December 2025, NVIDIA entered a non-exclusive licensing agreement for Groq's inference technology, paying approximately $20 billion. The deal brought Groq founder Jonathan Ross and the bulk of senior chip-engineering leadership to NVIDIA, while GroqCloud was explicitly excluded and continues operating independently. At GTC 2026 in March, NVIDIA unveiled the Groq 3 LPU built on the licensed intellectual property — a clean separation of the chip lineage (now at NVIDIA) from the cloud service (still under Groq), with the Groq 3 expected to ship in late 2026.

The remaining Groq is now run as an inference-cloud company under interim CEO Adam Winter and interim CFO Matt Eng — both formerly senior Groq finance and operations leaders. The strategic refocus is sharp: Groq is no longer pitching as a frontier chip-design company chasing the next process node, but as an inference-cloud operator running the existing LPU fleet plus collecting royalties on the NVIDIA license.

The pitch is straightforward — inference is now a much larger market than training, and the customer-facing inference category (real-time voice, agentic browser control, low-latency function calling) is where Groq's LPU advantage on tokens-per-second economics remains intact after the chip lineage moved.

Groq is raising roughly $650 million to fund this pivot. Existing investors are leading the round, with Disruptive and Infinitium committed to fill any unsubscribed shares. Cumulative funding now exceeds $2 billion, building on the $750 million round closed in September 2025 at a $6.9 billion valuation.

⚠️Warning

The competitive question has shifted. Pre-NVIDIA-deal, the worry about building on GroqCloud was that the underlying chip lineage might lose its independent edge. Post-pivot, the question is whether Groq can hold the customer-facing inference category as NVIDIA (with the new Groq 3 lineage), AWS Bedrock, Cerebras, and Together AI all push their own inference services using the same or similar chip generations. The token-per-second economics still favor Groq for real-time and agentic workloads — but the moat is operational and customer-facing now, not silicon-only.

Speed Benchmarks

Groq's headline feature is raw inference speed:

ModelGroq Speed (tokens/sec)Typical GPU SpeedSpeedup
Llama 3.1 8B~1,345 tok/sec~100-200 tok/sec7-13x faster
Qwen 3 32B~662 tok/sec~50-100 tok/sec7-13x faster
Llama 2 70B~300 tok/sec (single card)~30-60 tok/sec5-10x faster

These speeds make Groq particularly compelling for real-time chat, agentic AI (where models make many sequential calls), and interactive applications where latency matters more than throughput.

Supported Models

As of March 2026, GroqCloud hosts a curated selection of open-source models:

ToolBest For

The API is OpenAI-compatible — switching from OpenAI to Groq requires changing just the base URL and API key. No code rewrite needed.

Pricing

Free$0 (no credit card)
  • Experimentation and prototyping
  • Rate-limited
DeveloperPay-as-you-go
  • Production applications with higher rate limits
EnterpriseCustom pricing
  • Dedicated capacity and SLAs

Per-token pricing (approximate):

Model SizeInput (per 1 million tokens)Output (per 1 million tokens)
Small (8-17 billion params)~$0.11~$0.11
Mid (32-70 billion params)~$0.59-$0.99~$0.79
Large (120 billion+ params)~$1.00Higher

Cost-saving features: Batch processing saves 50% on input costs. Prompt caching gives 50% off cached input tokens.

Groq vs. Competitors

PlatformStrengthLimitation
Groq CloudFastest single-stream latency; purpose-built LPU siliconNo training; model selection limited to hosted open-source
Cerebras InferenceHigher throughput on frontier models (3,000+ tok/sec on 120 billion param models)Smaller model catalog; less developer tooling
Together AIMore model variety; fine-tuning and custom trainingRuns on rented GPUs; higher latency
AWS BedrockBroadest ecosystem; managed services; enterprise trustSlower inference; higher cost per token

Company Details

DetailInfo
Founded2016 by Jonathan Ross (ex-Google TPU architect; moved to NVIDIA December 2025)
Interim CEOAdam Winter (former senior Groq operations leader)
Interim CFOMatt Eng
HeadquartersMountain View, California
Current RoundRoughly $650 million (in progress, May 2026)
Last Closed Funding$750 million (September 2025) at $6.9 billion valuation
Total RaisedOver $2 billion (cumulative)
DevelopersOver 2 million signups; 360,000+ active monthly
Fortune 10075% have GroqCloud accounts
Data Centers12+ across US, Canada, Middle East, and Europe
Websitegroq.com

Strengths

  • Fastest inference for real-time applications — 10 to 30 times faster than GPU-based alternatives on single-stream queries
  • Free tier with no credit card required — easy to experiment
  • OpenAI-compatible API — switch from OpenAI with a 3-line code change
  • Purpose-built silicon — LPU architecture designed specifically for inference, not repurposed training hardware
  • Cost-competitive — batch processing and prompt caching reduce costs significantly

Limitations and Considerations

  • Inference only — Groq cannot train or fine-tune models; it only runs pre-trained models
  • Limited model selection — only hosts a curated set of open-source models (no proprietary models like GPT or Claude)
  • NVIDIA deal uncertainty — with founder and IP at NVIDIA, GroqCloud's long-term independence is an open question
  • Throughput vs. latency — Groq excels at single-stream speed but Cerebras outperforms on high-batch throughput workloads
  • No custom models — you cannot upload or deploy your own fine-tuned models (unlike Together AI or AWS)

Key Takeaways

  • Groq Cloud delivers the fastest single-stream AI inference available, powered by custom LPU chips purpose-built for running language models
  • The free tier and OpenAI-compatible API make it easy to try — switch from OpenAI by changing just the base URL and API key
  • NVIDIA's $20 billion December 2025 licensing deal brought Groq's founder and senior chip leadership to NVIDIA; GroqCloud was explicitly excluded and has refocused as an inference-cloud company under interim CEO Adam Winter and interim CFO Matt Eng
  • A roughly $650 million funding round is in progress to fund the pivot, led by existing investors with Disruptive and Infinitium committed to fill any unsubscribed shares — taking cumulative funding past $2 billion
  • The strategic bet is that inference is now a much larger market than training and that real-time / agentic workloads (where tokens-per-second economics matter most) remain Groq's strongest competitive ground
  • Best suited for real-time chat, agentic AI workflows, voice and browser-control products, and latency-sensitive applications where speed matters more than throughput

Save your progress & take the quiz

Sign up free to bookmark lessons, track which modules you've completed, and lock in what you learned with a quick knowledge-check quiz at the end of each lesson.

🧭Recommended for you