4.3 — Google DeepMind: Gemini, Gemma, and Veo

Learning Objectives

Identify Google DeepMind's key models and compare their capabilities against competing offerings
Explain why Google's infrastructure — TPUs, first-party data, and distribution — represents a structural AI advantage
Understand when to choose Gemini vs. competing models for specific use cases

Google DeepMind: Unparalleled Infrastructure, Late Consumer Product

Google has been doing foundational AI research longer than almost any other organization. The 2017 "Attention Is All You Need" paper — the Transformer paper that defined modern AI — came from Google Brain. AlphaGo, AlphaFold, and dozens of other landmark AI systems came from DeepMind.

Yet in consumer AI, Google was caught flat-footed by ChatGPT. Bard (the initial response to ChatGPT, launched in 2023) was received poorly. Google's AI products improved rapidly, but the narrative damage of early stumbles was significant.

In April 2023, Google merged Google Brain and DeepMind into a single unit, Google DeepMind, led by Demis Hassabis (DeepMind's co-founder and CEO). The merger consolidated Google's AI research, bringing the organization that built AlphaFold together with the team that built the Transformer.

Google's structural AI advantages:

TPUs (Tensor Processing Units): Custom AI chips that Google has been building for over a decade. TPU v5 and v6 (Trillium) are tuned for Google's own workloads, giving it training and inference efficiency that off-the-shelf NVIDIA hardware can't fully match for those workloads
First-party data: YouTube (the world's largest video platform), Google Search (the world's most used search engine), Gmail, Maps, Android — Google sits on unprecedented training data
Distribution: AI built into Search, Gmail, Workspace, Chrome, Android, YouTube — Google can reach billions of users through products they already use daily
Research history: The Transformer, BERT, T5, AlphaFold, PaLM, Gemini. Google invented or significantly advanced many of the most important AI architectures and systems

Competitor and Supplier: The Anthropic Investment

Google occupies an unusual position in the AI landscape — it is simultaneously a competitor and a major supplier-investor to Anthropic, makers of the Claude family that competes head-on with Gemini. On April 24, 2026, Google announced a commitment of up to $40 billion in Anthropic: $10 billion immediate cash at a $350 billion valuation, plus $30 billion conditional on Anthropic hitting performance targets. The deal also includes 5 GW of TPU compute capacity, on top of Anthropic's earlier 3.5 GW Broadcom chip deal, plus information rights, board observation, and preferred compute pricing.

This expands Google's prior ~$3 billion stake in Anthropic (~14%) and creates an unusual strategic posture. Google's direct AI products (Gemini, Workspace AI, Search AI Overviews) compete daily with Anthropic's Claude in the developer and enterprise market, while Google Cloud sells Anthropic's models to the same customers and Google's TPUs power Anthropic's training runs. No other major AI company has this dual role: compare Microsoft and OpenAI (a single dominant partner relationship until April 2026) or Amazon and Anthropic (a compute supplier-investor with no competing first-party model line).

Gemini 3 Pro: The Long-Context Reasoning Leader

Gemini 3 Pro is Google DeepMind's flagship model and, as of early 2026, the highest-rated model on LMArena (the chatbot arena where users vote on which model they prefer).

Key specs and capabilities:

LMArena Elo: 1,501 — the highest recorded score on the leading model comparison platform
1 million token context window: among the largest standard context windows offered by major models. One million tokens is approximately 750,000 words of English text, enough to process a full multi-book series or a large codebase in a single context
Native multimodal: Text, images, audio, video, all processed natively from training (not bolted on via connectors)
Deep Think mode: Extended reasoning capability for complex problems, comparable to "thinking" modes in other models
Available via Google AI Studio (free tier), Gemini API, and Gemini Advanced (Google One subscription)
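
To make the Elo figure above more concrete, the sketch below applies the standard Elo expected-score formula, under which a rating gap maps to an expected head-to-head preference rate. This is a minimal illustration, assuming LMArena ratings can be read on the conventional 400-point logistic scale; the 1,451 rival rating is a hypothetical value chosen for the example, not a figure from this lesson.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    gemini_3_pro = 1501          # Elo quoted in this lesson
    hypothetical_rival = 1451    # assumed value, for illustration only

    p = expected_win_rate(gemini_3_pro, hypothetical_rival)
    print(f"Expected preference rate: {p:.1%}")
```

Under these assumptions, a 50-point lead works out to roughly a 57% expected preference rate, so even a top-ranked model loses a large share of individual votes.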

When to use Gemini 3 Pro: Very long document analysis where the 1 million context window matters, tasks requiring multimodal reasoning across different content types, complex reasoning where benchmark performance is paramount, integration with Google Workspace and other Google services.
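
As a rough planning aid, the sketch below turns the ratio quoted above (about 750,000 words per million tokens) into a quick check of whether a set of documents fits in one request. The 0.75 words-per-token ratio is an approximation for English prose (code and other languages tokenize differently), and the function names and the reserved output budget are illustrative assumptions rather than part of any Google API.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000   # Gemini 3 Pro window quoted in this lesson
WORDS_PER_TOKEN = 0.75              # rough English-prose ratio (1M tokens ~ 750k words)


def estimate_tokens(text: str) -> int:
    """Estimate token count from the whitespace-separated word count."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)


def fits_in_context(documents: list[str], reserved_for_output: int = 8_000) -> bool:
    """Return True if all documents plus an output budget fit in the window."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total + reserved_for_output <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    novel = "word " * 90_000              # stand-in for a 90,000-word book
    print(estimate_tokens(novel))         # estimated tokens for one book
    print(fits_in_context([novel] * 7))   # seven such books in one request
    print(fits_in_context([novel] * 9))   # nine such books in one request
```

For real requests, count tokens with the provider's own tokenizer, since word-based estimates can be off by a wide margin.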

Gemini 3 Pro (Google DeepMind, Closed)

Strengths: 1,501 LMArena Elo; 1 million token context; native multimodal; Deep Think mode; best-in-class on multiple benchmarks
Context Window: 1 million tokens
Pricing: Free via AI Studio (rate limited); $7/$21 per million tokens API

Gemini 3.1 Pro: The Latest Flagship (February 2026)

Gemini 3.1 Pro, released February 19, 2026, is Google DeepMind's newest flagship model — building on Gemini 3 Pro with significant improvements across reasoning, coding, and scientific benchmarks.

Key benchmarks:

94.3% GPQA Diamond — the highest score ever recorded on this graduate-level science benchmark
77.1% ARC-AGI-2 — double the score of Gemini 3 Pro, demonstrating a major leap in novel reasoning
2,887 Elo on LiveCodeBench Pro — top-tier coding performance
Ranked #1 on 12+ of 18 tracked benchmarks at launch

Gemini 3.1 Pro maintains the 1 million token context window and native multimodal capabilities of its predecessor while delivering substantially better reasoning. A lighter variant, Gemini 3.1 Flash Lite, was released on March 3, 2026 for cost-sensitive deployments.

Gemini 3.1 Pro (Google DeepMind, Closed)

Strengths: 94.3% GPQA Diamond; 77.1% ARC-AGI-2; 1 million context; native multimodal; #1 on 12+ benchmarks
Context Window: 1 million tokens
Pricing: Available via AI Studio and Gemini API

Gemini 3 Flash: Frontier Performance at Speed

Gemini 3 Flash is one of the most technically interesting models in the market: it achieves frontier-class coding performance at Flash (fast, cost-efficient) inference speed.

The counterintuitive result: Gemini 3 Flash scores 78% on SWE-bench Verified, higher than Gemini 3 Pro on this specific coding benchmark. This is less unusual than it sounds: a smaller model can outperform a larger one on a specific task when its training is weighted toward that task.

Key characteristics:

78% SWE-bench Verified — top-tier coding performance
1 million token context window (same as Pro)
100+ simultaneous tool calls — important for complex agentic workflows that need to call many APIs or databases in parallel
Significantly faster and cheaper than Gemini 3 Pro
Best for: coding agents, high-throughput applications, agentic workflows that chain many operations
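
Parallel tool calling matters because the application, not the model, pays the latency of each tool. The sketch below shows the client-side half of that pattern: when a model returns several tool calls in one turn, they can be dispatched concurrently instead of one by one. The tool name, the `ToolCall` shape, and the simulated delay are hypothetical stand-ins, not the Gemini API's actual types.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical shape of one tool call requested by the model."""
    name: str
    argument: str


async def run_tool(call: ToolCall) -> str:
    """Stand-in for a real API or database lookup (simulated 0.1 s of I/O)."""
    await asyncio.sleep(0.1)
    return f"{call.name}({call.argument}) -> ok"


async def run_all(calls: list[ToolCall], max_concurrency: int = 100) -> list[str]:
    """Run tool calls concurrently, capped by a semaphore; results keep input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(call: ToolCall) -> str:
        async with semaphore:
            return await run_tool(call)

    return await asyncio.gather(*(guarded(c) for c in calls))


if __name__ == "__main__":
    calls = [ToolCall("lookup_order", str(i)) for i in range(100)]
    start = time.perf_counter()
    results = asyncio.run(run_all(calls))
    elapsed = time.perf_counter() - start
    print(f"{len(results)} tool calls finished in {elapsed:.2f}s")
```

Run one after another, 100 calls at 0.1 seconds each would take about 10 seconds; run concurrently, they finish in roughly the time of the slowest call. The semaphore is there because real backends usually impose their own rate limits.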

Gemini 3 Flash (Google DeepMind, Closed)

Strengths: 78% SWE-bench (above Pro); 1 million context; 100+ parallel tool calls; cost-efficient frontier performance
Context Window: 1 million tokens
Pricing: $0.075/$0.30 per million tokens
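
The price gap between the two tiers is easiest to appreciate with a worked example. The sketch below estimates the bill for the same workload at the Gemini 3 Pro ($7/$21) and Gemini 3 Flash ($0.075/$0.30) rates listed in this lesson. It assumes each price pair is quoted as input/output dollars per million tokens, which is the usual convention but is not stated explicitly here; the workload numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Dollars per million tokens; the input/output split is an assumption."""
    input_per_million: float
    output_per_million: float


# Rates as listed in this lesson.
GEMINI_3_PRO = Pricing(7.00, 21.00)
GEMINI_3_FLASH = Pricing(0.075, 0.30)


def workload_cost(p: Pricing, requests: int, in_tokens: int, out_tokens: int) -> float:
    """Cost of `requests` calls, each with the given input and output token counts."""
    input_cost = requests * in_tokens / 1_000_000 * p.input_per_million
    output_cost = requests * out_tokens / 1_000_000 * p.output_per_million
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 100,000 requests, 2,000 tokens in and 500 out per request.
    for name, pricing in [("Gemini 3 Pro", GEMINI_3_PRO), ("Gemini 3 Flash", GEMINI_3_FLASH)]:
        print(f"{name}: ${workload_cost(pricing, 100_000, 2_000, 500):,.2f}")
```

Under these assumptions the workload costs about $2,450 on Pro and about $30 on Flash, roughly an 80-fold difference, which is why Flash is the natural default for high-throughput and agentic use.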

Gemma 4: Google's Open-Source Model Family

Gemma 4 is Google's open-weight model family, released under a permissive Apache 2.0 license and designed for on-device, edge, laptop, and single-GPU deployment.

Size range: E2B, E4B, the new multimodal 12 billion model, and a 26 billion mixture-of-experts variant — covering everything from smartphones and embedded devices to single-GPU advanced reasoning.

Key characteristics:

Encoder-free multimodality: Gemma 4 12B feeds vision and audio directly into the language backbone, accepting text, image, and audio input in one model — Google's first mid-sized model with native audio input. It runs on a laptop with 16 gigabytes of memory and nears the 26 billion model's benchmark scores at under half the memory footprint.
Multi-token prediction acceleration: draft-and-verify decoding proposes several tokens per pass and keeps only those the main model confirms, reducing latency at no quality cost
Multilingual: Strong performance across more than 35 languages relative to other open models of similar size
128K context window — large for an open-source model of this size
On-device capable: E2B and E4B run on modern phones and laptops without a dedicated GPU; the 12 billion and 26 billion variants run on a laptop or a single GPU
Purpose-built for agentic workflows, not just chat completion
Apache 2.0 license: commercial use, modification, and redistribution permitted with no restriction on competing use
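
The draft-and-verify idea behind multi-token prediction can be shown with a toy example. In the sketch below, a cheap draft function proposes several tokens, the main model checks them, and only the prefix the main model agrees with is kept, so the final output matches what the main model would have produced alone. Both "models" are hypothetical stand-in functions over integers, and the greedy acceptance rule is a common textbook variant, not a description of Gemma 4's internals.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def target_next(tokens: List[int]) -> int:
    """Stand-in for the large model: a deterministic next-token rule."""
    return (tokens[-1] * 3 + 1) % 17


def draft_next(tokens: List[int]) -> int:
    """Stand-in for the cheap drafter: agrees with the target except after multiples of 5."""
    guess = (tokens[-1] * 3 + 1) % 17
    return guess if tokens[-1] % 5 else (guess + 1) % 17


def greedy_decode(prompt: List[int], n_new: int, model: NextToken = target_next) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def draft_and_verify(prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Draft k tokens; keep those the target confirms and replace the first mismatch."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # 1. Draft k tokens cheaply.
        draft = list(tokens)
        for _ in range(k):
            draft.append(draft_next(draft))
        # 2. Verify. A real system scores all k positions in one batched target pass;
        #    this toy calls the target per position to keep the logic readable.
        accepted = list(tokens)
        for proposed in draft[len(tokens):]:
            correct = target_next(accepted)
            accepted.append(correct)      # the target's own token is always what is kept
            if proposed != correct:
                break                     # first disagreement ends this round
        tokens = accepted
    return tokens[:goal]


if __name__ == "__main__":
    print(draft_and_verify([2], 12) == greedy_decode([2], 12))
```

Because every kept token is one the main model would have chosen anyway, quality is unchanged; the speedup in a real system comes from verifying several positions in a single pass. This toy demonstrates the correctness property rather than the speed.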

When to use: On-device and multimodal applications where privacy is critical, resource-constrained environments, research, fine-tuning for specific domains, applications requiring data sovereignty without external API calls, and single-GPU or single-laptop agentic workflows.
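
A back-of-the-envelope calculation shows what it takes for a 12 billion parameter model to fit on a 16-gigabyte laptop. The sketch below estimates weight memory as parameter count times bytes per parameter. The precision levels are assumptions for illustration (this lesson does not say which precision the 16-gigabyte figure refers to), and the estimate ignores activations, the KV cache, and runtime overhead, all of which add to the real footprint.

```python
BYTES_PER_PARAM = {
    "fp16": 2.0,   # 16-bit floating point
    "int8": 1.0,   # 8-bit quantization
    "int4": 0.5,   # 4-bit quantization
}


def weight_memory_gb(n_params: float, precision: str) -> float:
    """Approximate memory for model weights only, in gigabytes (10^9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(12e9, precision)
        verdict = "fits" if gb < 16 else "does not fit"
        print(f"12B weights at {precision}: {gb:.0f} GB ({verdict} in 16 GB)")
```

At 16-bit precision the weights alone come to about 24 GB, so under these assumptions a 16 GB machine would need 8-bit (about 12 GB) or 4-bit (about 6 GB) weights, with the remainder left for the context cache and the operating system.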

Gemma 4 12B (multimodal) (Google DeepMind, Open Source)

Strengths: Text, vision, and audio input; runs on a 16-gigabyte laptop; near-26-billion benchmark at half the memory; Apache 2.0; 128K context
Context Window: 128K tokens
Pricing: Free (open weights, Apache 2.0)

Nano Banana: Image Generation

Nano Banana Pro and Nano Banana 2 are two coexisting image generation models in Google's lineup: different model generations serving different use cases, not a new model and its deprecated predecessor. As of April 2026, Nano Banana 2 is the default image model across the Gemini app's Fast, Thinking, and Pro tiers (plus Search AI Mode and Lens), while Nano Banana Pro is retained for AI Pro and Ultra subscribers for specialized professional tasks.

Nano Banana 2 — Gemini 3.1 Flash Image, launched February 26, 2026:

Real-time image synthesis: 30 frames per second at 512px resolution, sub-500ms latency
Accurate text rendering and character consistency across frames
Default for the consumer Gemini app and developer access via Google AI Studio
Designed for interactive applications, fast iteration, and high-volume generation

Nano Banana Pro — Gemini 3 Pro Image, launched November 20, 2025:

4K resolution output with the strongest text rendering in Google's image lineup
SynthID watermarking: an imperceptible digital watermark embedded in the pixels of generated images, designed to survive compression and cropping, allowing verification of AI origin
Available to AI Pro and Ultra subscribers in Gemini, plus Vertex AI and Google Workspace surfaces (Slides, Vids, NotebookLM)
Best for studio-grade output: marketing, mockups, presentations, professional design

Veo 3 and 3.1: Text-to-Video with Native Audio

Veo 3 is Google DeepMind's text-to-video model — one of the most capable video generation systems available.

Key capabilities:

Native audio synthesis: Not just video — Veo 3 generates synchronized audio including dialogue, background sounds, and music
Highly photorealistic output quality
Camera control: specify shot types, transitions, and movement

Veo 3.1 (released March 2026) upgraded the system with 4K resolution output and clips up to 60 seconds (up from Veo 3's limits). It is available in paid preview via the Gemini API.

Google has also released MedGemma 1.5 (4 billion parameters) — a specialized model for medical applications, demonstrating Google's push into domain-specific AI.

Available through Google AI Studio and Vertex AI.

Google's AI in Products: The Distribution Advantage

Beyond the models themselves, Google's real AI advantage is distribution:

Google Search: AI Overviews (powered by Gemini) appear for hundreds of millions of searches daily
Gmail and Google Workspace: Gemini assists with drafting, summarization, and action items across productivity tools
Google Lens and Circle to Search: Visual AI built into Android and Google Chrome
Android: On-device Gemini Nano for privacy-sensitive tasks
YouTube: AI-powered caption, description, and recommendation systems

The average person interacts with Google's AI many times per day without thinking of it as "using AI." This scale of deployment is both a product advantage and a data feedback advantage — Google learns from billions of daily AI interactions.

Key Takeaways

Google DeepMind combines the research heritage of Google Brain and DeepMind, with structural advantages in custom TPUs, first-party data, and distribution through Search, Android, and Workspace
Gemini 3.1 Pro (Feb 2026) is the latest flagship: 94.3% GPQA Diamond, 77.1% ARC-AGI-2, #1 on 12+ benchmarks — a major reasoning leap over Gemini 3 Pro
Gemini 3 Flash (78% SWE-bench, 1 million context) leads on coding performance at lower cost; Gemini 3.1 Flash Lite extends the cost-efficient tier
Gemma 4 is Google's open-weight model family for on-device and single-GPU deployments: E2B and E4B for edge, the new multimodal 12 billion model that runs on a 16-gigabyte laptop, and a 26 billion mixture-of-experts variant for advanced reasoning, all under a permissive Apache 2.0 license
Veo 3.1 upgrades video generation to 4K resolution and 60-second clips; MedGemma 1.5 demonstrates Google's push into domain-specific AI
Google's deepest competitive advantage may be distribution: embedding AI into products billions of people use daily
