Free to read. Sign up to save your progress and take knowledge-check quizzes.

Sign up free
10 min read·Updated June 4, 2026

Google DeepMind: Gemini, Gemma, and Veo

Google DeepMind logoBy Google DeepMind

Explore Google DeepMind's comprehensive model portfolio — Gemini 3 Pro and Flash, Gemma 4, Nano Banana image generation, and Veo 3 video — and understand Google's unique advantages in AI infrastructure and data.

Listen to this lesson

Free preview · first 0:30
0:00 / 0:30

Audio & video lessons are paid features

Plus unlocks audio streaming. Pro adds downloadable audio, video, certificates, and more.

Plus adds:
  • Audio streaming
  • Downloadable PDFs
  • All AI Playbooks
  • Personalized content
Pro also adds:
  • Certificates of completion
  • Audio MP3 downloads
  • Video lessonssoon
  • & More…soon

Watch this lesson

Video coming soon

Learning Objectives

  • Identify Google DeepMind's key models and compare their capabilities against competing offerings
  • Explain why Google's infrastructure — TPUs, first-party data, and distribution — represents a structural AI advantage
  • Understand when to choose Gemini vs. competing models for specific use cases

Google DeepMind: Unparalleled Infrastructure, Late Consumer Product

Google has been doing foundational AI research longer than almost any other organization. The 2017 "Attention Is All You Need" paper — the Transformer paper that defined modern AI — came from Google Brain. AlphaGo, AlphaFold, and dozens of other landmark AI systems came from DeepMind.

Yet in consumer AI, Google was caught flat-footed by ChatGPT. Bard (the initial response to ChatGPT, launched in 2023) was received poorly. Google's AI products improved rapidly, but the narrative damage of early stumbles was significant.

In 2023, Google merged Google Brain and DeepMind under Google DeepMind, led by Demis Hassabis (DeepMind's founder and CEO). The merger consolidated Google's AI research and brought the organization that built AlphaFold alongside the team that built the Transformer.

Google's structural AI advantages:

  • TPUs (Tensor Processing Units): Custom AI chips that Google has been building for over a decade. TPU v5 and v6 (Trillium) provide training and inference efficiency that NVIDIA can't fully replicate for Google's specific workloads
  • First-party data: YouTube (the world's largest video platform), Google Search (the world's most used search engine), Gmail, Maps, Android — Google sits on unprecedented training data
  • Distribution: AI built into Search, Gmail, Workspace, Chrome, Android, YouTube — Google can reach billions of users through products they already use daily
  • Research history: The Transformer, BERT, T5, AlphaFold, PaLM, Gemini — Google invented or significantly advanced the most important AI architectures

Competitor and Supplier: The Anthropic Investment

Google occupies an unusual position in the AI landscape — it is simultaneously a competitor and a major supplier-investor to Anthropic, makers of the Claude family that competes head-on with Gemini. On April 24, 2026, Google announced a commitment of up to $40 billion in Anthropic: $10 billion immediate cash at a $350 billion valuation, plus $30 billion conditional on Anthropic hitting performance targets. The deal also includes 5 GW of TPU compute capacity, on top of Anthropic's earlier 3.5 GW Broadcom chip deal, plus information rights, board observation, and preferred compute pricing.

This expands Google's prior ~$3 billion stake in Anthropic (~14%) and creates an interesting strategic posture: Google's direct AI products (Gemini, Workspace AI, Search AI Overviews) compete daily with Anthropic's Claude in the developer and enterprise market, while Google Cloud sells Anthropic's models to the same customers — and Google's TPUs power Anthropic's training runs. No other major AI company has this dual role. Compare to Microsoft + OpenAI (single dominant partner relationship until April 2026) or Amazon + Anthropic (compute supplier-investor with no competing first-party model line).

Gemini 3 Pro: The Long-Context Reasoning Leader

Gemini 3 Pro is Google DeepMind's flagship model and, as of early 2026, the highest-rated model on LMArena (the chatbot arena where users vote on which model they prefer).

Key specs and capabilities:

  • LMArena Elo: 1,501 — the highest recorded score on the leading model comparison platform
  • 1 million token context window — by far the largest standard context window among major models. One million tokens is approximately 750,000 words — enough to process a full multi-book series or a large codebase in a single context
  • Native multimodal: Text, images, audio, video — all processed natively from training (not bolted-on via connectors)
  • Deep Think mode: Extended reasoning capability for complex problems, equivalent to "thinking" mode in other models
  • Available via Google AI Studio (free tier), Gemini API, and Gemini Advanced (Google One subscription)

When to use Gemini 3 Pro: Very long document analysis where the 1 million context window matters, tasks requiring multimodal reasoning across different content types, complex reasoning where benchmark performance is paramount, integration with Google Workspace and other Google services.

Gemini 3 Pro

Google DeepMind

Closed

Strengths

1,501 LMArena Elo; 1 million token context; native multimodal; Deep Think mode; best-in-class on multiple benchmarks

Context Window

1 million tokens

Pricing

Free via AI Studio (rate limited); $7/$21 per million tokens API

Gemini 3.1 Pro: The Latest Flagship (February 2026)

Gemini 3.1 Pro, released February 19, 2026, is Google DeepMind's newest flagship model — building on Gemini 3 Pro with significant improvements across reasoning, coding, and scientific benchmarks.

Key benchmarks:

  • 94.3% GPQA Diamond — the highest score ever recorded on this graduate-level science benchmark
  • 77.1% ARC-AGI-2 — double the score of Gemini 3 Pro, demonstrating a major leap in novel reasoning
  • 2,887 Elo on LiveCodeBench Pro — top-tier coding performance
  • Ranked #1 on 12+ of 18 tracked benchmarks at launch

Gemini 3.1 Pro maintains the 1 million token context window and native multimodal capabilities of its predecessor while delivering substantially better reasoning. A lighter variant, Gemini 3.1 Flash Lite, was released on March 3, 2026 for cost-sensitive deployments.

Gemini 3.1 Pro

Google DeepMind

Closed

Strengths

94.3% GPQA Diamond; 77.1% ARC-AGI-2; 1 million context; native multimodal; #1 on 12+ benchmarks

Context Window

1 million tokens

Pricing

Available via AI Studio and Gemini API

Gemini 3 Flash: Frontier Performance at Speed

Gemini 3 Flash is one of the most technically interesting models in the market: it achieves frontier-class coding performance at Flash (fast, cost-efficient) inference speed.

The counterintuitive performance: Gemini 3 Flash scores 78% on SWE-bench Verified — higher than Gemini 3 Pro on this specific coding benchmark. This is not unusual: smaller models can be specialized to outperform larger models on specific tasks when the specialization aligns with the training objective.

Key characteristics:

  • 78% SWE-bench Verified — top-tier coding performance
  • 1 million token context window (same as Pro)
  • 100+ simultaneous tool calls — important for complex agentic workflows that need to call many APIs or databases in parallel
  • Significantly faster and cheaper than Gemini 3 Pro
  • Best for: coding agents, high-throughput applications, agentic workflows that chain many operations

Gemini 3 Flash

Google DeepMind

Closed

Strengths

78% SWE-bench (above Pro); 1 million context; 100+ parallel tool calls; cost-efficient frontier performance

Context Window

1 million tokens

Pricing

$0.075/$0.30 per million tokens

Gemma 4: Google's Open-Source Model Family

Gemma 4 is Google's open-weight model family, released under a permissive Apache 2.0 license and designed for on-device, edge, laptop, and single-GPU deployment.

Size range: E2B, E4B, the new multimodal 12 billion model, and a 26 billion mixture-of-experts variant — covering everything from smartphones and embedded devices to single-GPU advanced reasoning.

Key characteristics:

  • Encoder-free multimodality: Gemma 4 12B feeds vision and audio directly into the language backbone, accepting text, image, and audio input in one model — Google's first mid-sized model with native audio input. It runs on a laptop with 16 gigabytes of memory and nears the 26 billion model's benchmark scores at under half the memory footprint.
  • Multi-token prediction acceleration: draft-and-verify decoding generates several tokens per pass to reduce latency at no quality cost
  • Multilingual: Strong performance across over 35 languages compared to other open models of similar size
  • 128K context window — large for an open-source model of this size
  • On-device capable: E2B and E4B run on modern phones and laptops without a dedicated GPU; the 12 billion and 26 billion variants run on a laptop or a single GPU
  • Purpose-built for agentic workflows, not just chat completion
  • Apache 2.0 license: commercial use, modification, and redistribution permitted with no restriction on competing use

When to use: On-device and multimodal applications where privacy is critical, resource-constrained environments, research, fine-tuning for specific domains, applications requiring data sovereignty without external API calls, and single-GPU or single-laptop agentic workflows.

Gemma 4 12B (multimodal)

Google DeepMind

Open Source

Strengths

Text, vision, and audio input; runs on a 16-gigabyte laptop; near-26-billion benchmark at half the memory; Apache 2.0; 128K context

Context Window

128K tokens

Pricing

Free (open weights, Apache 2.0)

Nano Banana: Image Generation

Nano Banana Pro and Nano Banana 2 are two coexisting image generation models in Google's lineup — different model generations serving different use cases, not a deprecation pair. As of April 2026, Nano Banana 2 is the default image model across Gemini app's Fast, Thinking, and Pro tiers (plus Search AI Mode and Lens), while Nano Banana Pro is retained for AI Pro and Ultra subscribers for specialized professional tasks.

Nano Banana 2 — Gemini 3.1 Flash Image, launched February 26, 2026:

  • Real-time image synthesis: 30 frames per second at 512px resolution, sub-500ms latency
  • Accurate text rendering and character consistency across frames
  • Default for the consumer Gemini app and developer access via Google AI Studio
  • Designed for interactive applications, fast iteration, and high-volume generation

Nano Banana Pro — Gemini 3 Pro Image, launched November 20, 2025:

  • 4K resolution output with the strongest text rendering in Google's image lineup
  • SynthID watermarking: a cryptographic watermark embedded in generated images that survives compression and cropping, allowing verification of AI origin
  • Available to AI Pro and Ultra subscribers in Gemini, plus Vertex AI and Google Workspace surfaces (Slides, Vids, NotebookLM)
  • Best for studio-grade output: marketing, mockups, presentations, professional design

Veo 3 and 3.1: Text-to-Video with Native Audio

Veo 3 is Google DeepMind's text-to-video model — one of the most capable video generation systems available.

Key capabilities:

  • Native audio synthesis: Not just video — Veo 3 generates synchronized audio including dialogue, background sounds, and music
  • Highly photorealistic output quality
  • Camera control: specify shot types, transitions, and movement

Veo 3.1 (released March 2026) upgraded the system with 4K resolution output and clips up to 60 seconds (up from Veo 3's limits). Available in paid preview via the Gemini API.

Google has also released MedGemma 1.5 (4 billion parameters) — a specialized model for medical applications, demonstrating Google's push into domain-specific AI.

Available through Google AI Studio and Vertex AI.

Google's AI in Products: The Distribution Advantage

Beyond the models themselves, Google's real AI advantage is distribution:

  • Google Search: AI Overviews (powered by Gemini) appears for hundreds of millions of searches daily
  • Gmail and Google Workspace: Gemini assists with drafting, summarization, and action items across productivity tools
  • Google Lens and Circle to Search: Visual AI built into Android and Google Chrome
  • Android: On-device Gemini Nano for privacy-sensitive tasks
  • YouTube: AI-powered caption, description, and recommendation systems

The average person interacts with Google's AI many times per day without thinking of it as "using AI." This scale of deployment is both a product advantage and a data feedback advantage — Google learns from billions of daily AI interactions.

Key Takeaways

  • Google DeepMind combines the research heritage of Google Brain and DeepMind, with structural advantages in custom TPUs, first-party data, and distribution through Search, Android, and Workspace
  • Gemini 3.1 Pro (Feb 2026) is the latest flagship: 94.3% GPQA Diamond, 77.1% ARC-AGI-2, #1 on 12+ benchmarks — a major reasoning leap over Gemini 3 Pro
  • Gemini 3 Flash (78% SWE-bench, 1 million context) leads on coding performance at lower cost; Gemini 3.1 Flash Lite extends the cost-efficient tier
  • Gemma 4 is the best-resourced open-weight model family for on-device and single-GPU deployments — E2B and E4B for edge, the new multimodal 12 billion model that runs on a 16-gigabyte laptop, and a 26 billion mixture-of-experts variant for advanced reasoning, all under a permissive Apache 2.0 license
  • Veo 3.1 upgrades video generation to 4K resolution and 60-second clips; MedGemma 1.5 demonstrates Google's push into domain-specific AI
  • Google's deepest competitive advantage may be distribution: embedding AI into products billions of people use daily

Save your progress & take the quiz

Sign up free to bookmark lessons, track which modules you've completed, and lock in what you learned with a quick knowledge-check quiz at the end of each lesson.

Tools Covered in This Lesson

🧭Recommended for you