6.4 — Video Generation | AI Pro Playbook

Learning Objectives

Distinguish between text-to-video models, AI avatar tools, and AI-assisted editing platforms
Compare Veo 3, Runway ML, and Kling AI on quality and capabilities
Select tools appropriate for corporate training videos, creative production, and social media content

Three Categories of AI Video

"AI video" is a broad label for three meaningfully different tool categories:

Text-to-video models generate video from text descriptions (or images). Veo 3, Runway Gen-3, and Kling AI are here. These create original video content from scratch. (OpenAI's Sora was in this category but was discontinued in March 2026.)

AI avatar and presenter tools generate professional video featuring AI-created human presenters speaking customizable scripts. Synthesia and HeyGen are here. These replace the camera-and-presenter workflow for training and corporate content.

AI-assisted editing tools use AI to transform, enhance, or automate the editing of existing video footage. Descript is the paradigmatic example.

Understanding which category fits your use case is more important than comparing specific tools within the wrong category.

Tool	Best For
Sora 2 (Discontinued)	Was OpenAI's cinematic text-to-video model with synchronized audio; shut down March 2026
Veo 3	Text-to-video with native audio synthesis; Google's most photorealistic model; research and creative production
Runway ML	Professional video editing + AI generation; Gen-3 Alpha; used in commercial and independent film production
Dream Machine (Luma AI)	Native 1080p text-to-video and image-to-video; Ray3.14 model; character consistency; HDR export
Kling AI	High-quality motion; competitive with Sora; up to 2-minute videos; accessible pricing
Synthesia	Corporate training and explainer videos with AI presenters; 230+ AI avatars; 130+ languages; no camera or studio needed
HeyGen	AI video avatars; voice cloning; video translation with accurate lip sync; personalized video at scale
Pika Labs	Fast iteration; accessible free tier; text-to-video and image-to-video; social media content
Descript	Edit video by editing the transcript; remove filler words; Overdub voice cloning; screen recording

Sora — OpenAI (Discontinued March 2026)

Sora was OpenAI's text-to-video model, shut down in March 2026 after roughly six months as a standalone product. OpenAI cited the need to prioritize compute resources for enterprise products and research ahead of a potential IPO.

During its brief availability, Sora demonstrated several advances that influenced the field:

Synchronized audio: Ambient sound, music, and contextually appropriate audio generated alongside video
Cinematic quality: Sophisticated lighting, physically realistic motion, and professional-grade camera movement
ChatGPT integration: Video generation accessible within the ChatGPT interface for Plus and Pro subscribers

The Disney partnership — which included plans for character licensing and a $1 billion investment in OpenAI — collapsed alongside the shutdown. Sora's research team continues at OpenAI, redirected toward world simulation for robotics.

⚠️ Warning

Sora is no longer available. The iOS app, API, and sora.com are all being shut down. For text-to-video generation, see Veo 3, Runway ML, Kling AI, or Pika Labs below.

Runway ML — Professional Production Tool

Runway occupies the intersection of AI video generation and professional video editing. While Veo is primarily a generation tool (as Sora was), Runway is designed for production workflows that combine AI generation with human creative direction.

Gen-3 Alpha, Runway's latest model, has been used in commercial advertising campaigns and independent film productions. The quality is competitive with Sora on specific styles.

Runway's distinctive capabilities:

Video-to-video: Apply style transformations to existing footage — change the lighting, stylize live footage as animation, apply visual effects
Image-to-video: Animate a still image, creating motion from a photograph or illustration
Motion Brush: Selectively animate parts of a still image — make only the leaves move, or only a specific character walk

For professional video production teams, Runway's combination of AI generation and AI-assisted editing tools within a single workflow is valuable. It's less appropriate for users wanting to generate a complete video from a text description without subsequent editing.

Synthesia — AI Presenters for Enterprise Content

Synthesia serves a specific and large market: organizations that need professional video content featuring human presenters, without the camera, studio, and presenter availability constraints of traditional video production.

The workflow:

Write your script
Choose from 230+ AI avatars (or create a custom avatar from your own footage)
Select from 130+ languages — the avatar lip-syncs to each language
Generate and export

The result: a polished presenter-style video that's indistinguishable from a recorded presentation at normal viewing distance.

Use cases Synthesia excels at:

Corporate training videos: Compliance training, onboarding content, product tutorials
Internal communications: Scalable video messages from leadership, policy announcements
Localization: The same script in 130+ languages, with an avatar that lip-syncs in each, without re-recording

Synthesia is not suited to creative, cinematic, or consumer-facing content, where viewers can tell the difference on close inspection. It's for functional corporate video at scale.

HeyGen — Personalized Video and Translation

HeyGen extends the AI avatar concept to two additional use cases: personalized video at scale and video translation.

Personalized video: Generate thousands of unique videos where the AI presenter says each recipient's name, references their company, and includes personalized details — useful for sales outreach, customer onboarding, and event communications.
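
The script-personalization step described above can be sketched in plain Python. This is a minimal illustration, not HeyGen's API: the template text, the field names `first_name` and `company`, and the `render_scripts` helper are all assumptions made for the example. It only shows how one script template expands into one script per recipient before any video is generated.

```python
from string import Template

# Hypothetical script template; the placeholder names are assumptions.
SCRIPT = Template(
    "Hi $first_name, thanks for your interest. "
    "Here is how our product could help the team at $company."
)

# Illustrative recipient records, e.g. exported from a CRM.
RECIPIENTS = [
    {"first_name": "Ada", "company": "Acme"},
    {"first_name": "Lin", "company": "Globex"},
]

def render_scripts(template, recipients):
    """Return one personalized script per recipient.

    Template.substitute raises KeyError when a field is missing, so an
    incomplete record fails loudly instead of producing a broken script.
    """
    return [template.substitute(r) for r in recipients]

scripts = render_scripts(SCRIPT, RECIPIENTS)
for script in scripts:
    print(script)
```

Failing on a missing field is a deliberate choice in this sketch: at a scale of thousands of videos, a script that greets a recipient with a literal placeholder is harder to catch after rendering than before.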

Video translation: Upload a recorded video; HeyGen translates the audio into another language and generates a version of the presenter with accurate lip sync in the new language. This is HeyGen's most novel capability: automatic dubbing that preserves the appearance of the original speaker.

HeyGen also offers voice cloning — a custom AI voice trained on your own recordings that speaks scripts in your voice, without you recording each video separately.

Descript — Edit Video Through Text

Descript takes a different approach to AI video: rather than generating video from nothing, it makes editing existing video radically faster by representing video as a transcript.

The core insight: the most painful part of editing a talking-head video is finding and removing mistakes, pauses, and filler words. Descript transcribes the video automatically, then lets you edit the transcript as text — deleting a sentence from the transcript deletes it from the video.
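
The transcript-as-video idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Descript's implementation: the `Word` data model, the `FILLERS` set, and both helper functions are hypothetical. It assumes each transcribed word carries start and end timestamps, so deleting words from the text amounts to computing which time ranges of the video to keep.

```python
from dataclasses import dataclass

# Hypothetical data model: one transcribed word and its position in the video.
@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds from the start of the video
    end: float

# Assumed filler list, for illustration only.
FILLERS = {"um", "uh"}

def filler_indices(words):
    """Indices of words whose text, ignoring case and punctuation, is a filler."""
    return {i for i, w in enumerate(words)
            if w.text.lower().strip(".,!?") in FILLERS}

def keep_segments(words, deleted):
    """Time ranges that survive once the words at the `deleted` indices are removed."""
    segments = []
    for i, w in enumerate(words):
        if i in deleted:
            continue
        if segments and (i - 1) not in deleted:
            # The previous word was kept too: extend the current segment.
            segments[-1] = (segments[-1][0], w.end)
        else:
            # First kept word, or first word after a cut: start a new segment.
            segments.append((w.start, w.end))
    return segments

words = [Word("So", 0.0, 0.3), Word("um,", 0.3, 0.6), Word("welcome", 0.6, 1.1),
         Word("uh", 1.1, 1.4), Word("back", 1.4, 1.8)]
cuts = keep_segments(words, filler_indices(words))
print(cuts)  # [(0.0, 0.3), (0.6, 1.1), (1.4, 1.8)]
```

The resulting list of ranges is what an editor would hand to the rendering step: the text edit is the source of truth, and the video cuts are derived from it.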

Key features:

Remove filler words: One click removes all "ums," "uhs," and other verbal fillers from the transcript and the corresponding audio
Overdub: Clone your voice; type new words and Descript generates audio in your voice — correct mistakes without re-recording
Studio Sound: One-click background noise removal and audio quality enhancement
Screen recording: Record your screen and camera simultaneously; edit the recording immediately

For creators producing tutorial content, course videos, podcasts, or any talking-head video, Descript reduces editing time dramatically.

Choosing the Right Video Tool

Goal	Best Choice
Creative/cinematic video from text	Veo 3 or Kling AI
Professional production workflow	Runway ML
Corporate training with AI presenters	Synthesia
Personalized video or translation	HeyGen
Fast social content generation	Pika Labs
Edit existing talking-head video	Descript
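
For teams that want to build this decision table into an internal checklist or intake form, it can be expressed as a simple lookup. This is a minimal sketch: the goal keys (such as `creative_from_text`) and the `recommend` function are hypothetical labels invented for the example, and the tool names are taken directly from the table above.

```python
# Goal labels are hypothetical keys; the tools mirror the table above.
RECOMMENDATIONS = {
    "creative_from_text": ["Veo 3", "Kling AI"],
    "professional_production": ["Runway ML"],
    "corporate_training": ["Synthesia"],
    "personalized_or_translation": ["HeyGen"],
    "fast_social_content": ["Pika Labs"],
    "edit_talking_head": ["Descript"],
}

def recommend(goal):
    """Return the suggested tools for a goal, or raise if the goal is unknown."""
    try:
        return RECOMMENDATIONS[goal]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDATIONS))
        raise ValueError(f"Unknown goal {goal!r}; expected one of: {known}") from None

print(recommend("corporate_training"))  # ['Synthesia']
```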

Key Takeaways

AI video tools are meaningfully different in category — text-to-video models (Veo 3, Runway, Kling) create original video; avatar tools (Synthesia, HeyGen) replace camera-based presenter workflows; editing tools (Descript) transform existing footage
Veo 3 leads on cinematic quality for generative video; Runway ML is strongest for professional production integration; Synthesia dominates corporate training and enterprise content at scale; OpenAI's Sora was discontinued in March 2026
Descript offers the most immediate productivity gain for creators working with existing footage: editing through the transcript dramatically reduces the time cost of video editing
Video AI is still maturing — generating highly specific sequences, accurate human facial detail at close range, and complex narrative coherence remain areas where human editorial judgment is needed
