5 min read·Updated March 27, 2026

SAM 2 (Segment Anything)

By Meta

SAM 2 (Segment Anything Model 2) is Meta's open-source foundation model for image and video segmentation. Given a click, a bounding box, or a rough mask as a prompt, it can segment an object in an image or video without task-specific training.

Learning Objectives

  • Understand what SAM 2 is and how zero-shot segmentation works
  • Identify practical applications of image and video segmentation in AI workflows
  • Evaluate SAM 2's capabilities and limitations for production use cases

What Is SAM 2?

SAM 2 (Segment Anything Model 2) is Meta's open-source foundation model for image and video segmentation — the task of identifying and outlining specific objects or regions within visual content. Released in July 2024 as the successor to the original SAM (April 2023), SAM 2 extends the "segment anything" capability from static images to real-time video.

The breakthrough: SAM 2 performs zero-shot segmentation — it can identify and segment any object in any image or video without requiring task-specific training data. Point at something, click, and SAM 2 outlines it. No labels, no fine-tuning, no category-specific models.

Previously, segmentation of this quality generally required a model trained for each object category (a separate model for people, cars, medical images, and so on). The SAM models make segmentation a general-purpose capability.

Tip

Try SAM 2: segment-anything-2.com — interactive demo. Source code and weights at github.com/facebookresearch/sam2. Apache 2.0 license.

How SAM 2 Works

Prompt-Based Segmentation

SAM 2 accepts multiple types of prompts to identify what to segment:

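  • Point prompts: click one or more points on the object you want to segment (or on the background to exclude it)
  • Box prompts: draw a bounding box around the area of interest
  • Mask prompts: provide a rough mask and SAM 2 refines it

SAM 2 does not accept free-text prompts itself. Text-driven segmentation requires a separate model that turns the text into boxes or points for SAM 2 to use.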
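
The point and box prompts above can be pictured as plain coordinate lists handed to the predictor. The sketch below assumes the `sam2` Python package from the repository and its `SAM2ImagePredictor` class; the `Prompt` dataclass and the helper names `to_predict_kwargs` and `segment` are hypothetical, introduced only for illustration. Mask prompts are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Prompt:
    """Hypothetical container for point and box prompts."""
    # (x, y, label) clicks: label 1 = on the object, 0 = background
    clicks: List[Tuple[float, float, int]] = field(default_factory=list)
    # Bounding box as (x_min, y_min, x_max, y_max) in pixels
    box: Optional[Tuple[float, float, float, float]] = None


def to_predict_kwargs(prompt: Prompt) -> dict:
    """Turn a Prompt into keyword arguments for a predict() call."""
    if not prompt.clicks and prompt.box is None:
        raise ValueError("give at least one click or a box")
    kwargs = {}
    if prompt.clicks:
        kwargs["point_coords"] = [[x, y] for x, y, _ in prompt.clicks]
        kwargs["point_labels"] = [label for _, _, label in prompt.clicks]
    if prompt.box is not None:
        x0, y0, x1, y1 = prompt.box
        if x1 <= x0 or y1 <= y0:
            raise ValueError("box must be (x_min, y_min, x_max, y_max)")
        kwargs["box"] = [x0, y0, x1, y1]
    return kwargs


def segment(image, prompt: Prompt, model_id: str):
    """Assumed usage of the sam2 package; needs sam2, torch, and numpy."""
    import numpy as np
    from sam2.sam2_image_predictor import SAM2ImagePredictor

    predictor = SAM2ImagePredictor.from_pretrained(model_id)
    predictor.set_image(image)  # H x W x 3 RGB array
    kwargs = {k: np.array(v) for k, v in to_predict_kwargs(prompt).items()}
    # Returns candidate masks, their quality scores, and low-resolution logits
    return predictor.predict(multimask_output=True, **kwargs)
```

Only `to_predict_kwargs` runs without the model installed. `segment` is a sketch of how the resulting arrays would likely be passed on, with `model_id` standing in for whichever published checkpoint you choose.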

Image Segmentation

For static images, SAM 2 generates pixel-accurate masks:

  • Segments any object regardless of category (people, animals, products, medical structures, industrial parts)
  • Handles complex scenes with overlapping objects
  • Produces multiple plausible segmentation masks when the prompt is ambiguous
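
When a prompt is ambiguous (a click on a shirt could mean the shirt or the whole person), the model returns several candidate masks, each with a quality score. A minimal sketch of choosing among them follows. It uses toy nested lists in place of real mask arrays, and the helper names `mask_area` and `pick_mask` are hypothetical.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # rows of 0/1 pixels


def mask_area(mask: Mask) -> int:
    """Number of pixels set in the mask."""
    return sum(sum(row) for row in mask)


def pick_mask(masks: Sequence[Mask], scores: Sequence[float]) -> Tuple[int, Mask]:
    """Return (index, mask) of the highest-scoring candidate."""
    if not masks or len(masks) != len(scores):
        raise ValueError("need one score per candidate mask")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, masks[best]


# Three toy candidates: a small part, a larger part, the whole object
candidates = [
    [[1, 0], [0, 0]],
    [[1, 1], [1, 0]],
    [[1, 1], [1, 1]],
]
index, chosen = pick_mask(candidates, scores=[0.71, 0.93, 0.88])
print(index, mask_area(chosen))  # prints "1 3"
```

Picking the top score is only one policy. An interactive tool might instead show all candidates and let the user choose, or add a second click to resolve the ambiguity.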

Video Segmentation

SAM 2's major advance over the original SAM is video segmentation that can run in real time:

  • Object tracking — segment an object in one frame, and SAM 2 tracks it across the entire video
  • Temporal consistency — masks remain stable as objects move, rotate, and change scale
  • Occlusion handling — maintains tracking when objects are temporarily hidden behind other objects
  • Real-time capable — fast enough for interactive video editing workflows
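
The prompt-once, track-everywhere workflow described above might look like the following sketch. It assumes the `sam2` package and its video predictor (`build_sam2_video_predictor`, `init_state`, `add_new_points_or_box`, `propagate_in_video`) as found in recent versions of the repository; the wrapper `track_object` and the helper `logits_to_mask` are hypothetical names, and the config and checkpoint paths are left as parameters because they depend on which model size you download.

```python
from typing import Dict, List, Tuple


def logits_to_mask(logits: List[List[float]], threshold: float = 0.0) -> List[List[bool]]:
    """Binarize raw mask logits: values above the threshold count as object."""
    return [[value > threshold for value in row] for row in logits]


def track_object(model_cfg: str, checkpoint: str, video_path: str,
                 click_xy: Tuple[float, float], device: str = "cuda") -> Dict[int, object]:
    """Assumed usage of the sam2 video predictor; needs sam2, torch, numpy."""
    import numpy as np
    import torch
    from sam2.build_sam import build_sam2_video_predictor

    predictor = build_sam2_video_predictor(model_cfg, checkpoint, device=device)
    masks_per_frame = {}
    with torch.inference_mode():
        state = predictor.init_state(video_path=video_path)
        # One positive click on the first frame identifies the object
        predictor.add_new_points_or_box(
            inference_state=state,
            frame_idx=0,
            obj_id=1,
            points=np.array([click_xy], dtype=np.float32),
            labels=np.array([1], dtype=np.int32),
        )
        # The predictor then carries the mask through the remaining frames
        for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
            masks_per_frame[frame_idx] = (mask_logits[0] > 0.0).cpu().numpy()
    return masks_per_frame
```

The thresholding step is the same in both functions: the model emits per-pixel logits, and values above zero are treated as belonging to the object.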

Applications

Creative and Media

  • Video editing — isolate subjects for background replacement, color grading, or effects
  • Photo editing — precise object selection without manual masking
  • Content creation — extract objects from images for compositing

Medical Imaging

  • Radiology — segment tumors, organs, or anomalies in CT/MRI scans
  • Pathology — identify cellular structures in microscopy images
  • Surgical planning — 3D reconstruction from segmented medical images

Industrial and Robotics

  • Quality inspection — identify defects on manufacturing lines
  • Autonomous navigation — segment road surfaces, obstacles, and lanes
  • Robotic manipulation — identify graspable objects and their boundaries

Data Annotation

  • Training data creation — dramatically accelerates the labeling of segmentation datasets
  • Active learning — use SAM predictions as starting points for human annotators to refine
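
As an illustration of the annotation workflow, a predicted mask can be converted into a draft record for a human to review. The sketch below is a simplified assumption rather than any particular tool's format: the function name `mask_to_annotation` and the `needs_review` field are hypothetical, and the bounding box uses the common `[x, y, width, height]` layout.

```python
from typing import Dict, List, Optional

Mask = List[List[int]]  # rows of 0/1 pixels


def mask_to_annotation(mask: Mask, label: Optional[str] = None) -> Optional[Dict]:
    """Build a draft annotation (bbox, area) from a binary mask."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # empty mask: nothing to annotate
    cols = [x for row in mask for x, value in enumerate(row) if value]
    x0, x1 = min(cols), max(cols)
    y0, y1 = min(rows), max(rows)
    return {
        "bbox": [x0, y0, x1 - x0 + 1, y1 - y0 + 1],  # x, y, width, height
        "area": sum(1 for row in mask for value in row if value),
        "label": label,        # SAM 2 gives no class; a human or classifier fills this in
        "needs_review": True,  # model output is a starting point, not ground truth
    }


predicted = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
print(mask_to_annotation(predicted))
```

Keeping the label empty and flagging the record for review reflects the division of labor: the model proposes the outline, and the annotator confirms the boundary and supplies the class.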

Access

Detail          Info
Price           Free (open source)
License         Apache 2.0
Source Code     github.com/facebookresearch/sam2
Model Weights   Downloadable (multiple sizes)
Framework       PyTorch
Hardware        GPU recommended for real-time performance; CPU possible for batch processing
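
Because the model runs on PyTorch, choosing between GPU and CPU usually comes down to a device check at startup. A minimal sketch follows, with `pick_device` as a hypothetical helper name; it falls back to the CPU when PyTorch is missing or no CUDA GPU is visible.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed; nothing to accelerate
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```

The returned string can then be passed wherever the model is built or moved to a device. On CPU, expect batch-style processing rather than interactive speeds.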

Strengths

  • Zero-shot segmentation: works on objects it was not specifically trained on, with no task-specific training
  • Image and video: extends segmentation over time with object tracking and occlusion handling
  • Open source (Apache 2.0): permissive license that allows commercial use
  • Multiple prompt types: points, boxes, and masks provide flexible interaction
  • Production-proven: adopted in creative tools, medical imaging, robotics, and data annotation
  • Foundation for fine-tuning: can be fine-tuned on domain-specific data for better performance in specialized applications

Limitations & Considerations

  • GPU recommended — real-time performance requires a capable GPU; CPU inference is possible but slow
  • Segmentation only — SAM identifies object boundaries but doesn't classify what the object is (no semantic labels)
  • Complex scenes — performance degrades with very cluttered scenes or tiny objects
  • Video memory — processing long videos requires significant GPU memory for temporal tracking
  • Not a complete pipeline — segmentation is one step; applications typically need additional models for classification, measurement, or action

Key Takeaways

  • SAM 2 is Meta's open-source foundation model for image and video segmentation, able to segment objects zero-shot from a prompt (click, box, or mask)
  • The extension to video (object tracking, temporal consistency, occlusion handling) makes SAM 2 practical for video editing, medical imaging, robotics, and data annotation
  • Fully open source (Apache 2.0) and built on PyTorch; widely adopted across creative, medical, industrial, and research applications
  • Represents the emergence of foundation models beyond language — general-purpose vision capabilities that previously required task-specific models
