Name: SAM 2 (Segment Anything)
Author: Meta

Learning Objectives

Understand what SAM 2 is and how zero-shot segmentation works
Identify practical applications of image and video segmentation in AI workflows
Evaluate SAM 2's capabilities and limitations for production use cases

What Is SAM 2?

SAM 2 (Segment Anything Model 2) is Meta's open-source foundation model for image and video segmentation — the task of identifying and outlining specific objects or regions within visual content. Released in July 2024 as the successor to the original SAM (April 2023), SAM 2 extends the "segment anything" capability from static images to real-time video.

The breakthrough: SAM 2 performs zero-shot segmentation. It can segment objects it was never specifically trained on, in images or video, without task-specific training data. Click a point on an object and SAM 2 outlines it, with no labels, no fine-tuning, and no category-specific model.

Before the Segment Anything models, segmentation typically required a model trained for each object category (a separate model for people, cars, medical images, etc.). SAM 2 makes segmentation a general-purpose capability across both images and video.

✅Tip

Try SAM 2: segment-anything-2.com — interactive demo. Source code and weights at github.com/facebookresearch/sam2. Apache 2.0 license.

How SAM 2 Works

Prompt-Based Segmentation

SAM 2 accepts multiple types of prompts to identify what to segment:

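Point prompts — click a point on the object you want to segment; additional clicks can mark regions to include or exclude
Box prompts — draw a bounding box around the area of interest
Mask prompts — provide a rough mask and SAM 2 refines it

Free-text prompts are not part of the released SAM 2 models. Text-driven segmentation is usually built by pairing SAM 2 with a separate detector that turns a phrase into box prompts.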
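
As a rough illustration of how point and box prompts are usually represented in code, the sketch below validates them and builds keyword arguments named after the parameters of `SAM2ImagePredictor.predict` in the public repository (`point_coords`, `point_labels`, `box`, `multimask_output`). The `build_prompt_kwargs` helper is hypothetical, and it uses plain lists where the real predictor expects NumPy arrays.

```python
from typing import Optional, Sequence


def build_prompt_kwargs(
    points: Optional[Sequence[Sequence[float]]] = None,
    labels: Optional[Sequence[int]] = None,
    box: Optional[Sequence[float]] = None,
) -> dict:
    """Hypothetical helper: validate prompts and name them like predict()'s kwargs."""
    kwargs: dict = {}
    if points is not None:
        if labels is None or len(points) != len(labels):
            raise ValueError("each point needs exactly one label")
        if any(len(p) != 2 for p in points):
            raise ValueError("points are (x, y) pixel coordinates")
        if any(lab not in (0, 1) for lab in labels):
            raise ValueError("labels: 1 = foreground click, 0 = background click")
        kwargs["point_coords"] = [list(p) for p in points]
        kwargs["point_labels"] = list(labels)
    if box is not None:
        x0, y0, x1, y1 = box  # (x_min, y_min, x_max, y_max) in pixels
        if x1 <= x0 or y1 <= y0:
            raise ValueError("box must be (x_min, y_min, x_max, y_max)")
        kwargs["box"] = [x0, y0, x1, y1]
    if not kwargs:
        raise ValueError("provide at least one prompt")
    # A single click is often ambiguous, so ask for several candidate masks.
    kwargs["multimask_output"] = (
        box is None and points is not None and len(points) == 1
    )
    return kwargs


prompt = build_prompt_kwargs(points=[(320, 240)], labels=[1])
print(prompt)
```

Combining a box with one foreground and one background click would look like `build_prompt_kwargs(points=[(10, 10), (50, 50)], labels=[1, 0], box=(0, 0, 100, 80))`.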

Image Segmentation

For static images, SAM 2 generates pixel-level masks:

Segments any object regardless of category (people, animals, products, medical structures, industrial parts)
Handles complex scenes with overlapping objects
Produces multiple plausible segmentation masks when the prompt is ambiguous
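
A minimal sketch of handling an ambiguous prompt: request several candidate masks and keep the one with the highest predicted quality score. The `set_image` and `predict` calls mirror the image predictor in the public repository, which returns masks, scores, and low-resolution logits. `best_mask_for_click` and `FakePredictor` are hypothetical names, and the fake's masks and scores are toy values so the sketch runs without model weights.

```python
def best_mask_for_click(predictor, image, x, y):
    """Prompt with one foreground click and keep the highest-scoring candidate."""
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=[[x, y]],   # with the real predictor, pass a NumPy array
        point_labels=[1],        # 1 = foreground click
        multimask_output=True,   # request several candidates for one click
    )
    best = max(range(len(scores)), key=lambda i: scores[i])
    return masks[best], scores[best]


class FakePredictor:
    """Stand-in exposing the same two methods; returns toy data only."""

    def set_image(self, image):
        self.image = image

    def predict(self, point_coords, point_labels, multimask_output=True):
        part = [[0, 1], [0, 0]]        # e.g. just a shirt pocket
        whole = [[1, 1], [1, 0]]       # e.g. the whole shirt
        everything = [[1, 1], [1, 1]]  # e.g. the person plus background
        return [part, whole, everything], [0.71, 0.93, 0.55], None


mask, score = best_mask_for_click(FakePredictor(), image=None, x=1, y=0)
print(score, mask)
```

In an interactive tool it may be preferable to show all candidates and let the user choose, rather than trusting the score alone.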

Video Segmentation

SAM 2's major advance over the original SAM is real-time video segmentation:

Object tracking — segment an object in one frame, and SAM 2 tracks it across the entire video
Temporal consistency — masks remain stable as objects move, rotate, and change scale
Occlusion handling — maintains tracking when objects are temporarily hidden behind other objects
Real-time capable — fast enough for interactive video editing workflows
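
The prompt-once-then-track workflow can be sketched as follows. The method names (`init_state`, `add_new_points_or_box`, `propagate_in_video`) follow the video predictor in the public repository; `track_object` and `FakeVideoPredictor` are hypothetical, and the fake yields toy logits so the example runs without a GPU or weights. With the real model, these calls would normally run inside `torch.inference_mode()`.

```python
def track_object(predictor, video_path, frame_idx, obj_id, points, labels):
    """Prompt one frame, then propagate the object's mask through the video."""
    state = predictor.init_state(video_path)
    predictor.add_new_points_or_box(
        state, frame_idx=frame_idx, obj_id=obj_id, points=points, labels=labels
    )
    masks_by_frame = {}
    for out_frame, out_obj_ids, out_logits in predictor.propagate_in_video(state):
        # One logit map per tracked object; values above 0 count as foreground.
        masks_by_frame[out_frame] = dict(zip(out_obj_ids, out_logits))
    return masks_by_frame


class FakeVideoPredictor:
    """Stand-in with the same method names; produces 1x1 toy logit maps."""

    def init_state(self, video_path):
        return {"video_path": video_path, "prompts": []}

    def add_new_points_or_box(self, inference_state, frame_idx, obj_id,
                              points=None, labels=None, box=None):
        inference_state["prompts"].append((frame_idx, obj_id))
        return frame_idx, [obj_id], [[[1.0]]]

    def propagate_in_video(self, inference_state):
        obj_ids = [obj_id for _, obj_id in inference_state["prompts"]]
        for frame_idx in range(3):  # pretend the video has three frames
            yield frame_idx, obj_ids, [[[2.5 - frame_idx]] for _ in obj_ids]


tracks = track_object(
    FakeVideoPredictor(), "frames_dir", frame_idx=0, obj_id=1,
    points=[[210, 350]], labels=[1],
)
print(sorted(tracks))
```

The result maps each frame index to a dictionary of object id to mask logits, which makes it straightforward to track several objects by calling `add_new_points_or_box` once per object before propagating.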

Applications

Creative and Media

Video editing — isolate subjects for background replacement, color grading, or effects
Photo editing — precise object selection without manual masking
Content creation — extract objects from images for compositing

Medical Imaging

Radiology — segment tumors, organs, or anomalies in CT/MRI scans
Pathology — identify cellular structures in microscopy images
Surgical planning — 3D reconstruction from segmented medical images

Industrial and Robotics

Quality inspection — identify defects on manufacturing lines
Autonomous navigation — segment road surfaces, obstacles, and lanes
Robotic manipulation — identify graspable objects and their boundaries

Data Annotation

Training data creation — dramatically accelerates the labeling of segmentation datasets
Active learning — use SAM predictions as starting points for human annotators to refine
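
One way to turn a predicted mask into a starting point for annotators is to convert it into a draft record with a bounding box, an area, and a review flag. The sketch below assumes a binary mask given as nested lists; the `[x, y, width, height]` box layout follows the common COCO convention, while the record fields, the `mask_to_draft_annotation` name, and the 0.8 review threshold are illustrative assumptions.

```python
def mask_to_draft_annotation(mask, score, label=None, review_below=0.8):
    """Convert a binary mask into a draft annotation for a human to refine."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None  # empty mask: nothing to annotate
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    return {
        "bbox": [x0, y0, x1 - x0 + 1, y1 - y0 + 1],  # [x, y, width, height]
        "area": sum(1 for row in mask for v in row if v),
        "label": label,   # SAM 2 gives no class, so a human or classifier fills this
        "score": score,
        "needs_review": score < review_below,  # assumed threshold
    }


toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
draft = mask_to_draft_annotation(toy_mask, score=0.74)
print(draft)
```

Low-scoring drafts are flagged so annotators can prioritize them, which is one simple way to combine model predictions with human refinement.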

Access

Detail	Info
Price	Free (open source)
License	Apache 2.0
Source Code	github.com/facebookresearch/sam2
Model Weights	Downloadable (multiple sizes)
Framework	PyTorch
Hardware	GPU recommended for real-time performance; CPU possible for batch processing
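
A small sketch of choosing a device before loading the model, assuming PyTorch may or may not be installed. `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are standard PyTorch calls; the `pick_device` helper is hypothetical, and Apple MPS support in SAM 2 should be treated as preliminary.

```python
def pick_device() -> str:
    """Prefer a CUDA GPU, then Apple MPS, and fall back to CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed: nothing to accelerate
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"  # workable for batch jobs, slow for interactive use


device = pick_device()
print(device)
```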

Strengths

Zero-shot segmentation — works on any object without task-specific training; truly general-purpose
Image and video — SAM 2 extends to temporal segmentation with object tracking and occlusion handling
Open source (Apache 2.0) — permissive license that allows commercial use, subject to its attribution and notice requirements
Multiple prompt types — points, boxes, and masks provide flexible interaction
Broad adoption — used in creative tools, medical imaging, robotics, and data annotation workflows
Foundation for fine-tuning — can be fine-tuned on domain-specific data for even better performance in specialized applications

Limitations & Considerations

GPU recommended — real-time performance requires a capable GPU; CPU inference is possible but slow
Segmentation only — SAM 2 outlines objects but does not classify them (no semantic labels)
Complex scenes — performance degrades with very cluttered scenes or tiny objects
Video memory — processing long videos requires significant GPU memory for temporal tracking
Not a complete pipeline — segmentation is one step; applications typically need additional models for classification, measurement, or action
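
One possible workaround for the video-memory limitation, not something the SAM 2 repository prescribes, is to process a long video in overlapping windows and re-prompt each window from the last mask of the previous one. The sketch below only computes the windows; `frame_windows` is a hypothetical helper and the window sizes are arbitrary. At the time of writing, `init_state` in the public repository also exposes `offload_video_to_cpu` and `offload_state_to_cpu` flags, which may be worth trying first.

```python
def frame_windows(num_frames, window, overlap=1):
    """Split [0, num_frames) into half-open windows that share `overlap` frames."""
    if window <= overlap:
        raise ValueError("window must be larger than overlap")
    step = window - overlap
    windows = []
    start = 0
    while start < num_frames:
        end = min(start + window, num_frames)
        windows.append((start, end))
        if end == num_frames:
            break
        start += step
    return windows


print(frame_windows(10, window=4, overlap=1))
```

The shared frame between consecutive windows gives a place to hand over the mask: the final mask of one window can serve as the mask prompt for the first frame of the next.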

Key Takeaways

SAM 2 is Meta's open-source foundation model for image and video segmentation, able to segment objects zero-shot from a simple prompt (click, box, or mask)
The extension to video (object tracking, temporal consistency, occlusion handling) makes SAM 2 practical for video editing, medical imaging, robotics, and data annotation
Fully open source (Apache 2.0) and built on PyTorch; widely adopted across creative, medical, industrial, and research applications
Represents the emergence of foundation models beyond language — general-purpose vision capabilities that previously required task-specific models
