6 min read·Updated March 8, 2026

Amazon S3

By Amazon

Amazon S3 (Simple Storage Service) is the world's most widely used object storage platform — the infrastructure backbone of the web that powers data lakes, AI training datasets, application assets, backup systems, and static website hosting for millions of applications globally.

Learning Objectives

  • Understand what Amazon S3 is and how it differs from consumer cloud storage like Drive or Dropbox
  • Identify S3's core features: object storage, buckets, storage classes, and event-driven integrations
  • Recognize S3's role in AI and ML workflows — data lakes, training data, and model artifact storage

What Is Amazon S3?

Amazon S3 (Simple Storage Service) is AWS's object storage service, launched in 2006 as one of the first AWS services and still among the most widely used. Unlike consumer-facing platforms such as Google Drive or Dropbox, S3 is designed for programmatic, application-level storage: developers, DevOps engineers, and data engineers use it to store and retrieve any amount of data from anywhere, typically through the AWS SDKs, CLI, or REST API.

S3 stores objects (files, blobs, binary data) in buckets (containers), addressed by keys (file paths). It scales from a single file to exabytes of data with no pre-provisioning required. S3 is the storage layer beneath much of the modern internet — website assets, application files, log archives, database backups, and AI training datasets are all commonly stored in S3.

Tip

Try Amazon S3 at aws.amazon.com/s3. The 12-month free tier includes 5 GB of S3 Standard storage, 20,000 GET requests, and 2,000 PUT requests per month; beyond that, pricing is per GB stored and per request.

Pricing

S3 pricing is usage-based with no minimum commitment:

Item | Price | Notes
Storage (S3 Standard) | $0.023 per GB/month | First 50 TB; lower at higher volumes
PUT/COPY/POST/LIST requests | $0.005 per 1,000 requests | Write operations
GET and all other requests | $0.0004 per 1,000 requests | Read operations
Data transfer out to internet | $0.09 per GB (first 10 TB) | Free within the same AWS region
S3 Intelligent-Tiering | $0.023 per GB + monitoring fee | Auto-moves data to cheaper tiers based on access
S3 Glacier Deep Archive | $0.00099 per GB/month | Lowest-cost archival; retrieval takes hours

For most application use cases, S3 is extremely cost-effective: at the rates above, storing 100 GB costs roughly $2.30/month in S3 Standard, or about $1.25/month in Standard-IA if the data is infrequently accessed. Data transfer out to the internet is the most significant cost driver for high-traffic applications.
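
To make the arithmetic concrete, here is a minimal sketch of a monthly cost estimate. It assumes the illustrative list prices quoted in this section (actual rates vary by region and change over time), and the function name and workload numbers are hypothetical.

```python
# Illustrative prices quoted in this lesson (USD); real rates vary by region.
STORAGE_PER_GB = 0.023   # S3 Standard, first 50 TB
PUT_PER_1K = 0.005       # PUT/COPY/POST/LIST requests
GET_PER_1K = 0.0004      # GET and all other requests
EGRESS_PER_GB = 0.09     # data transfer out to the internet, first 10 TB


def estimate_monthly_cost(stored_gb, put_requests, get_requests, egress_gb):
    """Return a dict of monthly cost components in USD, plus their total."""
    costs = {
        "storage": stored_gb * STORAGE_PER_GB,
        "put": put_requests / 1000 * PUT_PER_1K,
        "get": get_requests / 1000 * GET_PER_1K,
        "egress": egress_gb * EGRESS_PER_GB,
    }
    costs["total"] = sum(costs.values())
    return costs


if __name__ == "__main__":
    # Hypothetical workload: 100 GB stored, 50k writes, 1M reads, 500 GB egress.
    for name, usd in estimate_monthly_cost(100, 50_000, 1_000_000, 500).items():
        print(f"{name:>8}: ${usd:,.2f}")
```

For this hypothetical workload, egress accounts for roughly $45 of a total of about $48, which illustrates why data transfer tends to dominate the bill for high-traffic applications.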

Core Features

Object Storage and Buckets

The fundamental model:

  • Bucket: A named container for objects. Bucket names are globally unique across AWS.
  • Object: Any file, from a small text file to a multi-TB video, stored together with its metadata and identified by a key
  • Key: The path/filename for the object (e.g., datasets/training/batch-001.jsonl)
  • No folder hierarchy: S3 uses flat storage with key prefixes that simulate folder paths
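
The flat key space can be illustrated without touching AWS at all. The sketch below mimics, in plain Python, how a listing with a prefix and a delimiter rolls keys up into folder-like groups (similar in spirit to the common prefixes returned by S3's list operations). The function name and sample keys are hypothetical, and this is a simplified model rather than the service's actual implementation.

```python
def list_prefix(keys, prefix="", delimiter="/"):
    """Group a flat list of keys the way a delimiter-based listing would.

    Returns (objects, common_prefixes): keys that sit directly under
    `prefix`, and the folder-like prefixes one level below it.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter remains after the prefix: roll the key up.
            common_prefixes.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical keys in one bucket; there are no real directories here.
KEYS = [
    "datasets/training/batch-001.jsonl",
    "datasets/training/batch-002.jsonl",
    "datasets/eval/batch-001.jsonl",
    "datasets/README.md",
    "logs/2026-03-08.gz",
]

if __name__ == "__main__":
    print(list_prefix(KEYS))               # top-level "folders"
    print(list_prefix(KEYS, "datasets/"))  # one level down
```

The "folders" exist only as a view computed from key names, which is why renaming a folder in S3 amounts to copying every object under that prefix to a new key.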

Storage Classes — Cost Optimization

S3 offers multiple storage classes with different cost/access tradeoff profiles:

Storage Class | Use Case | Retrieval | Monthly Cost (per GB)
S3 Standard | Frequently accessed data; web assets; active datasets | Milliseconds | $0.023
S3 Standard-IA | Infrequently accessed; backups kept warm | Milliseconds | $0.0125
S3 One Zone-IA | Infrequently accessed; single availability zone | Milliseconds | $0.01
S3 Glacier Instant Retrieval | Archives needing millisecond access | Milliseconds | $0.004
S3 Glacier Flexible Retrieval | Archives; restore in minutes to hours | Minutes to hours | $0.0036
S3 Glacier Deep Archive | Long-term archival; 7–10 year retention | Hours | $0.00099
S3 Intelligent-Tiering | Unpredictable access patterns | Varies | Varies + $0.0025 per 1,000 objects
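
As a rough illustration of the tradeoff, the sketch below picks the cheapest class from the table above for a given dataset size and latency requirement. It uses only the per-GB storage prices listed here and deliberately ignores retrieval fees, request charges, minimum storage durations, and Intelligent-Tiering, all of which matter in a real decision. The function name is hypothetical.

```python
# (price per GB-month in USD, retrieval latency) from the table in this lesson.
STORAGE_CLASSES = {
    "S3 Standard": (0.023, "milliseconds"),
    "S3 Standard-IA": (0.0125, "milliseconds"),
    "S3 One Zone-IA": (0.01, "milliseconds"),
    "S3 Glacier Instant Retrieval": (0.004, "milliseconds"),
    "S3 Glacier Flexible Retrieval": (0.0036, "minutes to hours"),
    "S3 Glacier Deep Archive": (0.00099, "hours"),
}


def cheapest_class(size_gb, needs_millisecond_access):
    """Return (class name, monthly storage cost) for the cheapest eligible class.

    Storage price only: retrieval fees and minimum durations are ignored.
    """
    eligible = {
        name: price
        for name, (price, retrieval) in STORAGE_CLASSES.items()
        if retrieval == "milliseconds" or not needs_millisecond_access
    }
    name = min(eligible, key=eligible.get)
    return name, size_gb * eligible[name]


if __name__ == "__main__":
    print(cheapest_class(1000, needs_millisecond_access=True))
    print(cheapest_class(1000, needs_millisecond_access=False))
```

In practice the infrequent-access and archive classes add per-GB retrieval charges, so the cheapest storage price is not always the cheapest total cost for data that is read often.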

S3 in AI and ML Workflows

S3 is foundational infrastructure for AI development:

  • Training data storage: Store datasets (images, text, JSON, Parquet) that SageMaker, Bedrock, or custom training scripts pull from
  • Model artifact storage: Trained model weights and checkpoints are saved to S3 between training runs
  • Data lakes: S3 serves as the storage layer for AWS Glue (ETL), Athena (SQL queries on S3 data), and Amazon Bedrock Knowledge Bases (RAG)
  • Vector store inputs: Raw documents pre-processed and stored in S3 before ingestion into vector databases
  • Feature stores: Pre-computed ML features stored as Parquet files in S3, accessible to training and inference pipelines

💡Key Concept

S3 as the AI data backbone: When an AI company trains a large language model on web-crawled text, they typically store the raw corpus, cleaned datasets, tokenized batches, and model checkpoints in S3 or an equivalent object store. S3's virtually unlimited scale, high durability (99.999999999% — "11 nines"), and tight integration with AWS compute (EC2, SageMaker) make it the de facto standard for AI/ML data infrastructure.
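
To get a feel for what "11 nines" means, the following sketch works through the arithmetic under a simplified model: durability is treated as an annual, per-object design target, and losses are assumed to be independent. The function names are hypothetical, and this is an illustration of the number rather than a statement about how S3 achieves it.

```python
from fractions import Fraction

# 99.999999999% expressed exactly, to avoid floating-point rounding.
ANNUAL_DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(num_objects):
    """Expected number of objects lost per year under the simplified model."""
    return num_objects * (1 - ANNUAL_DURABILITY)


def years_per_expected_loss(num_objects):
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(num_objects)


if __name__ == "__main__":
    n = 10_000_000
    print(f"{n:,} objects: one expected loss every "
          f"{int(years_per_expected_loss(n)):,} years")
```

Under these assumptions, storing ten million objects corresponds to an expected loss of one object roughly every 10,000 years.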

Event-Driven Integrations

S3 supports event notifications that trigger downstream processing automatically:

  • S3 → Lambda: A new file uploaded to a bucket triggers a Lambda function (resizing images, processing CSV, running inference)
  • S3 → SQS/SNS: Fan out notifications to processing queues
  • S3 → Lambda → Bedrock: S3 does not notify Bedrock directly, but a Lambda function triggered by new documents can start re-indexing into a Bedrock Knowledge Base for RAG
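
As a sketch of the S3 → Lambda pattern, the handler below extracts the bucket and key from an S3 event notification. It relies only on the standard library; the processing step is a placeholder, and the helper name `extract_uploaded_objects` is hypothetical. Object keys arrive URL-encoded in the event payload, so they are decoded before use.

```python
from urllib.parse import unquote_plus


def extract_uploaded_objects(event):
    """Return a list of (bucket, key) pairs from an S3 event notification."""
    uploads = []
    for record in event.get("Records", []):
        s3_info = record.get("s3")
        if not s3_info:
            continue  # skip records that are not S3 notifications
        bucket = s3_info["bucket"]["name"]
        # Keys are URL-encoded in the event (spaces arrive as "+").
        key = unquote_plus(s3_info["object"]["key"])
        uploads.append((bucket, key))
    return uploads


def lambda_handler(event, context):
    """Entry point in the shape AWS Lambda expects for Python functions."""
    uploads = extract_uploaded_objects(event)
    for bucket, key in uploads:
        # Placeholder: resize an image, parse a CSV, run inference, etc.
        print(f"New object: s3://{bucket}/{key}")
    return {"processed": len(uploads)}
```

Keeping the parsing in a separate function makes the handler easy to test locally with a hand-written event dictionary before deploying it.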

Access Control and Security

  • Bucket policies: JSON-based policies controlling who can access which objects
  • IAM roles: Attach permissions to EC2 instances or Lambda functions without hard-coding credentials
  • S3 Block Public Access: Setting to prevent any accidental public exposure — critical for sensitive data
  • Versioning: Maintain all versions of every object — essential for AI dataset lineage and rollback
  • Server-side encryption: Objects are encrypted at rest by default with AES-256 using S3-managed keys (SSE-S3); AWS KMS keys (SSE-KMS) are available when you need to control the keys yourself
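
To show what a JSON bucket policy looks like, here is a minimal sketch that builds one commonly used policy: denying any request that does not use TLS. The bucket name is a placeholder and the function name is hypothetical; the result would still need to be attached to a bucket through the console, CLI, or an SDK.

```python
import json


def deny_insecure_transport_policy(bucket_name):
    """Build a bucket policy that rejects requests made without HTTPS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


if __name__ == "__main__":
    # "my-bucket" is a placeholder bucket name.
    print(json.dumps(deny_insecure_transport_policy("my-bucket"), indent=2))
```

An explicit Deny like this overrides any Allow granted elsewhere, which is why it is a common baseline alongside Block Public Access.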

Strengths

  • Scale: No pre-provisioning; store from bytes to exabytes
  • Durability: Designed for 99.999999999% durability (in most storage classes, objects are stored redundantly across multiple Availability Zones; One Zone-IA uses a single zone)
  • AWS ecosystem integration: Native integration across the AWS ecosystem, including compute, analytics, AI/ML, and streaming services
  • Cost efficiency at scale: Very low cost for cold data (Glacier Deep Archive is about $1 per TB per month)
  • Ecosystem de facto standard: S3-compatible APIs are implemented by many cloud storage providers and on-premises systems

Limitations & Considerations

  • Not for end users: S3 has no consumer UI for everyday file management — it requires AWS console knowledge or programmatic access via SDK/CLI
  • Data transfer egress costs: Downloading large amounts of data out of AWS to the internet can become expensive
  • No collaborative editing: S3 stores files; it has no Docs-style editing, search, or preview
  • Complexity for newcomers: AWS IAM policies, bucket permissions, and lifecycle rules have a learning curve

Best Use Cases

Task | Why S3
AI training dataset storage | Scalable, durable, directly accessible by SageMaker and custom training code
Application asset hosting | Images, videos, static files served via CloudFront CDN
Database and system backups | Automated backup to S3 Standard-IA or Glacier
Data lake foundation | Store raw and processed data for Athena SQL queries and analytics
Model checkpoint storage | Save large model weights between training runs
Log archival | Compress and archive application logs to Glacier Deep Archive cheaply

When to choose alternatives:

  • Personal or team productivity file storage → Google Drive or OneDrive
  • Simple cloud backup for consumers → Backblaze B2 (simpler and cheaper egress)
  • Non-AWS developer storage → Google Cloud Storage or Azure Blob Storage
  • Team file collaboration → Dropbox or SharePoint

Getting Started

  1. Create an AWS account at aws.amazon.com — free tier includes 5GB S3 Standard for 12 months
  2. Open the S3 console and click Create Bucket — choose a globally unique name and a region close to your users
  3. Keep Block Public Access enabled for private buckets (it is on by default for new buckets)
  4. Upload a file via the console or use the AWS CLI: aws s3 cp myfile.txt s3://my-bucket/
  5. Configure a Lifecycle Rule to transition objects to Glacier after 90 days if they won't be frequently accessed
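
Step 5 can also be done programmatically. The sketch below builds a lifecycle configuration that transitions objects to Glacier Flexible Retrieval after 90 days, and shows how it might be applied with boto3 (the third-party AWS SDK for Python, which must be installed and configured with credentials). The helper names and the rule ID are hypothetical.

```python
def glacier_after_days(days=90, prefix=""):
    """Build a lifecycle configuration that archives objects after `days`.

    An empty prefix applies the rule to every object in the bucket.
    """
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days}-days",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # "GLACIER" is the API value for Glacier Flexible Retrieval.
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_lifecycle(bucket_name, config):
    """Apply the configuration to a bucket (requires boto3 and credentials)."""
    import boto3  # imported lazily so the builder works without the SDK

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name, LifecycleConfiguration=config
    )


if __name__ == "__main__":
    print(glacier_after_days(90, prefix="logs/"))
    # apply_lifecycle("my-bucket", glacier_after_days(90))  # placeholder bucket
```

Applying a lifecycle configuration replaces any existing one on the bucket, so existing rules should be included in the new configuration if they need to be kept.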

📝Note

S3-compatible storage: Many storage systems implement the S3 API — including Cloudflare R2 (no egress fees), Backblaze B2, MinIO (self-hosted), and DigitalOcean Spaces. This means code written against S3 can often be redirected to these alternatives with minimal changes, which is useful for cost optimization or compliance requirements.
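
In practice, "minimal changes" often means pointing the client at a different endpoint. The sketch below assumes boto3 (the third-party AWS SDK for Python) and uses a hypothetical local MinIO address as an example; the helper names are illustrative, and real providers also need their own credentials and may differ in which S3 features they support.

```python
def s3_client_kwargs(endpoint_url=None, region="us-east-1"):
    """Build keyword arguments for boto3.client().

    With endpoint_url=None the client talks to Amazon S3; any other value
    redirects the same code to an S3-compatible service.
    """
    kwargs = {"service_name": "s3", "region_name": region}
    if endpoint_url:
        kwargs["endpoint_url"] = endpoint_url
    return kwargs


def make_client(endpoint_url=None, region="us-east-1"):
    """Create the client (requires boto3 and configured credentials)."""
    import boto3  # imported lazily so the helper above works without the SDK

    return boto3.client(**s3_client_kwargs(endpoint_url, region))


if __name__ == "__main__":
    print(s3_client_kwargs())                         # Amazon S3
    print(s3_client_kwargs("http://localhost:9000"))  # hypothetical local MinIO
```

Because only the endpoint changes, the rest of the application code (uploads, downloads, listings) can usually stay the same, subject to each provider's level of API compatibility.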

Key Takeaways

  • Amazon S3 is object storage infrastructure — not a consumer productivity tool but the storage backbone of the web and AI ecosystem
  • It stores objects in buckets, scales without pre-provisioning, and integrates natively with most AWS services, including SageMaker, Bedrock, Lambda, and Athena
  • Multiple storage classes (Standard, Intelligent-Tiering, Glacier) allow dramatic cost optimization based on how frequently data is accessed
  • S3 is foundational to AI/ML workflows: training datasets, model checkpoints, data lakes, and RAG document stores are all typically S3-backed
  • For AI developers, comfort with S3 (and AWS IAM for access control) is an essential infrastructure skill
