5 min read·Updated March 27, 2026

Baseten

By Baseten

Baseten is a high-performance AI model inference platform — backed by NVIDIA — that lets teams deploy, serve, and scale custom and open-source AI models in production with auto-scaling GPU infrastructure.

Learning Objectives

  • Understand what Baseten is and how it fits into the AI infrastructure landscape
  • Compare Baseten's inference platform to alternatives like Together AI, Replicate, and Modal
  • Evaluate Baseten's pricing model and GPU options for production AI workloads

What Is Baseten?

Baseten is a production AI inference platform that makes it easy to deploy, serve, and scale AI models. Instead of managing GPU servers yourself, you deploy your model to Baseten and it handles containerization, GPU allocation, auto-scaling, and monitoring — billed by the minute with no upfront commitments.

Founded in 2019 and backed by a $150 million investment from NVIDIA, Baseten has become the go-to inference platform for AI-native companies like Cursor, Notion, Superhuman, and Quora. By October 2025 the platform was processing over 1.3 quadrillion tokens per month, a 100x increase in volume over the course of that year.

💡Key Concept

AI Inference Infrastructure: Training an AI model teaches it to generate responses. Inference is actually running that model to serve real users. Inference infrastructure handles the GPU compute, networking, load balancing, and auto-scaling needed to serve millions of AI requests reliably and affordably. It is the invisible layer between the model and the user.

Core Capabilities

Model Deployment (Truss)

Baseten's open-source framework Truss (~6,000 GitHub stars) handles the complexity of deploying AI models:

  • Containerizes your model with all dependencies
  • Configures GPU allocation and memory
  • Supports PyTorch, TensorFlow, vLLM, SGLang, TensorRT-LLM, and more
  • Deploys with a single command, with no Kubernetes or Docker expertise needed
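
As a rough illustration of what Truss packages, the sketch below follows the `Model` class shape (`__init__`, `load`, `predict`) that Truss projects use in `model/model.py`. The toy "model" only uppercases text, and the `text` input key and `output` result key are assumptions made for this example, not part of the framework.

```python
class Model:
    """Minimal stand-in for a Truss-style model class (illustrative only)."""

    def __init__(self, **kwargs):
        # Configuration arrives as keyword arguments; this sketch ignores it.
        self._model = None

    def load(self):
        # Runs once at startup. A real deployment would load weights here;
        # the uppercase function is a hypothetical placeholder.
        self._model = str.upper

    def predict(self, model_input):
        # Runs per request with the parsed request body.
        if self._model is None:
            raise RuntimeError("load() must be called before predict()")
        return {"output": self._model(model_input["text"])}


if __name__ == "__main__":
    model = Model()
    model.load()
    print(model.predict({"text": "hello"}))
```

In a real project this file would sit alongside a `config.yaml` that declares dependencies and GPU resources, and the Truss CLI's `truss push` command would handle the deployment step.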

Model APIs

Pre-optimized endpoints for popular models, ready to use without any deployment setup.

Training and Fine-Tuning

Multi-node fine-tuning infrastructure with seamless promotion to inference endpoints. Train a custom model and deploy it to production on the same platform.

Auto-Scaling

Dynamic scaling that adjusts GPU allocation based on traffic — including scale to zero for models with intermittent usage. No paying for idle GPUs.
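
To make the idea concrete, here is a minimal sketch of a concurrency-based scaling rule with scale to zero. It is not Baseten's actual algorithm; the function name `desired_replicas` and the parameters `target_concurrency`, `min_replicas`, and `max_replicas` are hypothetical and chosen only for illustration.

```python
import math


def desired_replicas(in_flight_requests, target_concurrency,
                     min_replicas=0, max_replicas=4):
    """Return the replica count a simple concurrency-based autoscaler would want."""
    if target_concurrency <= 0:
        raise ValueError("target_concurrency must be positive")
    # Enough replicas that each one handles at most target_concurrency requests.
    needed = math.ceil(in_flight_requests / target_concurrency)
    # Clamp to the configured bounds; min_replicas=0 permits scale to zero.
    return max(min_replicas, min(max_replicas, needed))


if __name__ == "__main__":
    for load in (0, 1, 5, 40):
        print(load, "->", desired_replicas(load, target_concurrency=2))
```

With no traffic the rule returns zero replicas, which is where the idle-GPU savings come from. The trade-off is that the first request after an idle period has to wait for a replica to start.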

Embeddings Inference

Optimized throughput and latency for embedding workloads used in RAG (Retrieval-Augmented Generation) and semantic search applications.
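
For readers new to embeddings, the sketch below shows the retrieval step that semantic search and RAG rely on: comparing a query vector against document vectors by cosine similarity. The vectors are hand-written toy values rather than output from a real embedding model, and the helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_match(query_vec, corpus):
    """Return the id of the document whose embedding is closest to the query."""
    return max(corpus, key=lambda doc_id: cosine_similarity(query_vec, corpus[doc_id]))


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real models produce far longer vectors.
    corpus = {
        "gpu-pricing": [0.9, 0.1, 0.0],
        "autoscaling": [0.1, 0.9, 0.1],
    }
    query = [0.8, 0.2, 0.0]
    print(top_match(query, corpus))
```

In production the expensive part is generating the vectors, which is why throughput and latency of the embedding endpoint matter for these workloads.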

GPU Options and Pricing

Baseten offers pay-per-use pricing billed to the minute with no upfront commitments:

GPU | VRAM | Approximate Hourly Rate
NVIDIA T4 | 16 GB | Budget option for smaller models
NVIDIA A10G | 24 GB | ~$1.21/hour
NVIDIA A100 | 80 GB | ~$4.00/hour
NVIDIA H100 MIG | 40 GB | ~$3.75/hour
NVIDIA H100 | 80 GB | ~$6.50/hour
NVIDIA B200 (Blackwell) | 180 GB | ~$9.98/hour
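
As a worked example of per-minute billing, the sketch below estimates the cost of a deployment from the approximate hourly rates listed above. Rounding usage up to whole minutes is an assumption made for this example, and `estimate_cost` is a hypothetical helper, not part of any Baseten tool.

```python
import math

# Approximate hourly rates from the table above, in USD per hour.
HOURLY_RATE_USD = {
    "A10G": 1.21,
    "A100": 4.00,
    "H100 MIG": 3.75,
    "H100": 6.50,
    "B200": 9.98,
}


def estimate_cost(gpu, seconds_active):
    """Estimate cost in USD, assuming usage is rounded up to whole minutes."""
    minutes = math.ceil(seconds_active / 60)
    return round(minutes * HOURLY_RATE_USD[gpu] / 60, 2)


if __name__ == "__main__":
    # 90 minutes of H100 time at roughly $6.50/hour.
    print(estimate_cost("H100", 90 * 60))
```

Because billing follows active minutes, a model that scales to zero overnight only accrues cost for the minutes it actually serves traffic.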

Model APIs use per-token pricing that varies by model. Baseten reports 225% better cost-performance on Google Cloud Blackwell instances for high-throughput workloads compared to previous-generation GPUs.

Speed Benchmarks

Model | Time to First Token | Throughput
GPT-OSS-120B | 0.25 seconds | High throughput
Kimi K2 Thinking | 300 milliseconds | 140+ tokens per second
Nemotron 3 Super | Fast | 478.3 tokens per second
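
The two benchmark columns combine into a rough end-to-end latency estimate: the wait for the first token plus the time to stream the remaining tokens. The sketch below applies that simple model to the Kimi K2 Thinking figures; it ignores network overhead and queueing, and the function name is hypothetical.

```python
def estimated_response_seconds(ttft_seconds, tokens_per_second, output_tokens):
    """Approximate latency: time to first token plus steady-state decoding time."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft_seconds + output_tokens / tokens_per_second


if __name__ == "__main__":
    # Kimi K2 Thinking: 300 ms to first token, 140 tokens per second,
    # generating a 700-token answer (about 5.3 seconds under this model).
    print(estimated_response_seconds(0.3, 140, 700))
```

For short answers the time to first token dominates, while for long generations throughput matters far more.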

Baseten vs. Competitors

Platform | Best For | Key Difference
Baseten | Production inference for custom models | Truss open-source framework; auto-scaling; Blackwell GPU support; NVIDIA-backed
Together AI | Full-stack open-source AI (inference + training + fine-tuning) | 200+ pre-built models; broader model catalog; more established
Replicate | Quick prototyping and model demos | Easiest setup; weaker for private production workloads
Modal | General Python compute and batch jobs | Developer-centric (Python decorators); better for workflows beyond just inference
RunPod | Cost-sensitive GPU rental | Lower-cost bare metal; less managed infrastructure
AWS SageMaker | Enterprise ML lifecycle on AWS | Full AWS ecosystem; heavier setup; Baseten deploys faster

Baseten's niche: Production-grade inference with maximum control over deployment. The Truss framework lets you bring any model with any configuration, while auto-scaling and per-minute billing keep costs aligned with actual usage.

Company Details

Detail | Info
Founded | 2019
CEO | Tuhin Srivastava (co-founder)
Headquarters | San Francisco, California
Employees | ~60+ (growing rapidly after 3 funding rounds in 12 months)
Valuation | $5 billion (January 2026; up from $825 million in February 2025)
Latest Funding | $300 million Series E (January 2026; led by IVP and CapitalG)
Total Raised | ~$585 million
NVIDIA Investment | $150 million (participated in Series E)
Revenue Growth | 10x in 2025
Inference Volume | 1.3 quadrillion tokens per month (October 2025)
Notable Customers | Cursor; Notion; Superhuman; Quora; HeyGen; Writer; Clay
Open Source | Truss framework (~6,000 GitHub stars)
Website | baseten.co

Strengths

  • Production-grade inference — purpose-built for reliability, low latency, and auto-scaling at massive scale
  • NVIDIA-backed — $150 million investment and close hardware partnership; early access to Blackwell B200 GPUs
  • Truss open-source framework — deploy any model with any framework; full control over configuration without managing infrastructure
  • Scale to zero — no paying for idle GPUs; per-minute billing aligns costs with actual usage
  • Explosive growth — 10x revenue and 100x volume growth in 2025; $5 billion valuation validates the platform

Limitations and Considerations

  • Inference-focused — while training was added in 2025, Baseten is primarily an inference platform; Together AI and Databricks offer more comprehensive AI development environments
  • Smaller model catalog — fewer pre-built Model API endpoints than Together AI (200+ models) or Replicate
  • Enterprise maturity — smaller company (~60+ employees) compared to established cloud providers
  • No permanent free tier confirmed — sign-up credits may be available, but no guaranteed free usage like Groq or Cerebras
  • Custom deployment complexity — while Truss simplifies things, deploying custom models still requires ML engineering expertise

Key Takeaways

  • Baseten is a high-performance AI inference platform backed by NVIDIA that was processing 1.3 quadrillion tokens per month by October 2025, a 100x increase over the year
  • The Truss open-source framework lets you deploy any AI model to production with auto-scaling, scale-to-zero, and per-minute billing on GPUs from T4 to Blackwell B200
  • Valued at $5 billion after raising $585 million total (including $150 million from NVIDIA); 10x revenue growth in 2025
  • Best suited for AI-native companies that need production-grade inference with maximum control over model deployment and GPU configuration
