Free to read. Sign up to save your progress and take knowledge-check quizzes.

Sign up free
5 min read·Updated April 28, 2026

Together AI

Together AI logoBy Together AI

Together AI is an AI cloud platform for running, fine-tuning, and training 200+ open-source models — offering competitive inference speeds, GPU rentals, and pioneering research like FlashAttention and RedPajama.

Listen to this lesson

Free preview · first 0:30
0:00 / 0:30

Audio & video lessons are paid features

Plus unlocks audio streaming. Pro adds downloadable audio, video, certificates, and more.

Plus adds:
  • Audio streaming
  • Downloadable PDFs
  • All AI Playbooks
  • Personalized content
Pro also adds:
  • Certificates of completion
  • Audio MP3 downloads
  • Video lessonssoon
  • & More…soon

Watch this lesson

Video coming soon

Learning Objectives

  • Understand what Together AI offers and how it differs from proprietary AI providers
  • Compare Together AI's pricing and model catalog to alternatives like Groq and AWS Bedrock
  • Evaluate Together AI's research contributions to the open-source AI ecosystem

What Is Together AI?

Together AI (formally Together Computer, Inc.) is a full-stack AI cloud platform that lets developers run, fine-tune, and train open-source AI models. With over 200 models available via a serverless API — including Llama 4, DeepSeek R1, Mixtral, and Qwen — Together AI positions itself as "The AI Native Cloud" for teams that want the power of open-source AI without managing their own infrastructure.

Unlike proprietary providers like OpenAI or Anthropic, Together AI hosts open-source models at dramatically lower prices. Running Llama 4 Maverick on Together AI costs roughly $0.27 per million input tokens — compared to $15 per million for Claude Opus or $2.50 per million for GPT-5.5.

Tip

Try Together AI: together.ai — new accounts receive free credits to experiment with inference, fine-tuning, and model hosting.

What Can You Do?

Serverless Inference

Run any of 200+ models via API with sub-100 millisecond latency. The API is OpenAI-compatible, so switching requires minimal code changes. Models span text generation, image generation, video, code, and audio.

Fine-Tuning

Customize models for your specific use case with LoRA (lightweight) or full fine-tuning. Upload your training data, choose a base model, and Together AI handles the GPU infrastructure. Supports models up to 100 billion+ parameters.

Dedicated GPU Clusters

Rent NVIDIA H100, H200, or B200 GPU clusters for custom training jobs. Together AI has secured 200 megawatts of power capacity across North American data centers — enough for large-scale model training.

Batch Processing

Process up to 30 billion tokens asynchronously at reduced prices — ideal for data processing, evaluation, and offline workloads.

Pricing

Serverless InferencePay-per-token; varies by model size (see below)
LoRA Fine-Tuning$0.48-$2.90 per million tokens depending on model size
Full Fine-Tuning$0.54-$3.20 per million tokens depending on model size
H100 GPU (on-demand)$3.49/hour
H200 GPU (on-demand)$4.19/hour
B200 GPU (on-demand)$7.49/hour

Representative inference pricing:

ModelInput (per 1 million tokens)Output (per 1 million tokens)
Llama 4 Maverick$0.27$0.85
DeepSeek R1$0.55$2.19
R1 Distill Llama 70 billion$0.03~$0.03

These prices are a fraction of proprietary alternatives — making Together AI popular with startups and teams building on open-source models.

Research Contributions

Together AI is not just an infrastructure provider — the team actively advances open-source AI research:

  • FlashAttention-4 — up to 1.3 times faster than cuDNN on NVIDIA Blackwell GPUs; widely adopted across the industry
  • RedPajama-V2 — a 30 trillion token open training dataset, the largest publicly available LLM training dataset
  • Mamba-3 — a state-space model architecture that is faster than Transformers at decode time, open-sourced
  • 50+ peer-reviewed papers with over 10,000 citations

The founding team includes Chris Re and Percy Liang from Stanford, whose research on efficient attention mechanisms and foundation model evaluation shaped the modern AI landscape.

Together AI vs. Competitors

PlatformStrengthModelsBest For
Together AI200+ models; inference + fine-tuning + training200+Full-stack open-source AI development
Groq CloudFastest single-stream latency (custom LPU chips)Limited (~10)Real-time chat and latency-sensitive apps
Fireworks AIFast multimodal inference; HIPAA/SOC2ModerateRegulated industries needing speed
ReplicateEasy model deployment; pay-per-secondLarge (community)Quick prototyping and model experimentation
AWS BedrockBroadest enterprise ecosystemModerate (curated)Enterprise teams already on AWS

Together AI's niche: The broadest open-source model catalog combined with competitive pricing, fine-tuning, and dedicated GPU infrastructure. Groq wins on raw latency; Together AI wins on breadth and end-to-end platform capabilities.

Company Details

DetailInfo
FoundedJune 2022
CEOVipul Ved Prakash (previously Director of Engineering at Apple)
Co-FoundersVipul Ved Prakash; Ce Zhang; Chris Re; Percy Liang
HeadquartersSan Francisco, California
Employees~313
Latest Funding$305 million Series B (February 2025)
Valuation$3.3 billion
Total Raised$534 million across 4 rounds
Key InvestorsGeneral Catalyst; Prosperity7; Salesforce Ventures; NVIDIA; Kleiner Perkins
Estimated Revenue~$300 million annualized (September 2025)
AcquisitionRefuel.ai (May 2025) for data transformation
Websitetogether.ai

Strengths

  • Broadest model catalog — 200+ open-source models across text, image, video, code, and audio in one platform
  • Full-stack platform — inference, fine-tuning, and training in a single service (most competitors offer inference only)
  • Competitive pricing — open-source models at a fraction of proprietary API costs
  • Research leadership — FlashAttention, RedPajama, and Mamba contributions used across the entire AI industry
  • GPU access — on-demand H100, H200, and B200 clusters with 200 megawatts of secured power capacity
  • OpenAI-compatible API — easy migration from proprietary providers

Limitations and Considerations

  • Open-source models only — you cannot access proprietary models like GPT-5.5 or Claude through Together AI
  • Not the fastest — Groq and Fireworks AI both achieve lower latency on inference benchmarks
  • Enterprise maturity — newer company (founded 2022) compared to established cloud providers like AWS or Azure
  • Revenue estimates are unofficial — the $300 million ARR figure comes from third-party analysis, not company disclosures
  • Fine-tuning complexity — while supported, fine-tuning large models still requires ML expertise to get good results

Key Takeaways

  • Together AI is the leading full-stack cloud platform for open-source AI — offering inference, fine-tuning, and training for 200+ models at competitive prices
  • Dramatically cheaper than proprietary AI providers: Llama 4 Maverick costs $0.27 per million input tokens versus $15 for Claude Opus
  • Active research lab producing industry-standard tools (FlashAttention, RedPajama) used across the AI ecosystem
  • Best suited for teams building on open-source models who need more than just inference — fine-tuning, training, and dedicated GPU infrastructure

Save your progress & take the quiz

Sign up free to bookmark lessons, track which modules you've completed, and lock in what you learned with a quick knowledge-check quiz at the end of each lesson.

🧭Recommended for you