6.14 — Foundation Models & Open Source

Learning Objectives

Understand the difference between foundation models and the chatbot interfaces built on top of them
Evaluate the trade-offs between open-source, open-weight, and closed-source model access
Identify when model-level access (fine-tuning, local deployment, API) is more appropriate than using a consumer chatbot

What Are Foundation Models?

Every AI chatbot you use — ChatGPT, Claude, Gemini — is a user interface built on top of a foundation model. The model is the core intelligence: a neural network trained on massive datasets that understands and generates language, code, images, or other content. The chatbot is just one way to access that intelligence.

Foundation models matter because they represent model-level access — the ability to interact with the AI system directly, without a consumer interface sitting between you and the model. This means you can fine-tune the model on your own data, deploy it on your own infrastructure, integrate it into your own applications via API, or run it entirely offline on local hardware.

The distinction is practical, not just technical. A marketing team using ChatGPT is consuming a foundation model through a consumer interface. A development team deploying Gemma 3 on their own servers to process medical records without sending data to any third party is using the same category of technology — but with fundamentally different control, privacy, and cost characteristics.

💡Key Concept

Open-source vs. open-weight vs. closed: These terms describe a licensing spectrum. Closed models (GPT-5.5, Claude Opus 4.7) are only accessible via API — you cannot download or inspect them. Open-weight models (Llama 4, Gemma 3) let you download and run the trained model, but may restrict commercial use or modification. Fully open-source models (Phi-4 under MIT, GPT-OSS under Apache 2.0) provide weights, training code, and permissive licenses for any use.

Why Model-Level Access Matters

There are four primary reasons developers and enterprises choose to work with foundation models directly rather than through consumer chatbot interfaces:

Fine-Tuning and Customization

Consumer chatbots are general-purpose. When you need an AI that deeply understands your company's terminology, your industry's regulations, or your product's codebase, fine-tuning a foundation model on your own data produces dramatically better results than prompt engineering alone. A law firm fine-tuning Gemma on case law, a hospital training Phi-4 on clinical notes, a retailer adapting Llama for product descriptions — these require model-level access.

Privacy and Data Sovereignty

When you run a model on your own infrastructure, your data never leaves your control. For industries with strict compliance requirements — healthcare (HIPAA), finance (SOX), government (FedRAMP) — local deployment of open models is often the only viable path to AI adoption. No API calls, no third-party data processing agreements, no risk of training data leakage.

Cost Control at Scale

API pricing works well for low-to-moderate usage. But when you are processing millions of documents, generating thousands of responses per hour, or running AI inference 24/7, self-hosting an open model can reduce costs by 10x or more compared to API pricing. The break-even point depends on your volume, but high-throughput applications almost always favor self-hosted models.

Edge and Offline Deployment

Some applications need AI where internet connectivity is unreliable or unavailable — mobile devices, factory floors, remote field operations, aircraft, or vehicles. Small open models like Phi-4 and Gemma 3 (1 billion/4 billion) are designed specifically for these on-device scenarios.

The Tools Landscape

Tool	Best For
Llama 4	Meta's open-weight flagship models (Scout 17Bx16E, Maverick 17Bx128E); top-tier reasoning and coding
Gemma 3	Google's open-weight models (1 billion–27 billion); multilingual, 128K context, runs on consumer hardware
Phi-4	Microsoft's MIT-licensed small model; exceptional math and coding for its size
GPT-OSS	OpenAI's first open-weight model (Apache 2.0); 20 billion+ parameters, fine-tunable
Mistral 7 billion	European open-weight models; strong multilingual performance and efficiency
Qwen 3	Alibaba's open-weight models; leading Chinese-English bilingual capability
DeepSeek V3	High-performance open reasoning model; strong math and coding benchmarks
Hugging Face	The central hub for discovering, downloading, and deploying open models
Ollama	Run open models locally with a single command; Mac, Linux, Windows support
vLLM	High-throughput model serving engine for production deployments
Amazon Bedrock	AWS managed service for accessing multiple foundation models via unified API
Azure AI Studio	Microsoft's platform for deploying and fine-tuning foundation models
Google Vertex AI	Google Cloud's ML platform for model deployment, fine-tuning, and serving
NVIDIA NIM	Optimized model inference containers for NVIDIA GPUs; production-grade serving
Nemotron	NVIDIA's open-weight LLM family (340 billion to 4 billion); Reward model widely used for synthetic data scoring
Claude Mythos Preview	Anthropic's most powerful model (93.9% SWE-bench); invite-only for cybersecurity defense via Project Glasswing
Muse Spark	Meta's proprietary flagship from Meta Superintelligence Labs; multimodal (voice, text, image); powers Meta AI across 3 billion+ users
Mistral Small 4	119 billion MoE (6.5 billion active); Apache 2.0; 256,000 context; runs on consumer GPU

The Licensing Spectrum

Understanding model licenses is essential before deploying any foundation model in production:

License	Examples	Commercial Use	Modification	Key Restriction
MIT	Phi-4	Yes	Yes	None — most permissive
Apache 2.0	GPT-OSS, Mistral	Yes	Yes	Must include license notice
Llama License	Llama 4	Yes (with limits)	Yes	700 million MAU threshold requires Meta approval
Gemma License	Gemma 3	Yes	Yes	Cannot use to train competing models
Closed API	GPT-5.5, Claude	API only	No	No weights available; usage-based pricing

How to Access Foundation Models

There are three primary paths to working with foundation models:

1. Local deployment — Download model weights from Hugging Face and run them on your own hardware using Ollama (simple) or vLLM (production-grade). Best for: privacy, offline use, development, and cost savings at scale.

2. Cloud APIs — Access models through provider APIs (OpenAI, Anthropic, Google) or managed platforms (Bedrock, Vertex AI, Azure AI). Best for: getting started quickly, variable workloads, and accessing closed frontier models.

3. Fine-tuning platforms — Use cloud services to fine-tune open models on your own data without managing infrastructure. Best for: domain-specific customization without deep ML engineering expertise.

Key Takeaways

Foundation models are the core AI systems that power consumer chatbots — accessing them directly gives you control over fine-tuning, privacy, cost, and deployment
The licensing spectrum ranges from fully permissive (MIT, Apache 2.0) to restricted open-weight to fully closed API-only — always check the license before production deployment
Open models like Gemma 3, Phi-4, and Llama 4 now rival closed models on many benchmarks, making self-hosted AI practical for a growing range of applications
Choose local deployment for privacy and cost at scale, cloud APIs for flexibility and frontier model access, and fine-tuning platforms for domain customization

Foundation Models & Open Source

Audio & video lessons are paid features