Learning Objectives
- Explain what vector embeddings are and why they enable semantic search
- Describe the RAG pipeline and why it improves LLM accuracy for domain-specific tasks
- Identify the leading vector database options and their appropriate use cases
The Problem RAG Solves
Large language models are trained on large, mostly public datasets that end at a training cutoff. They don't know about:
- Your company's internal documentation
- Your customer support history
- The specific version of a software library you're using
- Research papers published after their training cutoff
- Your personal notes and knowledge base
Neither naive solution holds up well. Fine-tuning the model on your data is expensive and slow to repeat whenever the data changes, and stuffing all your documents into the context window is limited by context size.
Retrieval-Augmented Generation (RAG) is the practical solution: instead of training the model on your data or loading it all upfront, retrieve only the relevant pieces at the moment of the query and pass those to the model.
💡 Key Concept
RAG Pipeline:
1. Index: convert your documents to vector embeddings (numerical representations of meaning) and store them in a vector database
2. Query: when a user asks a question, convert their query to a vector embedding too
3. Retrieve: find the documents whose embeddings are most similar to the query embedding (semantic search)
4. Generate: pass the retrieved documents and the user's query to the LLM, which answers based on the retrieved context
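The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: `TinyRAG`, `toy_embed`, and `echo_generate` are hypothetical names, the "vector database" is a plain list, and the toy embedding only counts a handful of words, so it does not capture meaning the way a real embedding model does.

```python
import math
import re
from typing import Callable, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyRAG:
    """In-memory stand-in for the index / query / retrieve / generate steps."""

    def __init__(self, embed: Callable[[str], Vector], generate: Callable[[str], str]):
        self.embed = embed
        self.generate = generate
        self.store: List[Tuple[Vector, str]] = []  # plays the role of the vector database

    def index(self, documents: List[str]) -> None:
        # Step 1: embed each document and store the vector next to its text.
        for doc in documents:
            self.store.append((self.embed(doc), doc))

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        # Steps 2 and 3: embed the query, then rank stored documents by similarity.
        q = self.embed(query)
        ranked = sorted(self.store, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]

    def answer(self, query: str, k: int = 2) -> str:
        # Step 4: hand the retrieved text plus the question to the model.
        context = "\n".join(self.retrieve(query, k))
        prompt = f"Answer based on these chunks:\n{context}\nQuestion: {query}"
        return self.generate(prompt)


# Hypothetical stand-ins so the sketch runs without any external service.
VOCAB = ["password", "reset", "invoice", "refund", "login"]


def toy_embed(text: str) -> Vector:
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]


def echo_generate(prompt: str) -> str:
    return prompt  # a real system would call an LLM here


rag = TinyRAG(toy_embed, echo_generate)
rag.index([
    "To reset your password, open the login page and click Forgot password.",
    "A refund is issued to the original payment method within five days.",
])
top = rag.retrieve("How do I reset my password?", k=1)
```

Swapping `toy_embed` for a call to a real embedding model and `echo_generate` for a call to an LLM would leave the structure of the four steps unchanged.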
What Vector Embeddings Are
A vector embedding is a list of numbers (a vector) that represents the meaning of a piece of text. Text with similar meaning produces vectors that are mathematically close to each other in high-dimensional space.
This is what makes RAG work: "How do I reset my password?" and "I forgot my login credentials" produce similar vectors, even though they share almost no keywords. A vector search finds the relevant FAQ answer even if the user didn't use the exact right keywords.
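One common way to make "mathematically close" precise is cosine similarity, which compares the direction of two vectors. The vectors below are made up for illustration (real embeddings have hundreds or thousands of dimensions, and these numbers do not come from any model):

$$
\operatorname{sim}(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
$$

With $\mathbf{a} = (3, 4, 0)$, $\mathbf{b} = (4, 3, 0)$, and $\mathbf{c} = (0, 0, 5)$, each of length 5:

$$
\operatorname{sim}(\mathbf{a}, \mathbf{b}) = \frac{3 \cdot 4 + 4 \cdot 3 + 0 \cdot 0}{5 \cdot 5} = \frac{24}{25} = 0.96,
\qquad
\operatorname{sim}(\mathbf{a}, \mathbf{c}) = \frac{0}{5 \cdot 5} = 0
$$

A score near 1 means the two vectors point in nearly the same direction (similar meaning), while a score near 0 means they are unrelated.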
Traditional text search (keyword matching) requires exact or near-exact word matches. Semantic search via vectors matches by meaning, not keywords. For knowledge base and document search applications, this means users can find the right answer without guessing the exact wording used in the source document.
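To see why keyword matching misses the password example, here is a small sketch of a keyword comparison. The stop-word list and the function names are assumptions chosen for illustration; real search engines use far more elaborate tokenization and scoring.

```python
import re

# A small, hypothetical stop-word list; real search engines ship much longer ones.
STOP_WORDS = {"a", "do", "how", "i", "my", "the", "to"}


def keywords(text: str) -> set:
    """Lowercase the text, split it into words, and drop stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def keyword_match(query: str, document: str) -> set:
    """Return the keywords that the query and the document have in common."""
    return keywords(query) & keywords(document)


query = "I forgot my login credentials"
faq_entry = "How do I reset my password?"

shared = keyword_match(query, faq_entry)
print(shared or "no shared keywords")
```

Once common words are ignored, the two sentences have nothing in common, so a keyword index has no basis for returning the FAQ entry. An embedding-based search can still rank it highly because the comparison happens on meaning.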
The RAG Pipeline in Practice
Indexing (done ahead of time):
Documents (PDFs, docs, web pages)
↓ Chunking (split into paragraphs)
↓ Embedding model (text → vector)
↓ Vector database (stores vectors + original text)
Query time:
User query
↓ Embedding model (query → vector)
↓ Similarity search in the vector database
↓ Top-k relevant chunks
↓ LLM prompt: "Answer based on these chunks: [chunks]\nQuestion: [query]"
↓ Grounded, accurate response
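The embedding model (often OpenAI's text-embedding-3-large or a local alternative) converts text to vectors; the same model must embed both the documents and the queries so that their vectors are comparable. The vector database stores and searches those vectors. The LLM then answers from the retrieved chunks rather than from its training data alone.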
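The chunking and prompt-assembly stages of the pipeline can be sketched as follows. The function names and the `max_chars` limit are assumptions for illustration; production systems often chunk by tokens and add overlap between chunks.

```python
from typing import List


def chunk_paragraphs(text: str, max_chars: int = 500) -> List[str]:
    """Split on blank lines, then pack paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_prompt(chunks: List[str], query: str) -> str:
    """Assemble the prompt in the shape shown in the diagram above."""
    joined = "\n---\n".join(chunks)
    return f"Answer based on these chunks:\n{joined}\nQuestion: {query}"


doc = (
    "Passwords expire every 90 days.\n\n"
    "To reset one, use the login page.\n\n"
    "Refunds take five days."
)
chunks = chunk_paragraphs(doc, max_chars=70)
prompt = build_prompt(chunks[:1], "How do I reset my password?")
```

Here the first two paragraphs fit together under the 70-character limit and the third starts a new chunk. In a full pipeline, the chunks passed to `build_prompt` would be the top-k results of the similarity search rather than a slice of the list.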
Vector Database Options
Pinecone — Managed Vector Search at Scale
Pinecone is a leading dedicated vector database, purpose-built for production vector search.
What makes Pinecone the enterprise choice:
- Fully managed: no infrastructure to run or maintain
- Serverless option: pay only for what you use, scales to zero
- High throughput: built to serve high query volumes at low latency in production applications
- Metadata filtering: filter by arbitrary metadata alongside vector similarity (e.g., "find similar documents where department=legal and date>2024")
- Hybrid search: combine vector search with keyword search in one query
Pinecone is the right choice when vector search is a core, high-scale production feature. Because the service is fully managed, operational overhead stays minimal.
Free tier: 2GB storage, 100K vectors, serverless.
Best for: Production RAG applications, semantic search features embedded in larger products, high-query-volume applications.
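The idea behind metadata filtering can be illustrated without any vendor API. The sketch below is plain Python over a hypothetical in-memory list of records; it is not Pinecone's client interface, and a real vector database applies the filter inside its index rather than scanning every record.

```python
from typing import Any, Dict, List

# Hypothetical records: each has a unit-length vector and a metadata dictionary.
records: List[Dict[str, Any]] = [
    {"id": "a", "vector": [1.0, 0.0], "meta": {"department": "legal", "year": 2025}},
    {"id": "b", "vector": [0.8, 0.6], "meta": {"department": "legal", "year": 2023}},
    {"id": "c", "vector": [0.6, 0.8], "meta": {"department": "sales", "year": 2025}},
    {"id": "d", "vector": [0.0, 1.0], "meta": {"department": "legal", "year": 2025}},
]


def dot(a: List[float], b: List[float]) -> float:
    # For unit-length vectors the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def filtered_search(
    query_vec: List[float],
    items: List[Dict[str, Any]],
    department: str,
    after_year: int,
    k: int = 2,
) -> List[str]:
    """Keep records that satisfy the metadata filter, then rank them by similarity."""
    candidates = [
        r for r in items
        if r["meta"]["department"] == department and r["meta"]["year"] > after_year
    ]
    candidates.sort(key=lambda r: dot(query_vec, r["vector"]), reverse=True)
    return [r["id"] for r in candidates[:k]]


result = filtered_search([0.6, 0.8], records, department="legal", after_year=2024)
```

Record `c` is the closest vector to the query, but it is excluded because its department is not `legal`; record `b` is excluded by the year condition. The filter narrows the candidate set and similarity decides the order within it.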
Weaviate — Open-Source with Managed Option
Weaviate is an open-source vector database with a rich feature set:
- GraphQL and REST APIs — flexible querying beyond simple vector similarity
- Multi-modal — supports image, video, and audio embeddings alongside text
- Built-in modules — call embedding models and LLMs directly from within Weaviate queries (no separate orchestration code)
- Hybrid search — BM25 keyword + vector in one query
- Self-hosted or managed (Weaviate Cloud)
Weaviate is more complex to configure than Pinecone but more flexible — especially for multi-modal use cases where you need to search across images and text together.
Free tier: Weaviate Cloud sandbox, 14-day expiry, for development.
Best for: Multi-modal RAG, teams that prefer open-source infrastructure, complex filtering and query requirements.
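Hybrid search needs some way to merge a keyword ranking with a vector ranking. One widely used approach is reciprocal rank fusion, sketched below. This is an illustrative choice, not a description of how Weaviate or any other product combines scores internally; the document ids and the function name are hypothetical.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document ids into one.

    Each document scores 1 / (k + rank) per list it appears in (rank starts at 1);
    k = 60 is a conventional default, not a value taken from any product.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score, highest first; ties fall back to id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["doc3", "doc1", "doc7"]  # e.g. BM25 order
vector_hits = ["doc1", "doc5", "doc3"]   # e.g. embedding-similarity order

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear near the top of both lists (`doc1` and `doc3` here) rise above documents that only one method found, which is the behavior hybrid search is after.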
Supabase Vector (pgvector) — Postgres-Native Embeddings
Supabase Vector is pgvector — the PostgreSQL extension for vector similarity search — integrated into Supabase's platform.
The key advantage: your vectors live in the same Postgres database as the rest of your application data. This eliminates the separate-system complexity of managing a dedicated vector database.
For most application-scale RAG use cases (hundreds of thousands to low millions of vectors), pgvector performs well. The operational simplicity of one database for everything — relational data, auth, file storage, and vectors — is a genuine advantage.
How to use in a Supabase project:
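```sql
-- Enable the extension (safe to run more than once)
create extension if not exists vector;

-- Create a table with a vector column
create table documents (
  id uuid primary key default gen_random_uuid(),
  content text,
  -- The dimension must match your embedding model's output
  -- (1536 fits e.g. OpenAI text-embedding-3-small; text-embedding-3-large defaults to 3072)
  embedding vector(1536)
);

-- Similarity search query: <=> is cosine distance, so 1 - distance is cosine similarity
select content, 1 - (embedding <=> '[...]') as similarity
from documents
order by embedding <=> '[...]'
limit 5;
```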
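In application code, the query embedding has to be sent to Postgres in pgvector's text format, a bracketed, comma-separated list such as `[0.1,0.2,0.3]`. The sketch below builds that literal and a parameterized version of the similarity query. The helper names are hypothetical, and the `%s` placeholder style is an assumption that fits psycopg-style drivers; no database connection is made here.

```python
from typing import List, Sequence, Tuple


def to_pgvector_literal(vector: Sequence[float]) -> str:
    """Format a Python sequence in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


def similarity_query(vector: Sequence[float], limit: int = 5) -> Tuple[str, List[object]]:
    """Return SQL with driver placeholders plus the parameters to bind.

    The table and column names come from the SQL above; the placeholder
    style (%s) is an assumption about the database driver in use.
    """
    literal = to_pgvector_literal(vector)
    sql = (
        "select content, 1 - (embedding <=> %s::vector) as similarity "
        "from documents "
        "order by embedding <=> %s::vector "
        "limit %s"
    )
    return sql, [literal, literal, limit]


# A three-dimensional vector keeps the example readable; a real query vector
# must have the same dimension as the column (1536 in the table above).
sql, params = similarity_query([0.25, -0.5, 1.0], limit=3)
```

Binding the vector as a parameter instead of formatting it into the SQL string keeps the query safe from injection and lets the driver handle quoting.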
Free tier: Included in Supabase free tier (500MB database storage).
Best for: Applications already using Supabase that want to add RAG without adding infrastructure; moderate-scale use cases; teams who want operational simplicity.
Chroma — Local-First for Development
Chroma is an open-source vector database designed for developer ergonomics — especially for local development and prototyping.
Installing and running Chroma locally takes minutes. The Python and JavaScript clients are simple. For building a RAG prototype or developing a retrieval feature before committing to a production database, Chroma is often the fastest path:
```python
import chromadb

# In-memory client: data is lost when the process exits
client = chromadb.Client()
collection = client.create_collection("my_docs")

# Add documents (Chroma can call an embedding function automatically)
collection.add(
    documents=["AI is changing software development", "Vector search enables semantic retrieval"],
    ids=["doc1", "doc2"],
)

# Query: returns the n_results most similar documents for each query text
results = collection.query(query_texts=["how does search work?"], n_results=2)
```
Chroma also offers a hosted cloud option for moving beyond local development.
Best for: Local development and prototyping; learning RAG concepts; applications where Chroma's simplicity and local operation matter more than production-scale performance.
Qdrant — Performance-Focused Open Source
Qdrant (pronounced "quadrant") is a high-performance vector search engine written in Rust — optimized for fast search over large vector collections with minimal resource usage.
Distinctive features:
- Payload filtering: rich metadata filtering combined with vector search
- Scalar and product quantization: compress vectors to reduce memory use, trading a small amount of accuracy for the savings
- On-disk storage: search vectors stored on disk rather than in RAM (makes large collections feasible on modest hardware)
- Rust performance: efficient memory usage and fast query response
Qdrant is the choice when you need to run your own vector database infrastructure with strong performance characteristics — particularly for large-scale or memory-constrained deployments.
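The trade-off behind scalar quantization can be shown in a few lines. The sketch below is a generic illustration of mapping floats to one-byte integers, not Qdrant's implementation; the function names and the sample vector are made up.

```python
from typing import List, Tuple


def quantize(vector: List[float]) -> Tuple[List[int], float, float]:
    """Map each float to an integer in 0..255 using the vector's own min and max."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = [round((x - lo) / scale) for x in vector]
    return codes, lo, scale


def dequantize(codes: List[int], lo: float, scale: float) -> List[float]:
    """Approximate the original floats from the stored integers."""
    return [lo + c * scale for c in codes]


original = [0.12, -0.53, 0.98, 0.0, -1.0, 0.41]
codes, lo, scale = quantize(original)
restored = dequantize(codes, lo, scale)

max_error = max(abs(a - b) for a, b in zip(original, restored))
bytes_float32 = 4 * len(original)  # 4 bytes per float32 component
bytes_uint8 = 1 * len(codes)       # 1 byte per quantized component
```

Storage drops to a quarter of the float32 size, and each restored value is off by at most half a quantization step (`scale / 2`). That small error is the accuracy cost; systems that quantize commonly rescore the top candidates against the original vectors to recover precision.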
Free tier: Qdrant Cloud includes a free tier cluster; self-hosted is free.
Best for: Self-hosted deployments requiring high performance; large vector collections; teams preferring open-source infrastructure control.
MongoDB Atlas Vector Search
MongoDB Atlas Vector Search adds vector search capability to Atlas — meaning teams already using MongoDB can add vector search without migrating to a separate database.
Like Supabase Vector for Postgres users, Atlas Vector Search is compelling primarily as an "add vector search to what you already have" solution. The integration is native: you store vectors alongside your documents in MongoDB collections and query using aggregation pipelines.
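An aggregation pipeline for vector search starts with a `$vectorSearch` stage. The sketch below only builds the pipeline as Python data; the index name, the `embedding` and `content` field names, and the helper function are hypothetical and would need to match your own collection and search index.

```python
from typing import Any, Dict, List


def vector_search_pipeline(
    query_vector: List[float],
    index_name: str = "vector_index",  # hypothetical index name
    path: str = "embedding",           # hypothetical field holding the vectors
    limit: int = 5,
    num_candidates: int = 100,
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline whose first stage is $vectorSearch."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,  # candidates considered by the approximate search
                "limit": limit,                   # documents returned
            }
        },
        {
            "$project": {
                "_id": 0,
                "content": 1,  # hypothetical text field
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


pipeline = vector_search_pipeline([0.1, 0.2, 0.3], limit=3)
```

With PyMongo, a pipeline like this would be passed to `collection.aggregate(pipeline)`. Because it is an ordinary aggregation, further stages such as `$match` or `$lookup` can follow the search stage.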
Best for: Applications already running on MongoDB Atlas; teams who want to avoid a separate vector database system.
Choosing a Vector Database
✅ Tip
Start with what you already have. If you're using Supabase, pgvector is already available — enable it and start. If you're using MongoDB Atlas, use Atlas Vector Search. Only choose a dedicated vector database (Pinecone, Weaviate, Qdrant) when your retrieval requirements outgrow your existing database's vector capabilities.
| Scenario | Recommendation |
|---|---|
| Already using Supabase | pgvector (built in) |
| Already using MongoDB Atlas | Atlas Vector Search |
| Local prototyping / learning RAG | Chroma |
| Production, high scale, managed | Pinecone |
| Multi-modal (images + text) | Weaviate |
| Self-hosted, large scale, open source | Qdrant |
Key Takeaways
- RAG solves the core LLM limitation of not knowing your specific data: index your documents as vector embeddings, retrieve relevant chunks at query time, and pass them to the LLM as context for grounded, accurate answers
- Vector embeddings represent text meaning numerically — semantically similar text produces mathematically close vectors, enabling semantic search that matches by meaning rather than keywords
- The simplest starting point is Supabase Vector (pgvector) if you're already using Supabase, or Chroma for local prototyping; reach for a dedicated vector database (Pinecone, Weaviate, Qdrant) only when your retrieval requirements outgrow what those options can handle
- RAG is now a standard pattern for AI-powered knowledge bases, customer support, documentation search, and enterprise AI applications — understanding the pipeline is a foundational skill for AI application development





