6.13 — Vector Databases & RAG

Learning Objectives

Explain what vector embeddings are and why they enable semantic search
Describe the RAG pipeline and why it improves LLM accuracy for domain-specific tasks
Identify the leading vector database options and their appropriate use cases

The Problem RAG Solves

Large language models are trained on general internet data. They don't know about:

Your company's internal documentation
Your customer support history
The specific version of a software library you're using
Research papers published after their training cutoff
Your personal notes and knowledge base

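Two naive solutions fall short. Fine-tuning the model on your data is expensive and slow, and it has to be repeated whenever the data changes. Stuffing all your documents into the context window is limited by context size and gets more costly as the document set grows.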

Retrieval-Augmented Generation (RAG) is the practical solution: instead of training the model on your data or loading it all upfront, retrieve only the relevant pieces at the moment of the query and pass those to the model.

💡Key Concept

RAG Pipeline:

Index your documents by converting them to vector embeddings (numerical representations of meaning) and storing them in a vector database
Query — when a user asks a question, convert their query to a vector embedding too
Retrieve — find the documents whose embeddings are most similar to the query embedding (semantic search)
Generate — pass the retrieved documents + the user's query to the LLM; it answers based on the retrieved context
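The four steps can be sketched in a few lines of Python. This is a toy illustration, not a production setup: `embed` is a hypothetical stand-in that counts words, so unlike a real embedding model it only matches shared words, and the list `index` plays the role of the vector database. The sample documents are made up.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    words = (w.strip("?.!,") for w in text.lower().split())
    return Counter(w for w in words if w)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

# 1. Index: embed each document and store the vector with the original text.
documents = [
    "To reset your password, open Settings and choose Reset password.",
    "Invoices are emailed on the first day of each month.",
    "Support is available by chat on weekdays.",
]
index = [(embed(doc), doc) for doc in documents]

# 2. Query: embed the user's question the same way.
query = "How do I reset my password?"
query_vector = embed(query)

# 3. Retrieve: rank stored documents by similarity to the query vector.
ranked = sorted(index, key=lambda entry: cosine(query_vector, entry[0]), reverse=True)
top_chunks = [doc for _, doc in ranked[:1]]

# 4. Generate: build the prompt that would be sent to the LLM (no model call here).
prompt = "Answer based on these chunks:\n" + "\n".join(top_chunks) + "\nQuestion: " + query
print(prompt)
```

In a real pipeline, `embed` would call an embedding model and the sorted list would be replaced by a vector database query; the overall shape stays the same.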

What Vector Embeddings Are

A vector embedding is a list of numbers (a vector) that represents the meaning of a piece of text. Text with similar meaning produces vectors that are mathematically close to each other in high-dimensional space.

This is what makes RAG work: "How do I reset my password?" and "I forgot my login credentials" produce similar vectors, even though they share almost no keywords. A vector search can find the relevant FAQ answer even if the user didn't use the exact keywords.
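"Mathematically close" is commonly measured with cosine similarity. The sketch below uses hand-made 3-dimensional vectors purely for illustration; real embeddings have hundreds or thousands of dimensions and come from a model, and the variable names and values here are assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors standing in for model output (illustration only).
reset_password = [0.9, 0.1, 0.0]   # "How do I reset my password?"
forgot_login = [0.8, 0.2, 0.1]     # "I forgot my login credentials"
invoice_date = [0.0, 0.2, 0.9]     # "When is my invoice sent?"

similar = cosine_similarity(reset_password, forgot_login)
unrelated = cosine_similarity(reset_password, invoice_date)
print(round(similar, 3), round(unrelated, 3))
```

The first score comes out close to 1 and the second close to 0, which is the pattern a vector search relies on when ranking results.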

Traditional text search (keyword matching) requires exact or near-exact word matches. Semantic search via vectors matches by meaning, not keywords. For knowledge base and document search applications, this is a significant capability improvement.
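To see why keyword matching struggles with the password example, here is a small sketch of keyword overlap. The stopword list and the `keywords` helper are assumptions made for this illustration, not part of any search engine.

```python
# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"how", "do", "i", "my", "the", "a", "to"}

def keywords(text):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    words = (w.strip("?.!,").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}

query = "How do I reset my password?"
faq_title = "I forgot my login credentials"

shared = keywords(query) & keywords(faq_title)
print(shared)  # set()
```

With no meaningful words in common, a pure keyword search has nothing to rank on, while an embedding-based search can still place the two texts close together.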

The RAG Pipeline in Practice

Documents (PDFs, docs, web pages)
    ↓ Chunking (split into paragraphs)
    ↓ Embedding model (text → vector)
Vector Database (stores vectors + original text)
    ↑ User query → embedding → similarity search
    → Top-k relevant chunks
    → LLM prompt: "Answer based on these chunks: [chunks]\nQuestion: [query]"
    → Grounded, accurate response

The embedding model (often OpenAI's text-embedding-3-large or a local alternative) converts text to vectors. The vector database stores and searches those vectors. The LLM uses the retrieved chunks to answer accurately.
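The chunking and prompt-assembly steps in the diagram might look like the following sketch. The function names, the `max_chars` limit, and the `---` separator are assumptions; real pipelines often chunk by tokens and add overlap between chunks.

```python
def chunk_paragraphs(text, max_chars=500):
    """Split on blank lines, then pack paragraphs into chunks of up to max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

def build_prompt(chunks, query):
    """Assemble the LLM prompt from retrieved chunks and the user's question."""
    context = "\n---\n".join(chunks)
    return f"Answer based on these chunks:\n{context}\nQuestion: {query}"

doc = (
    "First paragraph about passwords.\n\n"
    "Second paragraph about billing.\n\n"
    "Third paragraph about support."
)
chunks = chunk_paragraphs(doc, max_chars=70)
print(len(chunks))
print(build_prompt(chunks[:1], "How do I reset my password?"))
```

Smaller chunks make retrieval more precise but carry less surrounding context, so the chunk size is usually tuned per document set.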

Vector Database Options

Pinecone — Managed Vector Search at Scale

Pinecone is a widely used dedicated vector database, purpose-built for production vector search.

What makes Pinecone the enterprise choice:

Fully managed — no infrastructure to run or maintain
Serverless option — pay only for what you use, scales to zero
High throughput and low query latency for production applications
Metadata filtering — filter by arbitrary metadata alongside vector similarity (e.g., "find similar documents where department=legal and date>2024")
Hybrid search — combine vector search with keyword search in one query
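Metadata filtering can be pictured as "filter first, then rank by similarity". The sketch below is plain Python over an in-memory list and does not use Pinecone's client or API; the records, field names, and the reading of "date>2024" as "after 1 January 2024" are all assumptions for illustration.

```python
import math
from datetime import date

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Made-up records: each has a vector plus arbitrary metadata.
records = [
    {"id": "contract-1", "vector": [0.9, 0.1], "metadata": {"department": "legal", "date": date(2024, 6, 1)}},
    {"id": "contract-2", "vector": [0.8, 0.3], "metadata": {"department": "legal", "date": date(2023, 3, 15)}},
    {"id": "memo-1", "vector": [0.95, 0.05], "metadata": {"department": "sales", "date": date(2024, 9, 9)}},
    {"id": "policy-1", "vector": [0.2, 0.9], "metadata": {"department": "legal", "date": date(2025, 1, 20)}},
]

def filtered_search(query_vector, records, department, after, top_k=2):
    """Keep records matching the metadata filter, then rank them by similarity."""
    candidates = [
        r for r in records
        if r["metadata"]["department"] == department and r["metadata"]["date"] > after
    ]
    candidates.sort(key=lambda r: cosine(query_vector, r["vector"]), reverse=True)
    return candidates[:top_k]

hits = filtered_search([1.0, 0.0], records, department="legal", after=date(2024, 1, 1))
print([h["id"] for h in hits])  # ['contract-1', 'policy-1']
```

A managed database applies the filter inside its index rather than scanning every record, but the result has the same meaning: the most similar vectors among those that pass the filter.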

Pinecone is the right choice when vector search is a core, high-scale production feature. Because the service is managed, operational overhead is minimal.

Free tier: 2GB storage, 100K vectors, serverless.

Best for: Production RAG applications, semantic search features embedded in larger products, high-query-volume applications.

Weaviate — Open-Source with Managed Option

Weaviate is an open-source vector database with a rich feature set:

GraphQL and REST APIs — flexible querying beyond simple vector similarity
Multi-modal — supports image, video, and audio embeddings alongside text
Built-in modules — call embedding models and LLMs directly from within Weaviate queries (no separate orchestration code)
Hybrid search — BM25 keyword + vector in one query
Self-hosted or managed (Weaviate Cloud)
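Hybrid search needs some way to merge a keyword ranking with a vector ranking. One common, generic approach is reciprocal rank fusion (RRF), sketched below. This illustrates the idea only and is not a description of Weaviate's internals; the document IDs and the constant `k=60` are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists: each appearance adds 1 / (k + rank) to a document's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_ranking = ["doc_a", "doc_b", "doc_c"]  # e.g. ordered by BM25 score
vector_ranking = ["doc_c", "doc_a", "doc_d"]   # e.g. ordered by vector similarity

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that rank well in both lists rise to the top, which is why hybrid search tends to help when queries mix exact terms (product codes, names) with looser descriptions.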

Weaviate is more complex to configure than Pinecone but more flexible — especially for multi-modal use cases where you need to search across images and text together.

Free tier: Weaviate Cloud sandbox, 14-day expiry, for development.

Best for: Multi-modal RAG, teams that prefer open-source infrastructure, complex filtering and query requirements.

Supabase Vector (pgvector) — Postgres-Native Embeddings

Supabase Vector is pgvector — the PostgreSQL extension for vector similarity search — integrated into Supabase's platform.

The key advantage: your vectors live in the same Postgres database as the rest of your application data. This eliminates the separate-system complexity of managing a dedicated vector database.

For most application-scale RAG use cases (hundreds of thousands to low millions of vectors), pgvector performs well. The operational simplicity of one database for everything — relational data, auth, file storage, and vectors — is a genuine advantage.

How to use in a Supabase project:

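-- Enable the pgvector extension (no-op if it is already enabled)
create extension if not exists vector;

-- Create a table with a vector column
create table documents (
  id uuid primary key default gen_random_uuid(),
  content text,
  -- The dimension must equal your embedding model's output size
  -- (for example, 1536 for OpenAI's text-embedding-3-small;
  -- text-embedding-3-large defaults to 3072)
  embedding vector(1536)
);

-- Similarity search: <=> is cosine distance, so 1 - distance is cosine similarity.
-- Replace '[...]' with the query embedding.
select content, 1 - (embedding <=> '[...]') as similarity
from documents
order by embedding <=> '[...]'
limit 5;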
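From application code, the query embedding has to be passed to Postgres in pgvector's text form, a bracketed comma-separated list. The helper below is a hypothetical sketch of that formatting step; the parameterized query assumes a driver that accepts pyformat-style named parameters (psycopg does), and nothing here connects to a database.

```python
def to_pgvector_literal(embedding):
    """Format a sequence of numbers as pgvector's text form, e.g. [0.1,0.2,0.3]."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

# Same query as the SQL above, with the embedding and limit passed as parameters.
SIMILARITY_SQL = """
select content, 1 - (embedding <=> %(q)s::vector) as similarity
from documents
order by embedding <=> %(q)s::vector
limit %(k)s
"""

params = {"q": to_pgvector_literal([0.1, 0.2, 0.3]), "k": 5}
print(params["q"])  # [0.1,0.2,0.3]
```

Passing the vector as a parameter instead of pasting it into the SQL string avoids quoting problems and keeps the query text constant.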

Free tier: Included in Supabase free tier (500MB database storage).

Best for: Applications already using Supabase that want to add RAG without adding infrastructure; moderate-scale use cases; teams who want operational simplicity.

Chroma — Local-First for Development

Chroma is an open-source vector database designed for developer ergonomics — especially for local development and prototyping.

Installing and running Chroma locally takes minutes. The Python and JavaScript clients are simple. For building a RAG prototype or developing a retrieval feature before committing to a production database, Chroma is often the fastest path:

import chromadb

# In-memory client: data is discarded when the process exits
client = chromadb.Client()
collection = client.get_or_create_collection("my_docs")

# Add documents (Chroma embeds them with its default embedding function)
collection.add(
    documents=["AI is changing software development", "Vector search enables semantic retrieval"],
    ids=["doc1", "doc2"]
)

# Query by text; results["documents"] holds one list of matches per query text
results = collection.query(query_texts=["how does search work?"], n_results=2)
print(results["documents"][0])

Chroma also offers a hosted cloud option for moving beyond local development.

Best for: Local development and prototyping; learning RAG concepts; applications where Chroma's simplicity and local operation matter more than production-scale performance.

Qdrant — Performance-Focused Open Source

Qdrant (pronounced "quadrant") is a high-performance vector search engine written in Rust — optimized for fast search over large vector collections with minimal resource usage.

Distinctive features:

Payload filtering — rich metadata filtering combined with vector search
Scalar and product quantization to compress vectors, reducing memory use at the cost of a small, tunable loss in accuracy
On-disk storage — search vectors stored on disk rather than RAM (makes large collections feasible on modest hardware)
Rust performance — efficient memory usage and fast query response
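Scalar quantization can be illustrated with a short sketch: each float is mapped to an 8-bit integer code, so a vector stored as 32-bit floats shrinks to roughly a quarter of its size, with a small rounding error on reconstruction. This is a generic illustration with made-up values and hypothetical function names, not Qdrant's implementation.

```python
def quantize(vector):
    """Map each float to an int8-range code using the vector's own min and max."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = [round((x - lo) / scale) - 128 for x in vector]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Approximate the original floats from the stored codes."""
    return [(c + 128) * scale + lo for c in codes]

original = [0.12, -0.53, 0.98, 0.0, -1.0, 0.47]
codes, lo, scale = quantize(original)
restored = dequantize(codes, lo, scale)

max_error = max(abs(a - b) for a, b in zip(original, restored))
print(codes)
print(max_error <= scale / 2 + 1e-9)  # True: error is bounded by half a quantization step
```

The reconstruction error is what slightly perturbs similarity scores, which is why quantization is a memory-versus-accuracy trade-off and not a free win.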

Qdrant is the choice when you need to run your own vector database infrastructure with strong performance characteristics — particularly for large-scale or memory-constrained deployments.

Free tier: Qdrant Cloud includes a free tier cluster; self-hosted is free.

Best for: Self-hosted deployments requiring high performance; large vector collections; teams preferring open-source infrastructure control.

MongoDB Atlas Vector Search

MongoDB Atlas Vector Search adds vector search capability to Atlas — meaning teams already using MongoDB can add vector search without migrating to a separate database.

Like Supabase Vector for Postgres users, Atlas Vector Search is compelling primarily as an "add vector search to what you already have" solution. The integration is native: you store vectors alongside your documents in MongoDB collections and query them with the $vectorSearch stage of an aggregation pipeline.
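An aggregation pipeline for vector search might be built like the sketch below. The index name `vector_index`, the field names `embedding` and `content`, and the `numCandidates` multiplier are assumptions for illustration; check them against your own Atlas index definition. The code only constructs the pipeline and does not connect to a cluster.

```python
def build_vector_search_pipeline(query_vector, limit=5):
    """Build an aggregation pipeline that runs a vector search and returns scored results."""
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",        # hypothetical Atlas Vector Search index name
                "path": "embedding",            # hypothetical field that stores the vector
                "queryVector": query_vector,
                "numCandidates": limit * 20,    # candidates considered before the final top results
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "content": 1,                   # hypothetical field that stores the text
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]

pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], limit=5)
print(pipeline[0]["$vectorSearch"]["numCandidates"])  # 100
```

With PyMongo, a pipeline like this would be passed to `collection.aggregate(pipeline)` on the collection that holds the documents and their embeddings.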

Best for: Applications already running on MongoDB Atlas; teams who want to avoid a separate vector database system.

Choosing a Vector Database

✅Tip

Start with what you already have. If you're using Supabase, pgvector is already available — enable it and start. If you're using MongoDB Atlas, use Atlas Vector Search. Only choose a dedicated vector database (Pinecone, Weaviate, Qdrant) when your retrieval requirements outgrow your existing database's vector capabilities.

Scenario	Recommendation
Already using Supabase	pgvector (built in)
Already using MongoDB Atlas	Atlas Vector Search
Local prototyping / learning RAG	Chroma
Production, high scale, managed	Pinecone
Multi-modal (images + text)	Weaviate
Self-hosted, large scale, open source	Qdrant

Key Takeaways

RAG solves the core LLM limitation of not knowing your specific data: index your documents as vector embeddings, retrieve relevant chunks at query time, and pass them to the LLM as context for grounded, accurate answers
Vector embeddings represent text meaning numerically — semantically similar text produces mathematically close vectors, enabling semantic search that matches by meaning rather than keywords
The simplest starting point is Supabase Vector (pgvector) if you're already using Supabase, or Chroma for local prototyping. Reach for a dedicated vector database (Pinecone, Weaviate, Qdrant) only when your retrieval requirements outgrow these options.
RAG is now a standard pattern for AI-powered knowledge bases, customer support, documentation search, and enterprise AI applications — understanding the pipeline is a foundational skill for AI application development
