9 min read·Updated April 28, 2026

Vector Databases & RAG

Vector databases and Retrieval-Augmented Generation (RAG) solve LLMs' most practical limitation — the inability to access your specific data — by enabling AI to search and reason over your documents, knowledge bases, and custom content.

Learning Objectives

  • Explain what vector embeddings are and why they enable semantic search
  • Describe the RAG pipeline and why it improves LLM accuracy for domain-specific tasks
  • Identify the leading vector database options and their appropriate use cases

The Problem RAG Solves

Large language models are trained on general internet data. They don't know about:

  • Your company's internal documentation
  • Your customer support history
  • The specific version of a software library you're using
  • Research papers published after their training cutoff
  • Your personal notes and knowledge base

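The two naive solutions both fall short. Fine-tuning the model on your data is expensive and slow, and has to be repeated whenever the data changes. Stuffing all your documents into the context window is limited by context size and gets costly as the document set grows.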

Retrieval-Augmented Generation (RAG) is the practical solution: instead of training the model on your data or loading it all upfront, retrieve only the relevant pieces at the moment of the query and pass those to the model.

💡Key Concept

RAG Pipeline:

  1. Index your documents by converting them to vector embeddings (numerical representations of meaning) and storing them in a vector database
  2. Query — when a user asks a question, convert their query to a vector embedding too
  3. Retrieve — find the documents whose embeddings are most similar to the query embedding (semantic search)
  4. Generate — pass the retrieved documents + the user's query to the LLM; it answers based on the retrieved context
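
The four steps above can be sketched end to end in a few lines of Python. This is a minimal illustration, not a production pipeline: `embed` is a toy bag-of-words stand-in for a real embedding model (so it matches on shared words, not meaning), the in-memory `index` list stands in for a vector database, and the final LLM call is left out. All function names and sample documents are assumptions made for this sketch.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# 1. Index: embed every document and store (text, vector) pairs.
documents = [
    "To reset your password, open Settings and choose Reset Password.",
    "Invoices are emailed on the first day of each month.",
    "Our support team is available on weekdays from 9 to 5.",
]
index = [(doc, embed(doc)) for doc in documents]


def retrieve(query: str, k: int = 2) -> list[str]:
    """2 + 3. Embed the query, then return the k most similar documents."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(query: str, chunks: list[str]) -> str:
    """4. Assemble the prompt that would be sent to the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Answer based on these chunks:\n{context}\nQuestion: {query}"


query = "How do I reset my password?"
top_chunks = retrieve(query, k=1)
prompt = build_prompt(query, top_chunks)
print(prompt)
```

Swapping `embed` for a real embedding model and `index` for a vector database would leave the shape of the pipeline the same.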

What Vector Embeddings Are

A vector embedding is a list of numbers (a vector) that represents the meaning of a piece of text. Text with similar meaning produces vectors that are mathematically close to each other in high-dimensional space.

This is what makes RAG work: "How do I reset my password?" and "I forgot my login credentials" produce similar vectors, even though they share no meaningful keywords. A vector search finds the relevant FAQ answer even if the user didn't use the exact right keywords.
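
"Mathematically close" is usually measured with cosine similarity. The sketch below uses made-up 4-dimensional vectors purely for illustration; real embedding models produce hundreds or thousands of dimensions, and the numbers here are assumptions, not the output of any model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, hand-written for this example.
reset_password = [0.9, 0.1, 0.0, 0.2]   # "How do I reset my password?"
forgot_login = [0.8, 0.2, 0.1, 0.3]     # "I forgot my login credentials"
pizza_recipe = [0.0, 0.1, 0.9, 0.1]     # an unrelated sentence

sim_related = cosine_similarity(reset_password, forgot_login)
sim_unrelated = cosine_similarity(reset_password, pizza_recipe)

print(f"related:   {sim_related:.3f}")
print(f"unrelated: {sim_unrelated:.3f}")
```

The related pair scores close to 1.0 and the unrelated pair close to 0.0, which is the property a similarity search ranks on.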

Traditional text search (keyword matching) requires exact or near-exact word matches. Semantic search via vectors matches by meaning, not keywords. For knowledge base and document search applications, this is a significant capability improvement.

The RAG Pipeline in Practice

Documents (PDFs, docs, web pages)
    ↓ Chunking (split into paragraphs)
    ↓ Embedding model (text → vector)
Vector Database (stores vectors + original text)
    ↑ User query → embedding → similarity search
    → Top-k relevant chunks
    → LLM prompt: "Answer based on these chunks: [chunks]\nQuestion: [query]"
    → Grounded, accurate response

The embedding model (often OpenAI's text-embedding-3-large or a local alternative) converts text to vectors. The vector database stores and searches those vectors. The LLM uses the retrieved chunks to answer accurately.
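
The chunking step in the diagram can be as simple as splitting on blank lines and packing neighbouring paragraphs together. The function below is one possible sketch; the name `chunk_by_paragraph` and the `max_chars` limit are assumptions for illustration, and real pipelines often measure size in tokens and add overlap between chunks.

```python
def chunk_by_paragraph(text: str, max_chars: int = 500) -> list[str]:
    """Split on blank lines, then pack consecutive paragraphs into chunks.

    Chunks stay within max_chars where possible; a single paragraph longer
    than max_chars is kept whole rather than cut mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if not current or len(candidate) <= max_chars:
            current = candidate          # still fits (or starts a new chunk)
        else:
            chunks.append(current)       # flush the full chunk
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


document = (
    "First paragraph about passwords.\n\n"
    "Second paragraph about billing.\n\n"
    "Third paragraph about support hours."
)
chunks = chunk_by_paragraph(document, max_chars=70)
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

Each resulting chunk is what gets embedded and stored, so chunk size directly affects how focused the retrieved context is.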

Vector Database Options

Pinecone — Managed Vector Search at Scale

Pinecone is a widely used dedicated vector database, purpose-built for production vector search and offered as a managed service.

What makes Pinecone the enterprise choice:

  • Fully managed — no infrastructure to run or maintain
  • Serverless option — pay only for what you use, scales to zero
  • High throughput for production applications with heavy query volumes
  • Metadata filtering — filter by arbitrary metadata alongside vector similarity (e.g., "find similar documents where department=legal and date>2024")
  • Hybrid search — combine vector search with keyword search in one query

Pinecone is the right choice when vector search is a core, high-scale production feature. Because the service is managed, the operational overhead on your side is minimal.
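
To make the metadata-filtering idea concrete, here is a small pure-Python sketch of "filter, then rank by similarity". It is a conceptual illustration only and does not use Pinecone's client or API; the `records` list, the `filtered_search` function, and the metadata fields are all hypothetical. Production databases typically apply the filter inside the index rather than scanning every record.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical stored records: a vector plus arbitrary metadata.
records = [
    {"id": "a", "vector": [0.9, 0.1], "metadata": {"department": "legal", "year": 2025}},
    {"id": "b", "vector": [0.95, 0.05], "metadata": {"department": "sales", "year": 2025}},
    {"id": "c", "vector": [0.2, 0.8], "metadata": {"department": "legal", "year": 2025}},
    {"id": "d", "vector": [0.9, 0.2], "metadata": {"department": "legal", "year": 2023}},
]


def filtered_search(query_vector, predicate, top_k=3):
    """Keep records whose metadata passes the predicate, then rank by similarity."""
    candidates = [r for r in records if predicate(r["metadata"])]
    candidates.sort(
        key=lambda r: cosine_similarity(query_vector, r["vector"]), reverse=True
    )
    return [r["id"] for r in candidates[:top_k]]


# "Find similar documents where department=legal and date>2024"
hits = filtered_search(
    [1.0, 0.0],
    lambda m: m["department"] == "legal" and m["year"] > 2024,
)
print(hits)
```

Record "b" is the closest vector overall but is excluded by the department filter, which is exactly the behaviour metadata filtering is meant to provide.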

Free tier: 2GB storage, 100K vectors, serverless.

Best for: Production RAG applications, semantic search features embedded in larger products, high-query-volume applications.

Weaviate — Open-Source with Managed Option

Weaviate is an open-source vector database with a rich feature set:

  • GraphQL and REST APIs — flexible querying beyond simple vector similarity
  • Multi-modal — supports image, video, and audio embeddings alongside text
  • Built-in modules — call embedding models and LLMs directly from within Weaviate queries (no separate orchestration code)
  • Hybrid search — BM25 keyword + vector in one query
  • Self-hosted or managed (Weaviate Cloud)

Weaviate is more complex to configure than Pinecone but more flexible — especially for multi-modal use cases where you need to search across images and text together.
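
Hybrid search boils down to blending a keyword score with a vector score. The sketch below shows one simple way to do that: min-max normalize each score set and take a weighted sum. It is an assumption-laden illustration, not Weaviate's actual fusion algorithm; the `alpha` weight, function names, and sample scores are this sketch's own.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale scores to the 0..1 range (expects a non-empty dict)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(
    keyword_scores: dict[str, float],
    vector_scores: dict[str, float],
    alpha: float = 0.5,
) -> list[str]:
    """Rank ids by alpha * vector score + (1 - alpha) * keyword score.

    alpha = 1.0 is pure vector search; alpha = 0.0 is pure keyword search.
    """
    kw = normalize(keyword_scores)
    vec = normalize(vector_scores)
    combined = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(kw) | set(vec)
    }
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical scores for three documents.
keyword_scores = {"doc1": 2.0, "doc2": 8.0, "doc3": 0.5}    # e.g. BM25
vector_scores = {"doc1": 0.91, "doc2": 0.40, "doc3": 0.85}  # e.g. cosine similarity

print(hybrid_rank(keyword_scores, vector_scores, alpha=0.5))
```

Moving `alpha` toward 0 favours exact keyword hits such as product codes, while moving it toward 1 favours paraphrases.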

Free tier: Weaviate Cloud sandbox, 14-day expiry, for development.

Best for: Multi-modal RAG, teams that prefer open-source infrastructure, complex filtering and query requirements.

Supabase Vector (pgvector) — Postgres-Native Embeddings

Supabase Vector is pgvector — the PostgreSQL extension for vector similarity search — integrated into Supabase's platform.

The key advantage: your vectors live in the same Postgres database as the rest of your application data. This eliminates the separate-system complexity of managing a dedicated vector database.

For most application-scale RAG use cases (hundreds of thousands to low millions of vectors), pgvector performs well. The operational simplicity of one database for everything — relational data, auth, file storage, and vectors — is a genuine advantage.

How to use in a Supabase project:

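-- Enable the pgvector extension (safe to re-run)
create extension if not exists vector;

-- Create a table with a vector column
create table documents (
  id uuid primary key default gen_random_uuid(),
  content text,
  -- The dimension must equal your embedding model's output size
  -- (1536 fits OpenAI's text-embedding-3-small; text-embedding-3-large defaults to 3072)
  embedding vector(1536)
);

-- Similarity search: <=> is cosine distance, so 1 - distance is cosine similarity.
-- Replace '[...]' with the query embedding written as a vector literal.
select content, 1 - (embedding <=> '[...]') as similarity
from documents
order by embedding <=> '[...]'
limit 5;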
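
From application code, the query embedding has to be turned into pgvector's text format (`[x1,x2,...]`) before it reaches the query. The Python sketch below shows one way to do that with a parameterized query. The helper name `to_pgvector_literal`, the `%(name)s` placeholder style (as used by psycopg), and the 3-dimensional sample vector are assumptions for illustration; against the table above the vector would need 1536 dimensions.

```python
def to_pgvector_literal(vector: list[float]) -> str:
    """Format a Python list as a pgvector text literal, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


# Parameterized version of the similarity query; the cast tells Postgres
# to treat the text parameter as a vector.
SIMILARITY_SQL = """
select content, 1 - (embedding <=> %(query)s::vector) as similarity
from documents
order by embedding <=> %(query)s::vector
limit %(limit)s;
"""

# Sample 3-dimensional vector for illustration only.
params = {"query": to_pgvector_literal([0.1, 0.2, 0.3]), "limit": 5}
print(params["query"])

# With an open database cursor (not created here), the call would look like:
# cursor.execute(SIMILARITY_SQL, params)
# rows = cursor.fetchall()
```

Passing the vector as a parameter rather than formatting it into the SQL string keeps the query safe from injection.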

Free tier: Included in Supabase free tier (500MB database storage).

Best for: Applications already using Supabase that want to add RAG without adding infrastructure; moderate-scale use cases; teams who want operational simplicity.

Chroma — Local-First for Development

Chroma is an open-source vector database designed for developer ergonomics — especially for local development and prototyping.

Installing and running Chroma locally takes minutes. The Python and JavaScript clients are simple. For building a RAG prototype or developing a retrieval feature before committing to a production database, Chroma is often the fastest path:

import chromadb

client = chromadb.Client()
collection = client.create_collection("my_docs")

# Add documents (Chroma can call an embedding function automatically)
collection.add(
    documents=["AI is changing software development", "Vector search enables semantic retrieval"],
    ids=["doc1", "doc2"]
)

# Query
results = collection.query(query_texts=["how does search work?"], n_results=2)

Chroma also offers a hosted cloud option for moving beyond local development.

Best for: Local development and prototyping; learning RAG concepts; applications where Chroma's simplicity and local operation matter more than production-scale performance.

Qdrant — Performance-Focused Open Source

Qdrant (pronounced "quadrant") is a high-performance vector search engine written in Rust — optimized for fast search over large vector collections with minimal resource usage.

Distinctive features:

  • Payload filtering — rich metadata filtering combined with vector search
  • Scalar and product quantization, which compress vectors to reduce memory use at the cost of a small, tunable loss in search accuracy
  • On-disk storage — search vectors stored on disk rather than RAM (makes large collections feasible on modest hardware)
  • Rust performance — efficient memory usage and fast query response

Qdrant is the choice when you need to run your own vector database infrastructure with strong performance characteristics — particularly for large-scale or memory-constrained deployments.
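
The memory saving from scalar quantization comes from storing each vector component as a small integer instead of a 32-bit float. The sketch below illustrates the idea with a per-vector min/max mapping to 8-bit codes; it is a simplified assumption for teaching purposes, not Qdrant's implementation, and the function names and sample vector are made up.

```python
def quantize(vector: list[float], bits: int = 8):
    """Map each float to an integer code in 0..(2**bits - 1)."""
    lo, hi = min(vector), max(vector)
    levels = 2**bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in vector]
    return codes, lo, scale


def dequantize(codes: list[int], lo: float, scale: float) -> list[float]:
    """Approximately reconstruct the original floats from the codes."""
    return [lo + code * scale for code in codes]


original = [0.12, -0.45, 0.88, 0.03, -0.91, 0.50]
codes, lo, scale = quantize(original)
restored = dequantize(codes, lo, scale)

# Worst-case rounding error per component is half a quantization step.
max_error = max(abs(a - b) for a, b in zip(original, restored))
print(codes)
print(f"max reconstruction error: {max_error:.5f}")
```

Each component shrinks from four bytes to one, and the reconstruction error stays within half a quantization step, which is why the accuracy loss is usually small but not zero.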

Free tier: Qdrant Cloud includes a free tier cluster; self-hosted is free.

Best for: Self-hosted deployments requiring high performance; large vector collections; teams preferring open-source infrastructure control.

MongoDB Atlas Vector Search

MongoDB Atlas Vector Search adds vector search capability to Atlas, so teams already using MongoDB can add vector search without migrating to a separate database.

Like Supabase Vector for Postgres users, Atlas Vector Search is compelling primarily as an "add vector search to what you already have" solution. The integration is native: you store vectors alongside your documents in MongoDB collections and query using aggregation pipelines.
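
An aggregation pipeline for vector search is an ordinary list of stages that begins with `$vectorSearch`. The sketch below builds such a pipeline as plain Python data. The index name `vector_index`, the `embedding` and `content` field names, and the helper function are assumptions for illustration; they must match whatever your own collection and Atlas Vector Search index define.

```python
def build_vector_search_pipeline(
    query_vector: list[float],
    limit: int = 5,
    num_candidates: int = 100,
) -> list[dict]:
    """Build an aggregation pipeline that runs a vector search and returns scores."""
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",          # hypothetical index name
                "path": "embedding",              # hypothetical field holding the vectors
                "queryVector": query_vector,
                "numCandidates": num_candidates,  # candidates considered before ranking
                "limit": limit,                   # results returned
            }
        },
        {
            "$project": {
                "_id": 0,
                "content": 1,                     # hypothetical text field
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Short sample vector for illustration; a real query uses the full embedding.
pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], limit=3)
print(pipeline[0]["$vectorSearch"]["limit"])

# With a pymongo collection object (not created here), the query would be:
# results = list(collection.aggregate(pipeline))
```

Because the vector search is just the first pipeline stage, later stages can filter, join, or reshape the results like any other MongoDB query.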

Best for: Applications already running on MongoDB Atlas; teams who want to avoid a separate vector database system.

Choosing a Vector Database

Tip

Start with what you already have. If you're using Supabase, pgvector is already available — enable it and start. If you're using MongoDB Atlas, use Atlas Vector Search. Only choose a dedicated vector database (Pinecone, Weaviate, Qdrant) when your retrieval requirements outgrow your existing database's vector capabilities.

Scenario | Recommendation
Already using Supabase | pgvector (built in)
Already using MongoDB Atlas | Atlas Vector Search
Local prototyping / learning RAG | Chroma
Production, high scale, managed | Pinecone
Multi-modal (images + text) | Weaviate
Self-hosted, large scale, open source | Qdrant

Key Takeaways

  • RAG solves the core LLM limitation of not knowing your specific data: index your documents as vector embeddings, retrieve relevant chunks at query time, and pass them to the LLM as context for grounded, accurate answers
  • Vector embeddings represent text meaning numerically — semantically similar text produces mathematically close vectors, enabling semantic search that matches by meaning rather than keywords
  • The simplest starting point is Supabase Vector (pgvector) if you're already using Supabase, or Chroma for local prototyping; only reach for a dedicated vector database (Pinecone, Weaviate, Qdrant) when scale requires it
  • RAG is now a standard pattern for AI-powered knowledge bases, customer support, documentation search, and enterprise AI applications — understanding the pipeline is a foundational skill for AI application development
