Name: Pinecone
Author: Pinecone

Learning Objectives

Understand what a vector database is and why AI applications need one
Identify Pinecone's core features: serverless architecture, metadata filtering, hybrid search, and namespaces
Evaluate when to use Pinecone vs. open-source vector databases or database-integrated options like Supabase Vector (pgvector)

What Is Pinecone?

Pinecone is one of the most widely used managed cloud vector databases, purpose-built for AI applications. A vector database stores data as vectors (high-dimensional arrays of numbers) and retrieves similar items using vector similarity search. That capability underpins Retrieval-Augmented Generation (RAG), semantic search, recommendation systems, and memory in AI agents.

While traditional databases store and retrieve data by exact match (find the row where id = 42), vector databases retrieve by semantic similarity (find the 10 chunks of text most similar in meaning to this query). This is fundamentally different and is what makes vector databases essential for AI applications that need to "remember" or "retrieve" relevant context from large document collections.
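To make the contrast concrete, here is a toy, brute-force version of similarity retrieval in plain Python. It is only a sketch: the 3-dimensional vectors and document IDs are invented for illustration (real embeddings have hundreds or thousands of dimensions), and a production vector database such as Pinecone uses approximate nearest-neighbor indexes rather than scanning every stored vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], store: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Brute-force similarity search: score every stored vector, keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional "embeddings"; real models produce e.g. 1536 dimensions.
store = {
    "doc-cats": [0.9, 0.1, 0.0],
    "doc-kittens": [0.8, 0.2, 0.1],
    "doc-taxes": [0.0, 0.1, 0.9],
}

# Exact-match lookup needs the precise key...
print(store.get("doc-cats"))
# ...while similarity search only needs a vector "near" the ones you want.
print(top_k([0.85, 0.15, 0.05], store, k=2))
```

The query vector matches no stored key exactly, yet the two "cat" documents come back ahead of the unrelated one because their vectors point in nearly the same direction.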

Tip

Try Pinecone: Create a free account at pinecone.io. The free Starter plan includes 2 GB of storage with no credit card required; how many vectors that holds depends on their dimension and metadata size. The Python and JavaScript SDKs are available via pip install pinecone and npm install @pinecone-database/pinecone.
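How many vectors fit in a given amount of storage depends mostly on their dimension. The back-of-the-envelope estimate below assumes 4 bytes (float32) per dimension plus a fixed metadata size per record (the 500-byte figure is hypothetical). It ignores IDs and Pinecone's internal index overhead, so treat the result as an upper bound rather than Pinecone's own accounting.

```python
def approx_vector_capacity(storage_gb: float, dimension: int, metadata_bytes: int = 0) -> int:
    """Rough upper bound on how many records fit in `storage_gb` gigabytes.

    Assumes float32 values (4 bytes per dimension) plus a fixed metadata size
    per record; ignores IDs and index overhead, so real capacity is lower.
    """
    bytes_per_record = dimension * 4 + metadata_bytes
    total_bytes = storage_gb * 1024**3
    return int(total_bytes // bytes_per_record)


# Compare common embedding sizes against a 2 GB budget.
for dim in (384, 1024, 1536, 3072):
    print(dim, approx_vector_capacity(2, dim, metadata_bytes=500))
```

Under these assumptions, small 384-dimensional vectors land near a million records, while 1536-dimensional vectors (the size of OpenAI's text-embedding-3-small) fit only a few hundred thousand.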

How Vector Databases Work (The Core Concept)

Before digging into Pinecone specifically, here's the fundamental concept:

Embed your data: Convert documents, text chunks, or records into vectors using an embedding model (OpenAI text-embedding-3-large, Cohere embed-v3, or any embedding model)
Store in Pinecone: Upload the vectors with associated metadata (the original text, document ID, date, etc.)
Query by similarity: Convert a user's question into a vector using the same embedding model, then ask Pinecone "what are the 10 most similar vectors?"
Use the retrieved context: Pass the retrieved relevant chunks to your LLM with the user's question — this is RAG
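Step 4 is ordinary string handling. Here is a minimal sketch, assuming each retrieved match carries its source text in a `text` metadata field. The `build_rag_prompt` helper and the prompt wording are hypothetical choices for illustration, not a Pinecone or OpenAI API.

```python
def build_rag_prompt(question: str, matches: list[dict], max_chunks: int = 3) -> str:
    """Assemble an LLM prompt from retrieved chunks (hypothetical helper).

    `matches` is assumed to be a list of {"score": float, "metadata": {"text": str}}
    dicts, already sorted best-first as a vector database returns them.
    """
    context_lines = [
        f"[{i + 1}] {m['metadata']['text']}" for i, m in enumerate(matches[:max_chunks])
    ]
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


matches = [
    {"score": 0.91, "metadata": {"text": "AI is transforming industries"}},
    {"score": 0.72, "metadata": {"text": "Vector databases enable semantic search"}},
]
print(build_rag_prompt("How is AI changing business?", matches))
```

Numbering the chunks lets the model (or a later check) cite which retrieved passage supports an answer; `max_chunks` keeps the prompt within the context window.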

This is why vector databases are sometimes called the "long-term memory" of AI systems. An LLM's context window is short-term memory (limited to a fixed number of tokens); a vector database is long-term memory (far larger document collections, retrieved on demand).

Core Features

Serverless Architecture

Pinecone's serverless mode (launched 2024) eliminates infrastructure management:

No cluster sizing or capacity planning: Pinecone scales automatically
Pay only for what you use: storage, reads, and writes are billed per unit
No pods to provision: there is no pod initialization step before an index can serve traffic
Cloud and region choice: each index runs in the cloud provider and region you specify when you create it

This is the default and recommended mode for new Pinecone deployments.

Metadata Filtering

Store metadata alongside each vector and filter at query time:

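# Query with a metadata filter (top-level conditions are ANDed together)
results = index.query(
    vector=query_embedding,
    top_k=10,
    include_metadata=True,
    filter={
        # $gt/$gte/$lt/$lte compare numbers only, so store dates as numbers
        # (e.g. 20260101 or a Unix timestamp) rather than strings
        "date": {"$gte": 20260101},
        "source": {"$in": ["docs", "blog"]},
        "category": "machine-learning"  # shorthand for {"$eq": "machine-learning"}
    }
)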

Metadata filtering is critical for production RAG systems — without it, you can't restrict retrieval to relevant documents (e.g., only search documents from this specific customer's files).
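To see what such a filter means, here is a small local evaluator for a subset of Pinecone's filter syntax: shorthand equality, `$eq`, `$in`, `$gte`, and the implicit AND between top-level keys. It is an illustrative re-implementation, not Pinecone's code, and `matches_filter` is a hypothetical name. It assumes dates are stored as numbers such as 20260101, because Pinecone's range operators compare numbers. Pinecone itself supports more operators than this sketch handles.

```python
from typing import Any


def _check(value: Any, condition: Any) -> bool:
    """Evaluate one field's condition; a bare value means $eq."""
    if not isinstance(condition, dict):
        return value == condition
    for op, operand in condition.items():
        if op == "$eq" and value != operand:
            return False
        if op == "$in" and value not in operand:
            return False
        # Range operators only apply to numeric values.
        if op == "$gte" and not (isinstance(value, (int, float)) and value >= operand):
            return False
        if op not in ("$eq", "$in", "$gte"):
            raise ValueError(f"operator {op} not supported in this sketch")
    return True


def matches_filter(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """All top-level conditions must hold (implicit AND); missing fields fail."""
    return all(key in metadata and _check(metadata[key], cond) for key, cond in flt.items())


flt = {"date": {"$gte": 20260101}, "source": {"$in": ["docs", "blog"]}, "category": "machine-learning"}
records = [
    {"id": "a", "metadata": {"date": 20260215, "source": "blog", "category": "machine-learning"}},
    {"id": "b", "metadata": {"date": 20251130, "source": "docs", "category": "machine-learning"}},
    {"id": "c", "metadata": {"date": 20260301, "source": "forum", "category": "machine-learning"}},
]
print([r["id"] for r in records if matches_filter(r["metadata"], flt)])
```

Record "b" fails the date condition and "c" fails the source condition, so only "a" survives. A string date like "2026-02-01" would also fail a numeric `$gte`, which is why dates should be stored as numbers.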

Hybrid Search

Pinecone supports combining dense vector search (semantic similarity) with sparse keyword search in a single query. You supply the sparse vectors yourself, for example as BM25 or TF-IDF term weights:

Dense search finds semantically similar content even with different wording
Sparse search finds exact keyword matches that semantic search might miss
Hybrid combines both with a weighted blend — best of both retrieval methods
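The weighted blend is commonly a convex combination controlled by a parameter alpha. Dense values are scaled by alpha and sparse values by (1 - alpha) before the query is sent, so alpha = 1 is pure semantic search and alpha = 0 is pure keyword search. Pinecone's hybrid search documentation describes this approach, and per that documentation single-index sparse-dense queries require the `dotproduct` metric. The sketch below is a minimal version. It assumes sparse vectors use the `{"indices": [...], "values": [...]}` shape that the Python SDK's `sparse_vector` query parameter takes, and `hybrid_scale` is a hypothetical helper name. The sparse vector itself would come from an encoder such as BM25, which is not shown.

```python
def hybrid_scale(dense: list[float], sparse: dict, alpha: float) -> tuple[list[float], dict]:
    """Weight dense vs. sparse query vectors for a dot-product hybrid query.

    alpha=1.0 -> only dense (semantic); alpha=0.0 -> only sparse (keyword).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1.0 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


dense_q = [0.2, 0.4, 0.4]                                # hypothetical dense embedding
sparse_q = {"indices": [17, 342], "values": [1.5, 0.8]}  # hypothetical BM25 term weights
dense_w, sparse_w = hybrid_scale(dense_q, sparse_q, alpha=0.75)
print(dense_w, sparse_w)

# With a dotproduct index, both parts would be sent in one request, e.g.:
# index.query(vector=dense_w, sparse_vector=sparse_w, top_k=10)
```

Because the dot product is linear, scaling the query vectors this way makes the final score alpha times the dense similarity plus (1 - alpha) times the sparse similarity.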

Namespaces

Namespaces partition an index into isolated segments — critical for multi-tenant applications:

# Store vectors in a customer-specific namespace
index.upsert(vectors=docs, namespace=f"customer-{customer_id}")

# Query only within that customer's data (top_k is required)
results = index.query(vector=query, top_k=10, namespace=f"customer-{customer_id}")

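One Pinecone index can serve thousands of customers, each in its own namespace: a query scoped to one namespace never returns vectors from another. This isolation is logical and relies on your application choosing the right namespace. API keys are scoped to a project, so any key that can query the index can query every namespace in it.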
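Conceptually, a namespace behaves like a separate partition that every operation must name. The in-memory model below sketches that idea. It is not how Pinecone stores data, and the `NamespacedStore` class and tenant names are hypothetical.

```python
from collections import defaultdict


class NamespacedStore:
    """Toy model of an index partitioned into namespaces (hypothetical class)."""

    def __init__(self) -> None:
        # namespace -> {vector_id: values}
        self._data: dict[str, dict[str, list[float]]] = defaultdict(dict)

    def upsert(self, vectors: list[tuple[str, list[float]]], namespace: str) -> None:
        for vec_id, values in vectors:
            self._data[namespace][vec_id] = values

    def query(self, vector: list[float], top_k: int, namespace: str) -> list[str]:
        # Only vectors in the requested namespace are ever scored (dot product).
        candidates = self._data.get(namespace, {})
        ranked = sorted(
            candidates,
            key=lambda vid: sum(a * b for a, b in zip(vector, candidates[vid])),
            reverse=True,
        )
        return ranked[:top_k]


store = NamespacedStore()
store.upsert([("a-1", [1.0, 0.0]), ("a-2", [0.0, 1.0])], namespace="customer-a")
store.upsert([("b-1", [1.0, 0.0])], namespace="customer-b")
print(store.query([1.0, 0.0], top_k=5, namespace="customer-a"))
```

Even though "b-1" is as similar to the query as "a-1", it never appears in customer A's results, because it lives in a different partition.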

Pricing

Plan	Price	Features
Starter (Serverless)	$0/month	2 GB storage (vector capacity depends on dimension and metadata size); free with no time limit; suited to development and small apps
Standard (Serverless)	Usage-based	Billed per GB stored plus reads and writes; roughly $0.08/GB/month storage; about $8 per 1 million read units (a query can consume more than one read unit as a namespace grows); scales to billions of vectors
Enterprise	Custom	Unlimited; private endpoints; SLAs; data residency


Integrations

Pinecone integrates with the major AI frameworks:

LangChain: PineconeVectorStore as a drop-in retriever
LlamaIndex: PineconeVectorStore index
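OpenAI Assistants: Pinecone can back a custom retrieval function called through function calling (the built-in file_search tool uses OpenAI-hosted vector stores)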
LangGraph, CrewAI, AG2: Available as a memory/retrieval tool
AWS, GCP, Azure: Available in all major cloud marketplaces

Strengths

Purpose-built for AI: Designed from scratch for vector workloads — not a general database with a vector plugin
Serverless scalability: Scales from prototype to billions of vectors without infrastructure management
Managed service: No cluster administration; uptime SLAs on production plans
Metadata filtering: Production-grade filtering with comparison ($eq, $ne, $gt, $gte, $lt, $lte), set ($in, $nin), $exists, and logical ($and, $or) operators
Hybrid search: Dense + sparse search in one query for higher retrieval quality
Ecosystem: Integrations with the major AI frameworks and a large body of examples and tutorials
Namespaces: Per-tenant data partitioning within one index, without spinning up separate indexes

Limitations & Considerations

Cost at scale: Usage-based pricing can become expensive for very high query volume — calculate your projected costs before committing
Managed-only: No option to self-host; data is stored on Pinecone-managed infrastructure in the AWS, GCP, or Azure region you choose, which matters for strict data residency requirements
Latency: Network round-trip to Pinecone's cloud adds latency vs. co-located or embedded solutions — typically 50–150ms
Vendor lock-in: The API is proprietary; migrating away from Pinecone requires re-uploading all vectors to a new system
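Projecting costs is simple arithmetic once you know your storage and query volume. The rough sketch below takes the illustrative rates quoted in the pricing table ($0.08 per GB per month and $8 per million) as adjustable parameters. It ignores write costs and treats each query as about one read unit, which understates the bill for large namespaces. The workload numbers are hypothetical.

```python
def estimate_monthly_cost(
    storage_gb: float,
    queries_per_month: int,
    storage_rate_per_gb: float = 0.08,      # illustrative rate from the pricing table
    rate_per_million_queries: float = 8.0,  # assumes ~1 read unit per query
) -> float:
    """Back-of-the-envelope monthly bill: storage + query charges (writes ignored)."""
    storage_cost = storage_gb * storage_rate_per_gb
    query_cost = queries_per_month / 1_000_000 * rate_per_million_queries
    return round(storage_cost + query_cost, 2)


# Hypothetical workload: 50 GB of vectors, 2,000 queries per minute around the clock.
queries = 2_000 * 60 * 24 * 30
print(queries, estimate_monthly_cost(storage_gb=50, queries_per_month=queries))
```

At this volume the query term dwarfs the storage term, which is exactly the "cost at scale" scenario: storage stays cheap while sustained query traffic drives the bill.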

Best Use Cases

Task	Why Pinecone
Production RAG applications	Managed, scalable, with SLAs — built for production
Multi-tenant SaaS with user data isolation	Namespaces provide per-customer data separation
Semantic search across large document sets	Billions of vectors, sub-second query latency
Recommendation systems	Similarity search for product, content, or user recommendations
AI agent long-term memory	Persistent vector memory across sessions

When to choose alternatives:

Already using Supabase or PostgreSQL → Supabase Vector (pgvector) — same DB, no extra service
Privacy/self-hosting required → Qdrant or Weaviate (self-hosted)
Local development only → Chroma (embedded, no server)
MongoDB already in stack → MongoDB Atlas Vector Search

Getting Started

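import os
import time

from openai import OpenAI
from pinecone import Pinecone, ServerlessSpec

# Read keys from the environment instead of hard-coding them
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

index_name = "my-rag-index"

# Create the serverless index once; creating an existing name raises an error
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,  # must match the embedding model (text-embedding-3-small = 1536)
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index(index_name)

def embed(text: str) -> list[float]:
    """Return the embedding vector for one piece of text."""
    response = openai_client.embeddings.create(input=text, model="text-embedding-3-small")
    return response.data[0].embedding

# Embed and upsert documents, keeping the original text as metadata
documents = ["AI is transforming industries", "Vector databases enable semantic search"]
vectors = [
    {"id": str(i), "values": embed(doc), "metadata": {"text": doc}}
    for i, doc in enumerate(documents)
]
index.upsert(vectors=vectors)

# Serverless indexes are eventually consistent: freshly upserted vectors can
# take a few seconds to become queryable (the pause length here is arbitrary)
time.sleep(10)

# Query with the same embedding model used for the documents
query = "How is AI changing business?"
results = index.query(vector=embed(query), top_k=3, include_metadata=True)
for match in results.matches:
    print(f"Score: {match.score:.3f} | Text: {match.metadata['text']}")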
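The example upserts two records in a single call. For larger collections, Pinecone's documentation recommends upserting in batches, because a single upsert request is capped in both record count and payload size. Below is a minimal batching helper. `batched_records` is a hypothetical name, and the batch size of 100 is an arbitrary default to tune so each request stays under the documented limits. Python 3.12's `itertools.batched` does the same job but yields tuples.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched_records(records: Iterable[T], batch_size: int = 100) -> Iterator[list[T]]:
    """Yield lists of at most `batch_size` items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(records)
    while batch := list(islice(it, batch_size)):
        yield batch


# Hypothetical records shaped like an upsert payload (tiny 8-dim vectors for brevity).
records = ({"id": str(i), "values": [0.0] * 8, "metadata": {"text": f"chunk {i}"}} for i in range(250))
sizes = [len(batch) for batch in batched_records(records, batch_size=100)]
print(sizes)

# With a real index, each batch would be sent as its own request:
# for batch in batched_records(vectors, batch_size=100):
#     index.upsert(vectors=batch)
```

Because the helper consumes any iterable lazily, you can feed it a generator that embeds documents on the fly without holding the whole corpus in memory.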