Vector Databases¶

Vector databases store embedding vectors and enable fast similarity search. They are the persistence and retrieval layer for RAG systems, semantic search, and recommendation engines.

Key Facts¶

Vector DBs use Approximate Nearest Neighbor (ANN) algorithms for sub-linear search time
Exact nearest neighbor is O(n) - impractical for millions of vectors
Metadata filtering allows pre-filtering before similarity search
In-memory stores (Chroma, FAISS) are fast but don't persist across restarts without explicit saving
Hybrid search (vector + keyword/BM25) catches both semantic and lexical matches

Database Comparison¶

Database	Type	Best For	Key Feature
Chroma	Embedded/server	Prototyping, small projects	Simple API, Python-native, in-memory
Pinecone	Managed cloud	Production at scale	Fully managed, auto-scaling, metadata filtering
Qdrant	Self-hosted/cloud	Production with control	Rust-based, fast, rich filtering, payload storage
Weaviate	Self-hosted/cloud	Multimodal search	GraphQL API, hybrid search built-in
FAISS	Library (Meta)	Research, high-performance	Fastest similarity search, GPU support, no built-in persistence
pgvector	PostgreSQL extension	Existing Postgres stack	SQL integration, ACID transactions, familiar tooling

ANN Algorithms¶

HNSW (Hierarchical Navigable Small World)¶

Most popular in production vector DBs
Builds layered graph structure, O(log n) search
High recall (>95%) at very fast speeds
Memory-intensive (stores graph in RAM)

IVF (Inverted File Index)¶

Clusters vectors, searches only relevant clusters
Lower memory than HNSW
Good for very large datasets
Slightly lower recall

Indexing Strategy Selection¶

Strategy	Dataset Size	Memory	Speed	Accuracy
Flat	<100K vectors	High	Slow (exact)	Perfect
IVF-Flat	Medium	Medium	Fast	Good
IVF-PQ	Large	Low	Fast	Moderate
HNSW	Any	High	Very fast	Very good

Patterns¶

Metadata Filtering (Qdrant)¶

from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

client = QdrantClient("localhost", port=6333)
results = client.search(
    collection_name="docs",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="source", match=MatchValue(value="annual_report")),
            FieldCondition(key="year", range=Range(gte=2023))
        ]
    ),
    limit=5
)

LangChain Vector Store¶

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
results = retriever.invoke("What is the company revenue?")

Hybrid Search¶

Combine vector similarity with keyword/BM25 search:

hybrid_score = alpha * vector_score + (1 - alpha) * bm25_score

Reciprocal Rank Fusion (RRF): Alternative merging without tuning alpha:

score = sum(1 / (rank + k)) across all search methods

Gotchas¶

In-memory vector stores lose data on restart - always persist for production
FAISS is a library, not a database - no built-in CRUD, persistence, or filtering
pgvector performance degrades with millions of vectors without proper index tuning
Metadata filtering happens before vector search in most DBs - design metadata schema carefully
Index building can be slow for large datasets - plan for initial indexing time
Different distance metrics (cosine, L2, dot product) can give different ranking results - match to your embedding model's training