Embeddings

Embeddings convert text into high-dimensional numeric vectors where proximity represents semantic similarity. They are the foundation of vector search, RAG systems, and semantic understanding in LLM applications.

Key Facts

  • Embedding models produce fixed-size vectors (e.g., 3072 dimensions for OpenAI text-embedding-3-large, 1536 for text-embedding-3-small)
  • Points close in vector space are semantically similar
  • Embeddings capture meaning, not exact wording - "car" and "automobile" land close together, while "bank" in a financial context and "bank" in a riverbank context land far apart
  • Modern models encode text along thousands of learned abstract features - individual dimensions have no human-interpretable meaning

Similarity Metrics

Cosine similarity is the standard metric - measures the angle between vectors:

from numpy import dot
from numpy.linalg import norm

def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))
# Range: -1 to 1 (1 = identical direction)

Other metrics: Euclidean distance (L2), dot product (when vectors are normalized, equals cosine similarity).
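That equivalence is easy to check directly: after L2-normalizing both vectors, a plain dot product gives the same score as cosine similarity. A small NumPy sketch with toy vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 0.5, 1.0])

# Normalize each vector to unit length
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

# Dot product of unit vectors == cosine similarity of the originals
assert np.isclose(np.dot(a_unit, b_unit), cosine_similarity(a, b))
```

This is why many vector databases store pre-normalized vectors and use dot product internally: it is cheaper per comparison and numerically identical for ranking.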

Embedding Models

Model                  | Dimensions     | Provider  | Notes
text-embedding-3-large | 3072 (default) | OpenAI    | Dimensions adjustable via the dimensions param
text-embedding-3-small | 1536 (default) | OpenAI    | Cheaper, lower quality; also adjustable (e.g., 512)
BGE-large              | 1024           | BAAI      | Open-source, strong performance
E5-large               | 1024           | Microsoft | Good for retrieval tasks
Cohere embed-v3        | 1024           | Cohere    | Multilingual, search-optimized
Ollama embeddings      | Varies         | Local     | Same local Ollama server as the LLM; free, private
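The "adjustable dimensions" note corresponds, per OpenAI's documentation, to truncating the full vector and re-normalizing it. A local sketch of that operation on a toy stand-in vector (no API call involved):

```python
import numpy as np

def shorten(vec, dims):
    # Truncate to the first `dims` components, then re-normalize to
    # unit length - the same trade the `dimensions` param makes
    # server-side for the text-embedding-3 models.
    v = np.asarray(vec, dtype=float)[:dims]
    return v / np.linalg.norm(v)

full = np.random.default_rng(0).normal(size=3072)  # toy "embedding"
small = shorten(full, 256)
assert small.shape == (256,)
assert np.isclose(np.linalg.norm(small), 1.0)
```

Shorter vectors are cheaper to store and compare, at some cost in retrieval quality (see Gotchas below).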

Patterns

Generate Embeddings (OpenAI)

from openai import OpenAI
client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-large",
    input="What is machine learning?"
)
vector = response.data[0].embedding  # list of floats

Generate Embeddings (LangChain)

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vector = embeddings.embed_query("What is machine learning?")
doc_vectors = embeddings.embed_documents(["doc1", "doc2", "doc3"])

Test-Time Reranking

After initial embedding retrieval, use a cross-encoder to rerank by fine-grained relevance:

from sentence_transformers import CrossEncoder

cross = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
query = "What is machine learning?"
candidates = ["ML is a subfield of AI...", "The weather is nice today."]
pairs = [(query, candidate) for candidate in candidates]
scores = cross.predict(pairs)  # higher score = more relevant
best_idx = int(scores.argmax())

Self-Consistency (Best-of-N)

Sample N answers and pick the most frequent. Exact-match voting works for short answers; embeddings can cluster paraphrased answers before voting:

import collections

def majority_vote(candidates):
    # Most frequent exact answer
    answer, count = collections.Counter(candidates).most_common(1)[0]
    return answer
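For free-form answers, where exact strings rarely repeat, one sketch of the embedding-based variant is greedy clustering by cosine similarity followed by a vote on cluster size. The 0.9 threshold and the toy 2-d vectors below are illustrative assumptions; real code would use vectors from an embedding model:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def embedding_majority_vote(answers, vectors, threshold=0.9):
    # Greedy clustering: each answer joins the first cluster whose
    # representative it is similar to; the largest cluster wins.
    clusters = []  # list of (representative_vector, member_indices)
    for i, v in enumerate(vectors):
        for rep, members in clusters:
            if cosine(rep, v) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((v, [i]))
    rep, members = max(clusters, key=lambda c: len(c[1]))
    return answers[members[0]]

# Toy example: the first two "answers" are near-duplicate vectors
answers = ["42", "forty-two", "7"]
vectors = [np.array([1.0, 0.0]), np.array([0.99, 0.05]), np.array([0.0, 1.0])]
assert embedding_majority_vote(answers, vectors) == "42"
```

This catches cases where "42" and "forty-two" should count as the same vote even though their strings differ.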

Known Issues

  • Non-determinism: OpenAI embeddings return slightly different vectors across API calls for the same input text. The absolute differences are tiny, but they break deterministic unit tests.
  • Cosine similarity misses: embedding search can fail to find text that is obviously present, scoring below a typical 0.6 similarity threshold in cases where a plain string/keyword match succeeds.
  • Semantic vs lexical confusion: "risk of liquidity" may match "liquidity amount" even though they mean different things.
  • Embedding model must match: query and documents must use the same embedding model. Mixing models produces meaningless similarity scores.
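A common mitigation for the "cosine similarity misses" issue is hybrid scoring: blend the embedding score with a lexical signal so exact matches cannot fall below the threshold. A minimal sketch - the 0.5 weight, the token-overlap measure, and the toy 2-d vectors are all illustrative assumptions:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def keyword_overlap(query, doc):
    # Fraction of query tokens that literally appear in the document
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_score(query, doc, q_vec, d_vec, alpha=0.5):
    # alpha balances semantic (embedding) vs lexical (keyword) evidence
    return alpha * cosine(q_vec, d_vec) + (1 - alpha) * keyword_overlap(query, doc)

# Toy vectors stand in for real query/document embeddings
q_vec, d_vec = np.array([1.0, 0.2]), np.array([0.9, 0.3])
score = hybrid_score("liquidity risk", "risk of liquidity shortfall", q_vec, d_vec)
assert 0.0 <= score <= 1.0
```

Production systems usually do this with BM25 plus vector search and a rank-fusion step rather than a hand-rolled blend, but the principle is the same.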

Gotchas

  • Always use the same embedding model for indexing and querying - mixing models gives garbage results
  • Embedding quality degrades for very short texts (1-2 words) and very long texts (beyond model's max input)
  • Multilingual embeddings exist but cross-lingual similarity is weaker than same-language
  • Embedding API calls add latency and cost to every query - consider caching for repeated queries
  • Dimension reduction (e.g., text-embedding-3-large with fewer dimensions) trades quality for speed/cost
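The latency/cost gotcha is often handled with a small cache keyed on (model, text). A sketch assuming some embed_fn callable that wraps the real provider call (the stub below is hypothetical, for demonstration only):

```python
class EmbeddingCache:
    """In-memory cache so repeated queries hit the API only once.

    embed_fn is any callable (model, text) -> vector.
    """

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self._store = {}
        self.misses = 0  # number of actual embed_fn calls

    def embed(self, model, text):
        key = (model, text)  # key on both: mixing models gives garbage
        if key not in self._store:
            self.misses += 1
            self._store[key] = self.embed_fn(model, text)
        return self._store[key]

# Stub embed_fn for demonstration; a real one would call the API
cache = EmbeddingCache(lambda model, text: [float(len(text))])
cache.embed("text-embedding-3-small", "hello")
cache.embed("text-embedding-3-small", "hello")  # served from cache
assert cache.misses == 1
```

Keying on the model name as well as the text also enforces the "same model for indexing and querying" rule from the list above. Note that because OpenAI embeddings are slightly non-deterministic across calls, caching additionally makes repeated queries reproducible.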

See Also