Embeddings

Embeddings convert text into high-dimensional numeric vectors where proximity represents semantic similarity. They are the foundation of vector search, RAG systems, and semantic understanding in LLM applications.

Key Facts

  • Embedding models produce fixed-size vectors (e.g., 3072 dimensions for OpenAI text-embedding-3-large, 1536 for text-embedding-3-small)
  • Points close in vector space are semantically similar
  • Embeddings capture meaning, not exact wording - "car" and "automobile" land close together, while "bank" in a financial context and "bank" in a riverbank context land far apart
  • Modern models encode text along thousands of learned abstract features - individual dimensions have no human-interpretable meaning

Similarity Metrics

Cosine similarity is the standard metric - measures the angle between vectors:

from numpy import dot
from numpy.linalg import norm

def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))
# Range: -1 to 1 (1 = identical direction)

Other metrics: Euclidean distance (L2), dot product (when vectors are normalized, equals cosine similarity).
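That equivalence is easy to check directly: after L2-normalizing both vectors, a plain dot product gives the same score as cosine similarity. A small NumPy sketch with toy vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 0.5, 1.0])

# Normalize each vector to unit length
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

# Dot product of unit vectors == cosine similarity of the originals
assert np.isclose(np.dot(a_unit, b_unit), cosine_similarity(a, b))
```

This is why many vector databases store pre-normalized vectors and use dot product internally: it is cheaper per comparison and numerically identical for ranking.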

Embedding Models

Model                  | Dimensions     | Provider  | Notes
text-embedding-3-large | 3072 (default) | OpenAI    | Dimensions adjustable via the dimensions param
text-embedding-3-small | 1536 (default) | OpenAI    | Cheaper, lower quality; also adjustable (e.g., 512)
BGE-large              | 1024           | BAAI      | Open-source, strong performance
E5-large               | 1024           | Microsoft | Good for retrieval tasks
Cohere embed-v3        | 1024           | Cohere    | Multilingual, search-optimized
Ollama embeddings      | Varies         | Local     | Same local Ollama server as the LLM; free, private
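The "adjustable dimensions" note corresponds, per OpenAI's documentation, to truncating the full vector and re-normalizing it. A local sketch of that operation on a toy stand-in vector (no API call involved):

```python
import numpy as np

def shorten(vec, dims):
    # Truncate to the first `dims` components, then re-normalize to
    # unit length - the same trade the `dimensions` param makes
    # server-side for the text-embedding-3 models.
    v = np.asarray(vec, dtype=float)[:dims]
    return v / np.linalg.norm(v)

full = np.random.default_rng(0).normal(size=3072)  # toy "embedding"
small = shorten(full, 256)
assert small.shape == (256,)
assert np.isclose(np.linalg.norm(small), 1.0)
```

Shorter vectors are cheaper to store and compare, at some cost in retrieval quality (see Gotchas below).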

Patterns

Generate Embeddings (OpenAI)

from openai import OpenAI
client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-large",
    input="What is machine learning?"
)
vector = response.data[0].embedding  # list of floats

Generate Embeddings (LangChain)

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vector = embeddings.embed_query("What is machine learning?")
doc_vectors = embeddings.embed_documents(["doc1", "doc2", "doc3"])

Test-Time Reranking

After initial embedding retrieval, use a cross-encoder to rerank by fine-grained relevance:

from sentence_transformers import CrossEncoder

cross = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
query = "What is machine learning?"
candidates = ["ML is a subfield of AI...", "The weather is nice today."]
pairs = [(query, candidate) for candidate in candidates]
scores = cross.predict(pairs)  # higher score = more relevant
best_idx = int(scores.argmax())

Self-Consistency (Best-of-N)

Sample N answers and pick the most frequent. Exact-match voting works for short answers; embeddings can cluster paraphrased answers before voting:

import collections

def majority_vote(candidates):
    # Most frequent exact answer
    answer, count = collections.Counter(candidates).most_common(1)[0]
    return answer
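For free-form answers, where exact strings rarely repeat, one sketch of the embedding-based variant is greedy clustering by cosine similarity followed by a vote on cluster size. The 0.9 threshold and the toy 2-d vectors below are illustrative assumptions; real code would use vectors from an embedding model:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def embedding_majority_vote(answers, vectors, threshold=0.9):
    # Greedy clustering: each answer joins the first cluster whose
    # representative it is similar to; the largest cluster wins.
    clusters = []  # list of (representative_vector, member_indices)
    for i, v in enumerate(vectors):
        for rep, members in clusters:
            if cosine(rep, v) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((v, [i]))
    rep, members = max(clusters, key=lambda c: len(c[1]))
    return answers[members[0]]

# Toy example: the first two "answers" are near-duplicate vectors
answers = ["42", "forty-two", "7"]
vectors = [np.array([1.0, 0.0]), np.array([0.99, 0.05]), np.array([0.0, 1.0])]
assert embedding_majority_vote(answers, vectors) == "42"
```

This catches cases where "42" and "forty-two" should count as the same vote even though their strings differ.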

Known Issues

  • Non-determinism: OpenAI embeddings return slightly different vectors across API calls for the same input text. The absolute differences are tiny, but they break deterministic unit tests.
  • Cosine similarity misses: embedding search can fail to find text that is obviously present, scoring below a typical 0.6 similarity threshold in cases where a plain string/keyword match succeeds.
  • Semantic vs lexical confusion: "risk of liquidity" may match "liquidity amount" even though they mean different things.
  • Embedding model must match: query and documents must use the same embedding model. Mixing models produces meaningless similarity scores.
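A common mitigation for the "cosine similarity misses" issue is hybrid scoring: blend the embedding score with a lexical signal so exact matches cannot fall below the threshold. A minimal sketch - the 0.5 weight, the token-overlap measure, and the toy 2-d vectors are all illustrative assumptions:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def keyword_overlap(query, doc):
    # Fraction of query tokens that literally appear in the document
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_score(query, doc, q_vec, d_vec, alpha=0.5):
    # alpha balances semantic (embedding) vs lexical (keyword) evidence
    return alpha * cosine(q_vec, d_vec) + (1 - alpha) * keyword_overlap(query, doc)

# Toy vectors stand in for real query/document embeddings
q_vec, d_vec = np.array([1.0, 0.2]), np.array([0.9, 0.3])
score = hybrid_score("liquidity risk", "risk of liquidity shortfall", q_vec, d_vec)
assert 0.0 <= score <= 1.0
```

Production systems usually do this with BM25 plus vector search and a rank-fusion step rather than a hand-rolled blend, but the principle is the same.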

Gotchas

  • Always use the same embedding model for indexing and querying - mixing models gives garbage results
  • Embedding quality degrades for very short texts (1-2 words) and very long texts (beyond model's max input)
  • Multilingual embeddings exist but cross-lingual similarity is weaker than same-language
  • Embedding API calls add latency and cost to every query - consider caching for repeated queries
  • Dimension reduction (e.g., text-embedding-3-large with fewer dimensions) trades quality for speed/cost
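The latency/cost gotcha is often handled with a small cache keyed on (model, text). A sketch assuming some embed_fn callable that wraps the real provider call (the stub below is hypothetical, for demonstration only):

```python
class EmbeddingCache:
    """In-memory cache so repeated queries hit the API only once.

    embed_fn is any callable (model, text) -> vector.
    """

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self._store = {}
        self.misses = 0  # number of actual embed_fn calls

    def embed(self, model, text):
        key = (model, text)  # key on both: mixing models gives garbage
        if key not in self._store:
            self.misses += 1
            self._store[key] = self.embed_fn(model, text)
        return self._store[key]

# Stub embed_fn for demonstration; a real one would call the API
cache = EmbeddingCache(lambda model, text: [float(len(text))])
cache.embed("text-embedding-3-small", "hello")
cache.embed("text-embedding-3-small", "hello")  # served from cache
assert cache.misses == 1
```

Keying on the model name as well as the text also enforces the "same model for indexing and querying" rule from the list above. Note that because OpenAI embeddings are slightly non-deterministic across calls, caching additionally makes repeated queries reproducible.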

See Also