
Vector Databases Explained — The Engine Behind Semantic Search and RAG

Visual guide to vector databases. Understand how vector search works, compare Pinecone vs pgvector vs Weaviate, and learn ANN indexing strategies for production RAG systems.

Regular databases find rows where column = value. Vector databases find rows where meaning ≈ meaning. That’s the fundamental shift. You’re not querying by exact match — you’re querying by semantic similarity. “Find me documents that mean something similar to this question.”

This is what powers RAG (Retrieval Augmented Generation), semantic search, recommendation engines, and image similarity. Understanding vector databases is the key to understanding modern AI applications.

1. How Vector Search Works

Traditional search: “machine learning healthcare” → finds documents containing those exact words. Vector search: “How is AI used in medicine?” → finds documents about ML in healthcare, even if they never mention “AI” or “medicine.” The search operates on meaning, not keywords.

How Vector Search Works

1. Document → Embedding Model: "Machine learning is transforming healthcare" → [0.23, -0.87, 0.15, ..., 0.42] (1536 dimensions)

2. Store in Vector DB: The vector is indexed with an ANN algorithm (HNSW, IVF) for fast nearest-neighbor search.

3. Query → Same Embedding Model: "How is AI used in medicine?" → [0.19, -0.82, 0.21, ..., 0.38]

4. Cosine Similarity Search: Find the vectors closest to the query vector. The healthcare doc scores 0.94 similarity — a high match despite zero word overlap.

Key insight: Vector search finds semantic similarity, not keyword matches. "AI in medicine" finds "machine learning in healthcare" because their meanings are close in embedding space — even though they share no words.

The embedding model is the bridge between text and math. It converts human-readable text into a point in high-dimensional space where semantically similar texts are close together. The vector database stores millions of these points and finds the nearest ones to your query in milliseconds.
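The core operation behind step 4 is cosine similarity. Here is a minimal sketch in pure Python — the 4-dimensional vectors are made up for illustration (real embedding models produce hundreds to thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" -- values are invented, not from a real model.
doc_healthcare = [0.9, 0.8, 0.1, 0.0]  # "ML is transforming healthcare"
doc_cooking    = [0.0, 0.1, 0.9, 0.8]  # "How to bake sourdough bread"
query          = [0.8, 0.9, 0.0, 0.1]  # "How is AI used in medicine?"

print(cosine_similarity(query, doc_healthcare))  # high, close to 1.0
print(cosine_similarity(query, doc_cooking))     # low, close to 0.0
```

A vector database does exactly this comparison, but over millions of stored vectors, using an index so it never has to score every one.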

2. Picking a Database

The vector database landscape is crowded. Purpose-built options (Pinecone, Weaviate, Qdrant), Postgres extensions (pgvector), and embedded options (ChromaDB). The right choice depends on your scale, ops capability, and existing infrastructure.

Vector Database Comparison — 2026

Database  | Type          | Scale     | Best For
Pinecone  | Managed SaaS  | Billions  | Zero-ops, fast start
Weaviate  | OSS / Cloud   | Billions  | Hybrid search, multi-modal
Qdrant    | OSS / Cloud   | Billions  | Rust performance, filtering
pgvector  | PG extension  | Millions  | Already using Postgres
ChromaDB  | OSS embedded  | Thousands | Prototyping, local dev

Start with: pgvector if you already run Postgres and have < 5M vectors. Pinecone if you want zero ops. Qdrant or Weaviate if you need scale + self-hosting.

The question I get most: “Should I use a purpose-built vector DB or just pgvector?” If you have under 5 million vectors and already run Postgres — pgvector. It’s good enough, it’s familiar, and it doesn’t add another service to manage. Beyond 5M vectors, or if you need advanced filtering + vector search combined, purpose-built databases start to pull ahead in performance.
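If you go the pgvector route, the whole workflow fits in a few SQL statements. A minimal sketch — the table and column names are hypothetical, and the query vector literal is abbreviated:

```sql
-- Enable the extension (once per database)
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical documents table with a 1536-dimensional embedding column
CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    content   text,
    embedding vector(1536)
);

-- HNSW index tuned for cosine distance
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Top-5 nearest neighbors; <=> is pgvector's cosine-distance operator
SELECT id, content
FROM documents
ORDER BY embedding <=> '[0.19, -0.82, ...]'::vector
LIMIT 5;
```

This is the appeal of the pgvector path: semantic search lives in the same database as the rest of your data, so you can combine it with ordinary WHERE clauses and joins.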

3. Indexing — Speed vs Accuracy

Vector search is fundamentally a nearest-neighbor problem: find the K closest vectors to the query vector. Exact search is O(n) — unusable at scale. Approximate Nearest Neighbor (ANN) algorithms make it fast, trading a tiny accuracy loss for 1000x speed improvement.
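To make the O(n) baseline concrete, here is exact brute-force k-NN in pure Python — every stored vector is scored against the query, which is what ANN indexes exist to avoid. Vectors here are random toy data:

```python
import math
import random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def brute_force_knn(query, vectors, k=3):
    """Exact k-NN: scores EVERY stored vector against the query -- O(n)."""
    by_similarity = sorted(range(len(vectors)),
                           key=lambda i: cosine(query, vectors[i]),
                           reverse=True)
    return by_similarity[:k]

random.seed(0)
db = [[random.gauss(0, 1) for _ in range(8)] for _ in range(1000)]

# Query with a vector that is already stored: it must rank first,
# since its cosine similarity with itself is exactly 1.0.
print(brute_force_knn(db[42], db, k=3))
```

At 1,000 vectors this is instant; at 100 million it is 100 million cosine computations per query, which is why production systems switch to the approximate indexes below.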

Index Types — Speed vs Accuracy Tradeoff

Flat (Brute Force): Exact nearest neighbor. Compares the query against every vector. O(n) per query. Perfect accuracy. Unusable above 100K vectors.

IVF (Inverted File Index): Clusters vectors into buckets and only searches nearby clusters. Fast but misses edge cases. Good default for millions of vectors.

HNSW (Hierarchical Navigable Small World): Graph-based. Navigates a multi-layer graph to find neighbors. Best speed-accuracy tradeoff. Memory-hungry. Industry standard for production.

HNSW is the index type used by almost every production vector database in 2026. It builds a navigable graph where each node connects to nearby nodes across multiple layers. Search starts at the top layer (sparse, long jumps) and refines at lower layers (dense, short jumps). Search time is O(log n), growing only logarithmically with dataset size — you can search billions of vectors in single-digit milliseconds.
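The navigation within each layer is a greedy walk: hop to whichever neighbor is closer to the query, and stop at a local optimum. A simplified single-layer sketch (real HNSW maintains multiple layers and a candidate beam, and this toy k-NN graph is built by brute force just to have something to search):

```python
import math
import random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def greedy_search(graph, vectors, query, entry):
    """Greedy graph walk: move to any neighbor that is closer to the
    query; stop when no neighbor improves. HNSW runs this on every
    layer, top-down, carrying the result down as the next entry point."""
    current = entry
    improved = True
    while improved:
        improved = False
        for neighbor in graph[current]:
            if cosine(vectors[neighbor], query) > cosine(vectors[current], query):
                current, improved = neighbor, True
    return current

# Toy dataset and k-NN graph: each node links to its 5 nearest neighbors.
random.seed(1)
vectors = [[random.gauss(0, 1) for _ in range(4)] for _ in range(200)]
graph = {
    i: sorted(range(200), key=lambda j: -cosine(vectors[i], vectors[j]))[1:6]
    for i in range(200)
}

# Search for a stored vector starting from an arbitrary entry point.
# The walk only touches a handful of nodes instead of all 200.
result = greedy_search(graph, vectors, vectors[7], entry=0)
print(result)  # index of the local optimum the walk settles on
```

Greedy walks can get stuck in local optima, which is exactly what HNSW's multiple layers and wider candidate lists mitigate — but the core trick of following graph edges toward the query, rather than scanning everything, is the same.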