
How Embeddings Work — The Visual Guide to Semantic Search

See how text becomes vectors, how similarity search finds meaning, and how to pick the right embedding model — all with animated diagrams and zero math prerequisites.


No linear algebra required. Just pictures.

Embeddings are the secret behind every AI search system, every RAG pipeline, and every recommendation engine. But most explanations jump straight into cosine similarity formulas and lose everyone. This guide shows you what embeddings actually look like and how they work — visually.


1. The Embedding Space: Where Words Live

Imagine a map where every word has a position. Words with similar meanings are close together. “Cat” and “kitten” are neighbors. “Cat” and “JavaScript” are on opposite ends. That’s the embedding space.

The Embedding Space — Where Words Become Coordinates

Every word becomes a point. Similar meanings cluster together.

[Animated diagram: a simplified two-dimensional embedding space with clusters for Animals (cat, dog, puppy, kitten), Technology (Python, JavaScript, TypeScript, code), and Food (pizza, burger, sushi). A query point moves through the space, scoring 0.92 similarity near the animal cluster and 0.12 near the technology cluster.]

In the animated diagram, the query dot orbits through the space. When it's near the animal cluster, animal-related documents score high. When it drifts toward tech, tech docs score high. Proximity equals relevance.
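If you want to see that idea in code rather than pictures, here is a minimal sketch in Python. The two-dimensional coordinates are made up purely for illustration; real embeddings have hundreds or thousands of dimensions, but the distance logic is the same:

```python
import math

# Toy 2-D "embeddings": hand-picked coordinates for illustration only,
# not the output of any real model.
words = {
    "cat":        (0.90, 0.80),
    "kitten":     (0.85, 0.75),
    "javascript": (-0.70, 0.60),
}

def cosine_similarity(a, b):
    """How closely two vectors point in the same direction (1.0 = identical, ~0 = unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(words["cat"], words["kitten"]))      # high: neighbors on the map
print(cosine_similarity(words["cat"], words["javascript"]))  # low: opposite ends of the map
```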


2. From Text to Numbers: The Embedding Process

A sentence goes into an embedding model. A list of 1,536 numbers comes out. Those numbers aren’t random — each dimension captures a sliver of meaning. Together, they’re a fingerprint of the sentence.

From Text to Numbers — How Embedding Works

A sentence goes in. A list of numbers comes out. Those numbers capture meaning.

[Diagram: the input text "The cat sat on the mat" is tokenized, passed through the embedding model (text-embedding-3-small, 1,536 dimensions), and comes out as a vector: [0.0234, -0.0891, 0.1472, -0.0156, … 1,532 more dimensions …, 0.0672].]

Each number captures a different aspect of meaning: tone, topic, syntax, entities. You can't read a single dimension, but the whole vector is a fingerprint of the sentence's meaning.

You can’t interpret a single dimension (“dimension 847 means food sense”), but the model can compare two fingerprints and tell you how similar the meanings are. That’s the whole trick.
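For a concrete picture of the round trip, here is roughly what embedding a sentence looks like with the OpenAI Python client. This is a sketch that assumes you have an API key configured; any embedding provider follows the same shape:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The cat sat on the mat",
)

vector = response.data[0].embedding
print(len(vector))   # 1536 -- one number per dimension
print(vector[:4])    # first few values; the exact numbers vary by model version
```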


3. How Similarity Search Works

Once everything is embedded, search is just finding the nearest neighbors. Your query becomes a vector, and the database returns the closest matches. High similarity score means high relevance.

Your query becomes a vector. The DB finds the closest neighbors. That's it.

Query: "How do I reset my password?" → [0.12, -0.08, 0.34, ...]

0.96 · "Password reset instructions for all accounts" · help-center/auth.md
0.89 · "Forgot password? Here's how to recover access" · faq/account-recovery.md
0.74 · "Account security settings and 2FA setup" · help-center/security.md
0.31 · "Billing FAQ and payment methods" · faq/billing.md

Similarity threshold: 0.70

The threshold is your quality knob. Set it too low and you get noisy results. Too high and you miss good answers. Start at 0.70, adjust based on your data.
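Here is a minimal sketch of that loop in Python, with the vectors held in a plain dictionary. In production the nearest-neighbor search happens inside a vector database, but the scoring and thresholding logic is the same; `embed_text` and `docs` are placeholders for your own pipeline:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """1.0 = pointing the same way, ~0 = unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query_vec: np.ndarray, docs: dict[str, np.ndarray],
           threshold: float = 0.70, k: int = 5) -> list[tuple[float, str]]:
    """Score every document against the query, drop anything below the threshold, return top-k."""
    scored = [(cosine_similarity(query_vec, vec), path) for path, vec in docs.items()]
    scored = [pair for pair in scored if pair[0] >= threshold]   # the quality knob
    return sorted(scored, reverse=True)[:k]

# Usage sketch:
# results = search(embed_text("How do I reset my password?"), docs)
# -> [(0.96, "help-center/auth.md"), (0.89, "faq/account-recovery.md"), ...]
```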


4. Picking the Right Embedding Model

Not all embedding models are equal. Bigger models give better quality but cost more. Open-source models are free but less accurate. The right choice depends on your volume, budget, and accuracy needs.

Not all embeddings are equal. Here are the trade-offs of four common options.

text-embedding-3-small (fast) · 1,536 dims · $0.02 / 1M tokens
Best for: prototyping, internal tools, high-volume search. Good enough for 80% of use cases. Lowest cost, fastest inference.
Latency: ~15ms · MTEB score: 62.3 · Max tokens: 8,191

text-embedding-3-large (balanced) · 3,072 dims · $0.13 / 1M tokens
Best for: production RAG, customer-facing search, multilingual content. Significant quality jump over small. Worth the ~6x cost for precision-critical apps.
Latency: ~25ms · MTEB score: 64.6 · Max tokens: 8,191

all-MiniLM-L6-v2 (open source) · 384 dims · free (self-hosted)
Best for: air-gapped environments, budget-zero projects, edge deployments. Runs on your own GPU. Lower quality, but zero API costs and full data privacy.
Latency: ~5ms (local GPU) · MTEB score: 56.3 · Max tokens: 256

Cohere embed-v3 (best quality) · 1,024 dims · $0.10 / 1M tokens
Best for: highest-quality search, multilingual content, classification tasks. Currently at the top of the MTEB leaderboard. Supports separate search and classification modes for optimized embeddings.
Latency: ~20ms · MTEB score: 66.3 · Max tokens: 512

Rule of thumb: start with text-embedding-3-small. If your search quality isn’t good enough, upgrade to large. If you need zero API dependency, go open-source. Don’t start with the biggest model — you probably don’t need it.
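One way to keep that upgrade path cheap is to hide the model choice behind a single function, so moving from small to large, or to a self-hosted model, is a one-line change. A rough sketch, assuming the OpenAI SDK and sentence-transformers are the libraries in your stack:

```python
from openai import OpenAI
from sentence_transformers import SentenceTransformer

openai_client = OpenAI()
local_model = SentenceTransformer("all-MiniLM-L6-v2")  # free, runs on your own hardware

def embed(texts: list[str], provider: str = "openai-small") -> list[list[float]]:
    """Single entry point so the rest of the codebase never hardcodes a model name."""
    if provider == "openai-small":
        resp = openai_client.embeddings.create(model="text-embedding-3-small", input=texts)
        return [item.embedding for item in resp.data]
    if provider == "openai-large":
        resp = openai_client.embeddings.create(model="text-embedding-3-large", input=texts)
        return [item.embedding for item in resp.data]
    if provider == "local":
        return local_model.encode(texts).tolist()
    raise ValueError(f"unknown provider: {provider}")
```

One caveat: vectors from different models live in different spaces, so switching providers means re-embedding your entire corpus, not just the new documents.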


5. How to Know Your Embeddings Are Working

You shipped a search feature. Users are querying it. But how do you know it’s actually returning good results? These three metrics tell you everything you need to know.

Three metrics that actually matter. Ignore everything else.

01 · Recall@K
Of the relevant documents that exist, how many ended up in your top-K results?
Bad: <60% · OK: 60-80% · Good: >80%
Example: 87% at K=5

02 · MRR (Mean Reciprocal Rank)
Is the best answer in position 1, 2, or buried at position 10? Higher MRR means better ranking.
Bad: <50% · OK: 50-75% · Good: >75%
Example: 0.79 (best result at position 1.3 on average)

03 · Semantic Drift
Are your embeddings still accurate as your document set grows? Drift means old embeddings no longer match new ones.
Good: <5% · OK: 5-15% · Bad: >15%
Example: 12% (re-embed quarterly to prevent drift)

The most important one is Recall@K — if the right document exists in your database but doesn’t show up in the top 5 results, your embeddings aren’t doing their job. Fix chunking first, then consider a better model.
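Recall@K and MRR are both cheap to compute once you have a small labeled set of queries paired with the documents they should return. A minimal sketch; `search_fn` and the labeled queries are stand-ins for your own pipeline:

```python
def recall_at_k(results: list[str], relevant: set[str], k: int = 5) -> float:
    """Of the documents that should have matched, what fraction landed in the top-k?"""
    hits = sum(1 for doc in results[:k] if doc in relevant)
    return hits / len(relevant)

def reciprocal_rank(results: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if nothing relevant was returned."""
    for rank, doc in enumerate(results, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def evaluate(labeled_queries, search_fn, k: int = 5):
    """labeled_queries: list of (query_text, {paths of documents that should match})."""
    recalls, rrs = [], []
    for query, relevant in labeled_queries:
        results = search_fn(query)            # ranked list of document paths
        recalls.append(recall_at_k(results, relevant, k))
        rrs.append(reciprocal_rank(results, relevant))
    return sum(recalls) / len(recalls), sum(rrs) / len(rrs)   # (Recall@K, MRR)
```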