Key Idea
In one line: An embedding turns a piece of text into a list of numbers (a high-dimensional vector). Texts with similar meaning have similar vectors. This gives computers a direct way to compare meaning — and it is the foundation of RAG, recommendation, clustering, and deduplication.
What it is#
"a puppy runs on the grass" → [0.12, -0.83, 0.41, ..., 0.07] (1536-dim)
"a young dog sprints on a lawn" → [0.13, -0.81, 0.40, ..., 0.08] ← almost identical
"the stock market crashed today"→ [-0.55, 0.22, -0.71, ..., 0.39] ← totally different
"Almost identical" shows up as a short distance in vector space (cosine similarity close to 1). Computing cosine similarity between two sentences tells you whether they are talking about the same thing.
Analogy#
- Keyword search = literal match — "dog" and "canine" are different words.
- Embedding search = meaning match — every passage is translated into "a coordinate of meaning", then we measure who is closest.
Key concepts#
Vector
A list of floats. Common sizes: 384 / 768 / 1536 / 3072. Higher dimensions are more expressive and more expensive.
Cosine similarity
Compares the direction of two vectors. Ranges from −1 to 1; 1 means the directions are identical.
Embedding model
A model dedicated to turning text into vectors. OpenAI text-embedding-3, BGE, E5, Cohere, etc.
Dimensionality
Vector length. Matryoshka embeddings can be truncated and still work.
How it works#
The model projects text into a semantic space with hundreds or thousands of dimensions. Position in that space encodes meaning: nearby points mean similar things.
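"Position encodes meaning" is exactly what makes semantic search a nearest-neighbor lookup. A minimal sketch, where the 3-d vectors and the `nearest` helper are illustrative inventions, not output from any real model:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Toy 3-d "semantic space": nearby points mean similar things.
corpus = {
    "a puppy runs on the grass":      [0.90, 0.10, 0.00],
    "a young dog sprints on a lawn":  [0.88, 0.15, 0.02],
    "the stock market crashed today": [-0.20, 0.80, 0.50],
}

def nearest(query_vec):
    # Semantic search = find the stored text whose vector is closest.
    return max(corpus, key=lambda text: cosine(query_vec, corpus[text]))

print(nearest([0.91, 0.12, 0.01]))  # the dog-related sentences win
```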
Practical notes#
- Pick the right model > tune knobs. For Chinese, BGE / M3E / OpenAI 3-small are all solid — benchmark on your own corpus first.
- One model per project. Indexing with model A and querying with model B → completely broken. Changing models means rebuilding the index.
- Token budget. Embeddings are billed per token; a million tokens / tens of thousands of docs costs cents to a couple of dollars — two orders of magnitude cheaper than LLM calls.
- Batch calls. Most APIs accept 100+ texts per request — dozens of times faster than calling one at a time.
- Truncate aggressively. A 3072-d vector trimmed to 512-d (with a Matryoshka-trained model) uses 6× less storage with negligible recall loss.
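The truncation trick can be sketched as below. `truncate_embedding` is a hypothetical helper, and the approach assumes a Matryoshka-trained model, where the leading dimensions carry the coarsest semantics; the re-normalization keeps cosine similarity well-behaved after trimming:

```python
import math
import random

def truncate_embedding(vec, dim):
    # Keep the first `dim` components, then re-normalize to unit length
    # so cosine similarity on the truncated vectors still behaves.
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Stand-in for a real 3072-d embedding.
full = [random.gauss(0, 1) for _ in range(3072)]
short = truncate_embedding(full, 512)
print(len(short))  # 512, i.e. 6x less storage per vector
```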
Easy confusions#
Embedding
**Numeric vectors** — for computers to compare similarity.
LLM output
**Natural-language text** — for humans to read.
Two completely different output types.
Embedding search
Understands meaning, works across languages.
Recalls "different wording, same meaning".
Recalls "different wording, same meaning".
BM25 keyword search
Strong on exact strings / named entities.
Recalls "same name, same form".
Recalls "same name, same form".
The best practice is to combine both: hybrid retrieval.
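One common way to combine the two result sets is reciprocal rank fusion (RRF), which only needs the two ranked lists, not comparable scores. A minimal sketch with made-up doc IDs; `k=60` is the customary default:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # Each document scores 1 / (k + rank) in every list it appears in;
    # documents ranked well by both retrievers float to the top.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits  = ["d3", "d1", "d2"]  # keyword search ranking
vector_hits = ["d1", "d4", "d3"]  # embedding search ranking
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
# ['d1', 'd3', 'd4', 'd2'] — d1 ranks high in both lists, so it wins
```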
Further reading#
- RAG — embeddings' biggest application
- Vector Database — infrastructure for storing many embeddings
- Chunking — the splitting strategy that runs before embedding