Vector Database

Specialised infrastructure for large-scale similarity search over high-dimensional vectors.

Key Idea

In one line: a vector database is a database purpose-built to store embedding vectors and answer nearest-neighbour queries fast. The core problem it solves: out of millions to billions of vectors, find the K most similar to a query vector, in milliseconds.

What it is

A regular SQL database can't do this efficiently: "find the 5 rows with the highest cosine similarity over 1536-d vectors" forces a full table scan, and at a million rows it falls over. Vector DBs use ANN (approximate nearest neighbour) indexes such as HNSW, IVF, or DiskANN to bring the query down to roughly O(log N):

-- pgvector
SELECT id, content
FROM docs
ORDER BY embedding <=> $query_vector
LIMIT 5;

<=> is pgvector's cosine-distance operator; backed by an HNSW or IVFFlat index on the embedding column, the query returns in milliseconds.
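For intuition, here is what that query computes, as a brute-force NumPy sketch over synthetic vectors (all names and sizes here are illustrative). The point of a vector DB is that its index avoids exactly this full scan:

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(10_000, 64))   # 10k document embeddings, 64-d
query = rng.normal(size=64)            # query embedding

# Normalise so a dot product equals cosine similarity.
docs_n = docs / np.linalg.norm(docs, axis=1, keepdims=True)
q_n = query / np.linalg.norm(query)

scores = docs_n @ q_n                  # one similarity score per document
top5 = np.argsort(-scores)[:5]         # indices of the 5 most similar docs
print(top5, scores[top5])
```

This touches every row; an ANN index reaches a near-identical top-5 while visiting only a tiny fraction of the vectors.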

Analogy

A regular DB is the phone book — find a record by name (key).
A vector DB is map + radar — give a coordinate; the radar sweeps for the nearest points.

Key concepts

  • ANN index: HNSW / IVF / DiskANN — trade a sliver of accuracy for huge speed.
  • Distance metric: cosine / Euclidean / dot product — must match what the embedding model was trained for.
  • Metadata filter: filter by fields (user_id / time / category) alongside vector search.
  • Hybrid search: combine vector + keyword (BM25) and rerank.

Mainstream options

| Option | Notes | Best for |
| --- | --- | --- |
| pgvector | Postgres extension, zero adoption cost | Up to ~tens of millions, when PG is already in the stack |
| Qdrant / Milvus / Weaviate | Dedicated engines, strong filters + hybrid | Billions of vectors / production |
| Pinecone | Fully managed SaaS | When you don't want to operate it yourself |
| FAISS / Chroma | Single-machine libs, local experiments | Prototypes / offline |
| OpenSearch / Elasticsearch | BM25 + vectors together | Existing ES cluster |

How it works

The index organises the high-dimensional space into partitions (IVF) and/or a navigable neighbour graph (HNSW), so a query scans only a small local region instead of every vector.
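A toy IVF-style sketch of that idea. Real IVF trains centroids with k-means; here random vectors stand in as centroids, purely for illustration. Vectors are assigned to partitions up front, and a query scans only the few partitions nearest to it:

```python
import numpy as np

rng = np.random.default_rng(1)
vecs = rng.normal(size=(5_000, 32))

# "Train": pick centroids that partition the space (real IVF uses k-means).
n_lists = 50
centroids = vecs[rng.choice(len(vecs), n_lists, replace=False)]
# Assign every vector to its nearest centroid's inverted list.
assignment = np.argmin(
    np.linalg.norm(vecs[:, None, :] - centroids[None, :, :], axis=2), axis=1)

def search(query, nprobe=5, k=3):
    # Scan only the nprobe partitions whose centroids are closest to the query.
    nearest_lists = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    cand = np.flatnonzero(np.isin(assignment, nearest_lists))
    dists = np.linalg.norm(vecs[cand] - query, axis=1)
    return cand[np.argsort(dists)[:k]]

print(search(rng.normal(size=32)))  # indices of approximate nearest neighbours
```

With nprobe=5 of 50 lists, the query inspects roughly a tenth of the vectors; raising nprobe trades speed back for recall, which is the knob real engines expose.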

Practical notes

  • Try pgvector first. From thousands to tens of millions, pgvector is enough. Don't reach for a Milvus cluster on day one.
  • Metric must align. If the embedding model trains under cosine, query under cosine. Mismatch tanks recall.
  • Store metadata with vectors. Bind each vector to {doc_id, chunk_idx, user_id, source, time} and filter at query time.
  • Batch inserts. Inserting 1 vs 1000 at a time can differ by 30–100× in index-build speed.
  • Monitor recall. Run a "golden question set" regularly — immediately verify after upgrading embeddings or chunk size.
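The recall check in the last bullet can be a few lines: compare what the ANN index returns against exact brute-force results on the golden set. The ids and results below are made up for illustration:

```python
def recall_at_k(approx_ids, exact_ids):
    # Fraction of the exact top-K that the ANN index actually returned.
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Hypothetical golden-set run: (what the index returned, exact top-3).
golden = [
    (["a", "b", "c"], ["a", "b", "c"]),   # perfect hit
    (["a", "d", "c"], ["a", "b", "c"]),   # missed "b"
]
scores = [recall_at_k(approx, exact) for approx, exact in golden]
print(sum(scores) / len(scores))          # mean recall@3 = 5/6
```

Rerun this after every embedding-model or chunk-size change; a sudden drop usually means a stale index or a metric mismatch.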

Easy confusions

Vector DB
**Find by semantics.** Understands synonyms and fuzzy meanings.

Search engine (ES)
**Find by literal match** (BM25). Strong on exact entities.

Hybrid retrieval = both → the most reliable RAG recall.

Further reading

  • Embeddings — where the vectors come from
  • RAG — the vector DB's biggest application
  • Chunking — pre-processing before writing to the vector DB