In one line: A Token is the LLM's smallest "building block" of text — not a character, not a word, but a fragment of characters pre-baked into the model's vocabulary. Reading, writing, billing, and length limits all happen in Tokens.
## What it is
An LLM does not consume human text directly. It first splits the text into a sequence of Tokens, then maps each Token to an integer ID. Common splitters (BPE, SentencePiece) keep frequent character strings whole and slice rare ones into pieces, so:
- English: 1 Token ≈ 0.75 words
- Chinese: 1 Token ≈ 1–2 characters
- Code / emoji / rare characters: a single character may become multiple Tokens
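To make this concrete, here is a minimal sketch using OpenAI's tiktoken library (the same tool mentioned near the end of this article). The cl100k_base encoding is one concrete GPT-4-era choice; other models split the same text differently, so the exact IDs and fragments are illustrative:

```python
# pip install tiktoken  (OpenAI's tokenizer library)
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one concrete encoding choice

ids = enc.encode("today's weather is great")
print(ids)                             # a short list of integer IDs
print([enc.decode([i]) for i in ids])  # the text fragment behind each ID

# Rare characters often expand into several Tokens each:
print(len(enc.encode("👍🏽")))           # emoji with a skin-tone modifier: usually > 1
```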
## Analogy
Your brain reads "today's weather is great" character by character. The model reads it as "today / 's / weather / is / great" — each chunk is a Token.
## Key concepts
### How it works
The splitter is fixed before training — every user shares the same vocabulary.
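As an illustration of the idea (not real BPE, which learns merge rules from data), here is a toy longest-match splitter over a hand-made fixed vocabulary; VOCAB and toy_tokenize are invented for this sketch:

```python
# Toy fixed vocabulary: frequent strings kept whole, plus a few single-character
# entries. Real vocabularies hold tens of thousands of entries, learned once
# before training and never changed afterwards.
VOCAB = ["today", "'s", " weather", " is", " great", "g", "r", "e", "a", "t"]

def toy_tokenize(text: str) -> list[str]:
    tokens, i = [], 0
    while i < len(text):
        # Greedily take the longest vocabulary entry matching at position i;
        # fall back to the raw character if nothing matches.
        piece = max((v for v in VOCAB if text.startswith(v, i)),
                    key=len, default=text[i])
        tokens.append(piece)
        i += len(piece)
    return tokens

print(toy_tokenize("today's weather is great"))
# ['today', "'s", ' weather', ' is', ' great']
print(toy_tokenize("greatest"))
# ['great', 'e', 's', 't']  (a rarer word gets sliced into pieces)
```

Frequent strings ride through as single Tokens; anything the vocabulary never "baked in" falls apart into smaller pieces, which is exactly the behaviour described above.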
### Practical notes
- Estimating Token count: for English, "words × 1.3"; for Chinese, "characters × 1.5". Good enough to ballpark API cost; a rough helper is sketched after this list.
- Saving Tokens: trim filler in prompts; compress tabular data into JSON/CSV; summarise long documents before feeding them in.
- Watch out for rare characters: emoji, traditional-Chinese rare glyphs, obscure code symbols often each become several Tokens — lengths blow up easily.
- Tokenizers are not interchangeable: GPT-4's splitter differs from Claude / Qwen / Llama. The same paragraph can vary by 30% in Token count.
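The first bullet's rules of thumb are easy to wrap in code. A rough helper (estimate_tokens is a hypothetical name, and the multipliers are ballpark figures, not exact):

```python
# Ballpark Token estimates: good enough for cost planning, not for hard
# length limits (use a real tokenizer for those).
def estimate_tokens(text: str, lang: str = "en") -> int:
    if lang == "zh":
        return round(len(text) * 1.5)      # Chinese: characters × 1.5
    return round(len(text.split()) * 1.3)  # English: words × 1.3

print(estimate_tokens("The quick brown fox jumps over the lazy dog"))  # ≈ 12
```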
## Easy confusions
- Token ≠ word: a Token could be half a word, a whole word, or several words.
- Token ≠ character: a character has **no one-to-one mapping** to the model's vocabulary; one character may split into several Tokens, and one Token may cover several characters.
OpenAI's tokenizer page lets you paste text and see the split. In Python, tiktoken computes it in one line.
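For example, assuming tiktoken is installed, the count really is one line; encoding_for_model looks up the encoding that matches a given OpenAI model (other vendors' tokenizers ship in their own libraries):

```python
import tiktoken

# Exact Token count for a GPT-4-family model, in one line.
print(len(tiktoken.encoding_for_model("gpt-4").encode("How many Tokens is this?")))
```

Swapping in a different encoding, e.g. tiktoken.get_encoding("o200k_base"), is a quick way to see the cross-tokenizer variation mentioned in the practical notes.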
## Further reading
- Context Window — how many Tokens you can fit at once
- Parameters — model size vs Token throughput
- Chunking — slicing long documents into Token-friendly pieces for RAG