In one line: Short-term memory = the most recent few turns of the current conversation. The model itself has no concept of "session" — every turn we splice the whole chat history back into the prompt, making the model "think" it remembers. Once total length exceeds the context window, you must decide what to keep and what to drop.
What it is#
Every LLM call really sends:
```
[
  {"role": "system", "content": "You are an assistant"},
  {"role": "user", "content": "My name is Mike"},
  {"role": "assistant", "content": "Hello Mike"},
  {"role": "user", "content": "What's my name?"}   ← current question
]
```
The model "remembers the name" purely because the earlier messages are still in the prompt. Once the total length exceeds the window, older messages get cut and the model "forgets".
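To make the splicing concrete, here is a minimal chat loop. It assumes the OpenAI Python client (any chat-completions-style API works the same way), and the model name is purely illustrative.

```python
# Minimal sketch: the model "remembers" only because we re-send the whole
# history on every call. Assumes the OpenAI Python client.
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are an assistant"}]

def ask(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # illustrative; any chat model works
        messages=history,      # the WHOLE history, spliced in every turn
    )
    answer = resp.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

ask("My name is Mike")
print(ask("What's my name?"))  # works only because turn 1 is still in `history`
```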
Analogy#
The model's "brain" = a finite whiteboard. Every turn we copy old notes back onto it, then add the current question.
When the board fills up, we have to erase old stuff — short-term memory strategy is "which bits to erase."
Key concepts#
Three building blocks recur in every strategy: a sliding window (keep only the last N turns verbatim), summarisation (compress older turns into a short recap), and pinned facts (key details carried in the system prompt regardless of trimming).
How it works#
In production almost everyone combines all three into a hybrid: pin key facts, summarise the middle, and keep the last 5–10 turns verbatim.
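A minimal sketch of that hybrid, assuming OpenAI-style message dicts. `summarise` is a stub standing in for an LLM call that compresses overflow turns, and `KEEP_LAST = 8` is an arbitrary cutoff.

```python
# Hybrid short-term memory: pinned facts + rolling summary + last N turns.
KEEP_LAST = 8  # turns kept verbatim (arbitrary cutoff)

def summarise(summary: str, overflow: list[dict]) -> str:
    # Stub for an LLM call that folds `overflow` into the running summary.
    return summary + " | " + " ".join(m["content"][:40] for m in overflow)

def compress_if_needed(history: list[dict], summary: str) -> tuple[list[dict], str]:
    # Fold overflow turns into the summary instead of dropping them outright.
    if len(history) > KEEP_LAST:
        summary = summarise(summary, history[:-KEEP_LAST])
        history = history[-KEEP_LAST:]
    return history, summary

def build_prompt(history: list[dict], summary: str, pinned: str) -> list[dict]:
    # Re-spliced every turn: pinned facts and summary up front, recent turns verbatim.
    return [
        {"role": "system", "content": f"You are an assistant.\nKnown facts: {pinned}"},
        {"role": "system", "content": f"Earlier conversation (summary): {summary}"},
        *history[-KEEP_LAST:],
    ]
```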
Practical notes#
- Track total tokens. The context window is not free: system prompt, history, tool definitions, and an output reserve all count against it. See the budget sketch after this list.
- Give summaries a format. Have the model "emit 3 markdown bullets capturing key facts"; that is far more stable than "summarise freely". See the prompt sketch below.
- Extract important facts immediately. When the user says "I'm allergic to peanuts", extract it on the spot into the system prompt rather than relying on a future summary. See the extraction sketch below.
- Don't store raw tool output. Multi-KB JSON blobs devour context instantly; summarise them before feeding them back.
- For long tasks, use LangGraph or another library with checkpointing. Hand-rolling history splicing is a bug magnet.
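For the first note, a token-budget sketch. It uses tiktoken for counting; the window size, output reserve, and per-message overhead are illustrative assumptions, not universal constants.

```python
# Token budgeting: system + history + tools + output reserve all count.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by many OpenAI chat models

def count_tokens(messages: list[dict]) -> int:
    # ~4 tokens of framing per message plus the encoded content (rough heuristic).
    return sum(4 + len(enc.encode(m["content"])) for m in messages)

CONTEXT_WINDOW = 128_000   # model-dependent
OUTPUT_RESERVE = 4_000     # room left for the model's reply

def fits(messages: list[dict]) -> bool:
    return count_tokens(messages) + OUTPUT_RESERVE <= CONTEXT_WINDOW
```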
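For the second note, a fixed-format summarisation prompt. The exact wording is an assumption; the point is that a rigid output shape is far more stable than asking the model to summarise freely.

```python
# Fixed-format summary prompt (wording is illustrative).
SUMMARY_PROMPT = """Summarise the conversation below as exactly 3 markdown
bullets, each one key fact or decision. Output only the bullets.

{conversation}"""
```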
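And for the third note, on-the-spot fact extraction. `extract_facts` is a stub for a hypothetical LLM call that pulls durable facts out of a message.

```python
def extract_facts(text: str) -> list[str]:
    # Stub for a hypothetical LLM extraction call; a real version would
    # prompt the model to return durable facts as a list.
    return ["user: peanut allergy"] if "allergic to peanuts" in text else []

def on_user_message(text: str, pinned_facts: list[str]) -> None:
    # Durable facts go straight into the pinned list that is prepended
    # to every system prompt, so they survive any history trimming.
    pinned_facts.extend(extract_facts(text))
```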
Easy confusions#
- Short-term memory: just messages in the prompt; gone once they're trimmed out of the window.
- Long-term memory: stored in a DB / vector store; pulled back next session.
Further reading#
- Context Window — short-term memory's hard cap
- Long-term Memory — across-session persistence
- RAG — when history is too long, retrieve old turns instead