In one line: SFT (Supervised Fine-Tuning) means preparing a small set of {user input, expected answer} pairs and training on them with the same "next-token prediction" objective, so the model learns to answer in your format. It's the most direct way to make a base model "speak the way you want."
What it is#
The data looks like:

```jsonl
{"messages":[{"role":"user","content":"What is an LLM?"},{"role":"assistant","content":"An LLM is a large language model..."}]}
{"messages":[{"role":"user","content":"Help me request a refund"},{"role":"assistant","content":"Sure — please share your order number..."}]}
```

The model is trained for a few more epochs on these dialogues; loss is computed only on the assistant's reply. A few thousand high-quality conversations are enough to turn a base model into "customer-support voice", "medical voice", "brand persona", etc.
Analogy#
Pre-training = read the whole library; learns language.
SFT = show it a stack of "exemplary essays" to imitate — it learns the pattern "this kind of question gets this kind of answer".
You're not making it know more — you're making it answer in your style.
Key concepts#
How it works#
Technically identical to pre-training (next-token prediction); only the data scale + loss mask differ.
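The objective in code, as a PyTorch sketch: shift by one position, take cross-entropy, and let the mask do the rest. This is exactly the pre-training loss, just with -100 on the prompt tokens:

```python
import torch.nn.functional as F

def sft_loss(logits, labels):
    # logits: [batch, seq, vocab]; labels: [batch, seq] with -100 on prompt tokens
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),  # predict token t+1 from position t
        labels[:, 1:].reshape(-1),
        ignore_index=-100,  # masked positions contribute nothing to the loss
    )
```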
Practical notes#
- Data quality is 90% of the work. A week polishing 1,000 examples beats a month assembling 100k noisy ones.
- Default to LoRA. Unless you have a big GPU cluster, full-parameter SFT is too expensive; LoRA fine-tunes in hours on a single GPU (see the training sketch after this list).
- Lower learning rate than pre-training. Start in the 1e-5 ~ 5e-5 range; crank it up and you'll lobotomise the base model.
- Mix in general data to prevent forgetting. Blend roughly 1:1 generic dialogue into your task data, or the model will only answer your task and degrade at everything else.
- Try prompt + few-shot first. If a prompt fixes it, don't SFT: SFT toolchains are complex, while prompts iterate by editing one line (see the prompt sketch after this list).
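Putting the LoRA, learning-rate, and data-mixing notes together: a minimal sketch with `peft`, `transformers`, and `datasets`. File names and hyperparameters are illustrative, and the datasets would still need tokenizing (e.g. via `build_example` above) plus a collator before training:

```python
from datasets import load_dataset, interleave_datasets
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Roughly 1:1 task data and generic dialogue, to prevent forgetting
task = load_dataset("json", data_files="task_dialogues.jsonl", split="train")
general = load_dataset("json", data_files="general_dialogues.jsonl", split="train")
mixed = interleave_datasets([task, general], probabilities=[0.5, 0.5], seed=42)

model = AutoModelForCausalLM.from_pretrained("your-base-model")  # placeholder
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # a common choice for Llama-style blocks
    task_type="CAUSAL_LM",
))

args = TrainingArguments(
    output_dir="sft-lora",
    learning_rate=2e-5,                 # inside the 1e-5 ~ 5e-5 range above
    num_train_epochs=3,
    per_device_train_batch_size=4,
)
trainer = Trainer(model=model, args=args, train_dataset=mixed)  # + tokenization/collator
trainer.train()
```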
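And the cheaper option to try first: a few-shot prompt. The examples here are illustrative; if something like this fixes the behaviour, iterate on the prompt and skip SFT entirely:

```python
FEW_SHOT = """You are a customer-support agent. Answer briefly and politely.

User: My parcel never arrived.
Assistant: Sorry about that! Could you share your order number so I can check?

User: Help me request a refund.
Assistant:"""
# Send FEW_SHOT to the base model: the examples set the format, no training needed.
```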
Easy confusions#
- "Updating the model means retraining." Not from scratch: to change behaviour you revise the SFT examples and re-run a cheap fine-tune (hours with LoRA).
- "Updating means editing the corpus." Editing data changes nothing by itself; the weights only move once you train on the new data.
Further reading#
- Pre-training — the step before SFT
- LoRA — SFT's low-cost implementation
- RLHF — SFT's next step: alignment