Artificial Intelligence
Internals, engineering practices, and runtime tuning. 18 topics across 5 chapters.
Foundations (5 topics)
Transformer & Attention
The architecture behind modern LLMs — 'attention' lets the model see how every token in the context relates to every other.
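As a minimal sketch of that idea (pure Python, toy dimensions, illustrative names): scaled dot-product attention scores every token's query against every other token's key, then mixes the value vectors by those weights.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Scaled dot-product attention over a toy sequence.
    Q, K, V: lists of token vectors, one per token."""
    d = len(K[0])
    out = []
    for q in Q:
        # similarity of this query with every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)  # how strongly this token attends to each other token
        # output = weights-mixed combination of the value vectors
        out.append([sum(w * v[i] for w, v in zip(weights, V)) for i in range(len(V[0]))])
    return out

# three tokens, 2-dim embeddings; self-attention uses the same vectors for Q, K, V
toks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx = attention(toks, toks, toks)
```

Real implementations add learned Q/K/V projections, multiple heads, and a causal mask; the all-pairs scoring loop above is the core.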
Emergent Abilities
Sudden 'aha' capabilities that appear past a scale threshold — the most visible 'quantitative-to-qualitative' phenomenon.
MoE (Mixture of Experts)
Scale up parameters with 'sparse activation' so cost stays sane — the secret behind DeepSeek / Mixtral / GPT-4.
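A rough sketch of the sparse-activation trick (toy scalar "experts", hypothetical names): a router scores all experts, but only the top-k actually run, so compute stays near-constant as the expert count grows.

```python
import math

def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    idx = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in idx]
    s = sum(exps)
    return [(i, e / s) for i, e in zip(idx, exps)]

def moe_forward(x, experts, gate_logits, k=2):
    """Sparse mixture: output is the gate-weighted sum of the selected experts only.
    The other experts contribute parameters to the model but no compute to this token."""
    y = 0.0
    for i, w in top_k_route(gate_logits, k):
        y += w * experts[i](x)
    return y

# eight tiny 'experts'; only 2 of them execute for this input
experts = [lambda x, s=s: s * x for s in range(8)]
out = moe_forward(2.0, experts, gate_logits=[0.1, 3.0, 0.2, 2.0, 0.0, -1.0, 0.5, 0.3], k=2)
```

Production MoE layers route per token per layer and add load-balancing losses; the top-k gate is the part that decouples parameter count from FLOPs.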
Attention Variants (MQA / GQA / FlashAttention)
Inference is bottlenecked by memory bandwidth, not compute; MQA/GQA shrink the KV cache that must be read per step, and FlashAttention restructures the computation to cut memory traffic.
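The MQA/GQA saving is easy to quantify (illustrative head counts, not any specific model's): grouped-query attention stores one K/V head per group of query heads, and multi-query attention is the one-group extreme.

```python
def kv_cache_ratio(n_q_heads, n_kv_heads):
    """KV-cache size relative to full multi-head attention.
    GQA keeps one K/V head per query-head group; MQA is the n_kv_heads == 1 case."""
    assert n_q_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    return n_kv_heads / n_q_heads

mha = kv_cache_ratio(32, 32)  # every query head has its own K/V: full-size cache
gqa = kv_cache_ratio(32, 8)   # 4 query heads share each K/V head: cache shrinks 4x
mqa = kv_cache_ratio(32, 1)   # all query heads share one K/V head: cache shrinks 32x
```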
KV Cache (the inference performance bottleneck)
Why long contexts get slower and pricier — the KV cache keeps growing.
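A back-of-envelope sizing sketch makes the growth concrete (the model shape below is an assumed 7B-class configuration, not any specific model's): the cache stores keys and values for every layer at every position, so it scales linearly with context length.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Per-sequence KV cache size: keys + values (the factor of 2)
    for every layer, head, and position. fp16 = 2 bytes per element."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# assumed shape: 32 layers, 32 KV heads, head_dim 128, fp16
per_4k = kv_cache_bytes(32, 32, 128, 4096)    # 2 GiB of cache at a 4k context
per_32k = kv_cache_bytes(32, 32, 128, 32768)  # 8x that, 16 GiB, at 32k
```

That linear growth per sequence, multiplied across concurrent requests, is why serving long contexts is memory-bound and expensive.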
Advanced Reasoning (2 topics)
Training & Fine-tuning (6 topics)
Pre-training
Train from scratch on massive unlabeled data so the model learns language — the source of everything an LLM can do.
SFT (Supervised Fine-Tuning)
The most direct way to teach a model a specific task — a small set of high-quality input/expected-output pairs.
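A minimal sketch of what one such pair looks like in practice (field names and token IDs are illustrative): the record holds a prompt and its expected response, and training masks the prompt positions so loss is computed only on the response.

```python
# one SFT record: the model is trained to produce `output` given `input`
record = {
    "input": "Summarize: The KV cache grows linearly with context length.",
    "output": "Long contexts cost more because cached keys and values grow per token.",
}

def build_labels(prompt_ids, response_ids, ignore_index=-100):
    """Concatenate prompt + response token IDs; mask prompt positions so only
    the response tokens contribute to the cross-entropy loss."""
    return [ignore_index] * len(prompt_ids) + list(response_ids)

labels = build_labels([101, 102, 103], [201, 202])
```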
RLHF (Reinforcement Learning from Human Feedback)
Align the model with human preferences — make it not only able to answer, but answer in a way humans want.
LoRA (Low-Rank Adaptation)
Fine-tune giant models with tiny parameter footprints — even a single consumer GPU can produce a custom model.
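The trick can be sketched in a few lines (pure Python, toy matrices): the frozen base weight W is augmented with a low-rank update B·A, and only the two small factors are trained.

```python
def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=16, r=2):
    """y = W x + (alpha / r) * B (A x).
    W is frozen; only A (r x d_in) and B (d_out x r) are trained,
    with alpha/r as the conventional scaling factor."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

# trainable-parameter savings for a d x d layer at rank r (illustrative numbers)
d, r = 4096, 8
full = d * d          # 16,777,216 weights to train with full fine-tuning
lora = r * d + d * r  # 65,536 with LoRA, roughly 0.4% of full
```

Because the update is a plain additive delta, it can be merged into W after training, so inference pays no extra cost.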
Knowledge Distillation
Teach a small model to mimic a big model's output distribution — squeeze 70B-class capability into 7B.
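A minimal sketch of the mimicry objective (toy logits, no batching): the student matches the teacher's temperature-softened distribution via a KL divergence, scaled by T² per the convention from Hinton et al.'s distillation setup.

```python
import math

def softmax_t(logits, T):
    """Softmax at temperature T; higher T flattens the distribution,
    exposing the teacher's 'dark knowledge' about non-argmax classes."""
    m = max(logits)
    es = [math.exp((z - m) / T) for z in logits]
    s = sum(es)
    return [e / s for e in es]

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened distributions; the T*T factor
    keeps gradient magnitudes comparable across temperatures."""
    p = softmax_t(teacher_logits, T)  # soft targets: the teacher's full distribution
    q = softmax_t(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# identical logits -> zero loss; diverging logits -> positive loss
same = distill_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
diff = distill_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

In practice this term is usually mixed with the ordinary hard-label cross-entropy; the soft targets carry the relative similarity between classes that one-hot labels throw away.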
DPO (Direct Preference Optimization)
A simpler RLHF alternative — no reward model, no PPO, still aligns to human preference.
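The whole objective fits in one function, which is the point. A sketch on a single preference pair (all log-probability values below are made up for illustration):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair. Inputs are log-probabilities of the
    chosen and rejected responses under the policy (pi_*) and the frozen
    reference model (ref_*). No reward model: the implicit reward is the
    log-ratio of policy to reference."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log sigmoid(beta * margin): pushes the policy to widen the margin
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# policy already favors the chosen answer relative to the reference: small loss
low = dpo_loss(pi_chosen=-4.0, pi_rejected=-9.0, ref_chosen=-5.0, ref_rejected=-6.0)
# policy favors the rejected answer: larger loss
high = dpo_loss(pi_chosen=-6.0, pi_rejected=-4.0, ref_chosen=-5.0, ref_rejected=-6.0)
```

Because it is a plain supervised loss over preference pairs, DPO drops RLHF's sampling loop, reward model, and PPO machinery while optimizing the same preference objective.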