ArcLibrary

Emergent Abilities

Sudden 'aha' capabilities that appear past a scale threshold — the most visible 'quantitative-to-qualitative' phenomenon.

Emergence · Scaling
Key Idea

In one line: Emergent Abilities = when parameters / data / compute cross a critical threshold, the model's ability on certain tasks jumps from near-zero to working. Not smooth growth — a discontinuity. The most-discussed phenomenon beyond Scaling Law.

What it is#

Classic examples: "spell a word backwards", "3-digit addition", "Chain-of-Thought reasoning".

Model size    Pass rate
-----------------------
1B            0%
10B           1%
70B           15%   ← critical point
175B+         70%+

Not a smooth climb from 0% to 70% — at a certain scale the model "gets it". The GPT-3 paper first systematically recorded this jump.
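One way the numbers above can arise is from a perfectly smooth underlying trend. The sketch below is a toy model with invented constants (not fitted to any real model family): per-token loss falls slowly as a power law in parameter count, but an all-or-nothing task over several tokens only succeeds when every token is right, so its pass rate sits near zero for decades of scale and then climbs fast.

```python
import math

def per_token_loss(n_params, base=2.0, n0=1e9, alpha=0.3):
    # Illustrative power law: loss falls smoothly and slowly with scale.
    # base / n0 / alpha are invented constants, not fitted to real models.
    return base * (n0 / n_params) ** alpha

def exact_match(n_params, k=10):
    # Per-token success prob ~ exp(-loss); an all-or-nothing task over
    # k tokens succeeds only if every token is right: exp(-k * loss).
    return math.exp(-k * per_token_loss(n_params))

for n in [1e9, 1e10, 1e11, 1e12, 1e13]:
    print(f"{n:9.0e} params | loss {per_token_loss(n):.2f} | "
          f"exact-match {exact_match(n):6.2%}")
```

The loss column changes gently at every row; the exact-match column is the one that "jumps".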

Analogy#


Like boiling water: at 99 °C it is still liquid, but a little more energy turns it into steam, a completely different state. Past a threshold a model "understands" a kind of structure that smaller models cannot be coaxed into.

Key concepts#

Scaling Law
Loss decays as a power law in parameters / data / compute, but "capability" is not always smooth.
Capability Threshold
The "jump point" for a task; it varies per task.
Grokking
After enough training steps, accuracy on some tasks suddenly surges. A related phenomenon.
Mirage Critique ("emergence is an illusion")
Schaeffer et al.: with continuous metrics the "jumps" smooth out. Emergence depends partly on binary scoring.

Typical emergent capabilities#

These abilities tend to stabilise only on big models. That's why, if a small model can't do something, you should try a bigger model before tuning the prompt.

Practical notes (application view)#

  • Try a larger model first. Spending a week tuning prompts on 1B is less effective than 10 minutes with GPT-4 / Claude / DeepSeek-V3.
  • CoT barely works on small models. "Think step by step" is mostly noise on <7B — either go bigger or distil + SFT.
  • Don't conclude "this model is no good" too quickly. Often the prompt isn't quite there and the scale is right at the threshold. Combine: better prompt + bigger model + examples.
  • The "emergence is illusion" debate. In practice we care whether the task works, not the continuity of the metric — the concept helps you not blindly hope a small model gets there.
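The "try a larger model first" advice can be wrapped in a simple escalation loop. This is a hedged sketch: the model names and lambda "clients" are hypothetical stubs, to be replaced with real API calls.

```python
from typing import Callable

def solve_with_escalation(task: str,
                          models: list[tuple[str, Callable[[str], str]]],
                          check: Callable[[str], bool]) -> tuple[str, str]:
    """Try models from smallest/cheapest to largest; return the first
    (model name, answer) pair whose answer passes the check."""
    answer = ""
    for name, call in models:
        answer = call(task)
        if check(answer):
            return name, answer
    return models[-1][0], answer  # fall back to the biggest model's attempt

# Hypothetical stubs: the small model fails a 3-digit addition,
# the frontier-scale model gets it.
small = lambda task: "I am not sure."   # stand-in for ~1B behaviour
large = lambda task: str(123 + 456)     # stand-in for frontier-model behaviour

name, ans = solve_with_escalation(
    "What is 123 + 456?",
    [("tiny-1b", small), ("big-frontier", large)],
    check=lambda a: a.strip() == "579",
)
print(name, ans)
```

The verifier (`check`) matters: per the mirage critique, whether a model "has" an ability depends partly on how you score it, so keep the check as close to your real acceptance criterion as possible.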

Easy confusions#

Emergent Abilities
**Discrete jump**: can / can't.
Scaling Law
**Smooth decline** of loss / perplexity.
They describe different dimensions.

Further reading#

  • LLM — relationship of size and capability
  • CoT — a canonical emergent ability
  • Pre-training — the source of emergence: training scale