ArcLibrary

Linux Performance Tuning Basics

USE method + the four-axis (CPU / mem / disk / net) checklist — find bottlenecks without guessing.

Key Idea

In one line: always measure before you change. Brendan Gregg's USE method (Utilization / Saturation / Errors) plus his 60-second checklist finds roughly 80% of bottlenecks in about 5 minutes.

60-second checklist#

uptime              # load 1/5/15 min; > #cores = backlog
dmesg | tail        # OOM? disk errors?
vmstat 1 5          # r col = CPU runq, si/so = swap, wa = IO wait
mpstat -P ALL 1     # per-CPU; single-core saturated = lock contention?
pidstat 1           # which process eats CPU
iostat -xz 1        # %util, await
free -m             # mem / cache / swap
sar -n DEV 1 5      # NIC throughput
sar -n TCP,ETCP 1 5 # TCP retrans / segs
top / htop          # overview
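The checklist above can be wrapped in a throwaway script so every incident gets the same first pass. A sketch, not a polished tool: `mpstat` / `pidstat` / `iostat` / `sar` come from the sysstat package and `dmesg` may need root, so missing or forbidden commands are skipped rather than aborting; sample counts are trimmed to 2 so the whole pass stays quick.

```shell
#!/usr/bin/env sh
# Run the 60-second checklist in order; skip tools that are missing or
# unprivileged instead of aborting. Counts added so nothing runs forever.
for cmd in \
    "uptime" \
    "dmesg | tail" \
    "vmstat 1 2" \
    "mpstat -P ALL 1 2" \
    "pidstat 1 2" \
    "iostat -xz 1 2" \
    "free -m" \
    "sar -n DEV 1 2" \
    "sar -n TCP,ETCP 1 2"
do
    echo "== $cmd"
    sh -c "$cmd" 2>/dev/null || echo "   (skipped: not installed or not permitted)"
done
```

Reading the output top to bottom mirrors the order in the checklist: load first, kernel complaints second, then CPU, memory, disk, network.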

Analogy#


Performance tuning is like a doctor's exam: take temperature, blood pressure, blood test (USE metrics) before prescribing pills (tweaking knobs).

USE method#

Utilization
Percent of time the resource is busy (for memory: percent of capacity used). Example: CPU 80%, disk 70%.
Saturation
Extra work queued waiting for the resource: CPU run queue, I/O wait, TCP listen backlog overflow.
Errors
Error event counts: packet drops, disk I/O errors, OOM kills.

Apply each dimension to: CPU, memory, disk, network.

Quick reference#

CPU  utilization : top, mpstat
CPU  saturation  : uptime (load average), vmstat r column
CPU  errors      : perf stat (cache-miss / branch-miss counters)
Mem  utilization : free, /proc/meminfo
Mem  saturation  : vmstat si/so (swapping), dmesg (OOM kills)
Disk utilization : iostat -x (%util)
Disk saturation  : iostat (await), vmstat wa
Disk errors      : dmesg, smartctl
Net  utilization : sar -n DEV, nload
Net  saturation  : ss -ti (cwnd, rwnd), netstat -s (retransmits)
Net  errors      : ip -s link, ethtool -S
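As an example of the "Mem utilization" row without any extra tools: the same numbers `free` reports can be read straight from `/proc/meminfo` (Linux-only sketch; `MemAvailable` requires kernel 3.14+).

```shell
# Mem utilization from /proc/meminfo. MemAvailable is the kernel's estimate
# of memory that can be handed out without swapping (reclaimable cache counts).
awk '/^MemTotal:/     { total = $2 }
     /^MemAvailable:/ { avail = $2 }
     END { printf "mem used %.1f%%, available %.1f%%\n",
           100 * (total - avail) / total, 100 * avail / total }' /proc/meminfo
```

Note that "used" here excludes reclaimable page cache, which is why it can differ sharply from naive `MemTotal - MemFree` arithmetic.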

Flame graphs + perf#

# Sample all CPUs at 99 Hz for 30 s, with call graphs
sudo perf record -F 99 -a -g -- sleep 30
# Collapse the stacks, then render (inferno crate; the classic alternative
# is stackcollapse-perf.pl | flamegraph.pl from Brendan Gregg's FlameGraph repo)
sudo perf script | inferno-collapse-perf | inferno-flamegraph > flame.svg

Flame graph: horizontal width = share of samples, vertical = call-stack depth. The widest frames show at a glance where CPU time goes.

How it works#

Always re-measure after changes — otherwise you may have fixed something unrelated.

Practical notes#

  • Two flavors of "slow": throughput slow / per-request latency. Identify which before picking tools.
  • Don't tune knobs first — check app / DB indexes / cache hit rate; most bottlenecks are in app code, not the kernel.
  • High CPU isn't always bad — batch jobs maxing CPU is good; long wait times are bad.
  • Network: watch retrans / TCP state distribution — ss -tan + netstat -s. Retrans > 1 % → check the link.
  • Swap cautiously — active swapping on a server usually signals the start of a downward spiral; an OOM kill is often more predictable than a swap-thrashing machine.
  • eBPF tools: bcc / bpftrace are the modern toolkit (execsnoop / opensnoop / biosnoop / tcptop).
  • Baselines matter — collect normal metrics so when something goes wrong you can compare.
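The "retrans > 1%" rule of thumb above can be checked against the kernel's own counters in `/proc/net/snmp` (Linux-only sketch). These are counters since boot, so for a live rate take the difference between two reads a few seconds apart.

```shell
# RetransSegs / OutSegs from the Tcp: lines of /proc/net/snmp.
# The first Tcp: line holds column names, the second holds values.
awk '/^Tcp:/ {
       if (!have_header) { for (i = 1; i <= NF; i++) col[$i] = i; have_header = 1 }
       else if ($col["OutSegs"] > 0)
         printf "TCP retrans: %.2f%% (%d of %d segments)\n",
                100 * $col["RetransSegs"] / $col["OutSegs"],
                $col["RetransSegs"], $col["OutSegs"]
     }' /proc/net/snmp
```

`nstat` and `netstat -s` report the same counters with friendlier names; this just shows where the numbers come from.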

Easy confusions#

High load average
Counts runnable processes **and** processes in uninterruptible sleep (usually waiting for I/O).
CPU can be idle while the disks are stuck.
High CPU utilization
CPU actually executing code.
Use perf / flame graphs to find the hot spots.
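A minimal sketch of the distinction: compare the 1-minute load average to the core count before concluding anything — high load with idle CPUs points at I/O, not compute.

```shell
# 1-minute load average vs. core count. Load counts runnable tasks plus
# tasks in uninterruptible (usually disk) sleep, so "load > cores" alone
# does not say whether the CPU or the disks are the bottleneck.
load=$(cut -d' ' -f1 /proc/loadavg)
cores=$(nproc)
awk -v l="$load" -v c="$cores" \
    'BEGIN { printf "load %.2f on %d cores: %s\n", l, c,
             (l > c) ? "queueing, check vmstat r vs wa" : "headroom" }'
```

If load exceeds cores, the vmstat `r` column (CPU run queue) versus `wa` (I/O wait) tells you which of the two explanations applies.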
