What you will achieve

Interpret load average and track down CPU, I/O, or lock contention causing sluggish servers — without rebooting blindly.

1) Read load in context

uptime
nproc
top -b -n1 | head -20

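On Linux, load average counts tasks that are runnable or in uninterruptible sleep (usually waiting on I/O), so read it against the core count. A load of 8 on a 4-core box means tasks are queuing; on 16 cores it may be fine.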
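A minimal sketch of that comparison, assuming a Linux host with /proc/loadavg and nproc available. The helper name load_per_core is illustrative, not a standard tool. Values well above 1.00 suggest queuing.

```shell
# load_per_core [LOAD] [CORES]
# Prints LOAD divided by CORES with two decimals. With no arguments it
# reads the 1-minute load from /proc/loadavg and the core count from nproc.
load_per_core() {
    load=${1:-$(cut -d' ' -f1 /proc/loadavg)}
    cores=${2:-$(nproc)}
    awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.2f\n", l / c }'
}

load_per_core 8 4     # prints 2.00: about two tasks per core
load_per_core 8 16    # prints 0.50: headroom remains
```

Call load_per_core with no arguments to evaluate the current host.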

2) CPU vs I/O wait

sudo apt install sysstat
iostat -xz 1 5
pidstat -d 1 5

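High %iowait in the iostat avg-cpu line (shown as wa in top), or %util near 100 on a device, points to a disk bottleneck. High %user or %system points to CPU: check ps aux --sort=-%cpu | head.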
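If sysstat cannot be installed during the incident, the I/O wait share can be approximated from /proc/stat directly. The sketch below assumes the standard field order of the aggregate cpu line (user, nice, system, idle, iowait, irq, softirq, steal). The helper name iowait_pct is illustrative.

```shell
# iowait_pct SAMPLE1 SAMPLE2
# Each argument is one aggregate "cpu ..." line from /proc/stat.
# Prints the percentage of CPU time spent in iowait between the two samples.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total = 0
          for (i = 2; i <= 9; i++) total += $i   # user through steal
          t[NR] = total; w[NR] = $6 }            # field 6 is iowait
        END { d = t[2] - t[1]
              if (d > 0) printf "%.1f\n", 100 * (w[2] - w[1]) / d
              else print "0.0" }'
}

# Live usage: take two samples one second apart.
if [ -r /proc/stat ]; then
    a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
    iowait_pct "$a" "$b"
fi
```

A one-second window is noisy; lengthen the sleep for a steadier reading.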

3) Specific offenders

systemctl list-units --state=running
sudo iotop -o

4) Mitigate

Restart the runaway service only after identifying the root cause in its logs.
Add swap or fix the memory leak if the OOM killer fires repeatedly.
Schedule heavy cron jobs off-peak.

Verify

uptime
iostat -c 1 3

5) Zombie processes

ps aux | awk '$8 ~ /Z/ {print}'

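Zombies mean the parent is not reaping its children. A zombie is already dead and cannot be killed, so restart or fix the parent service; the zombies disappear once the parent reaps them or exits.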
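To find which parent to look at, zombies can be grouped by parent PID. This is a sketch assuming procps-style ps output options. The helper name zombie_parents is illustrative.

```shell
# zombie_parents: reads "PID PPID STAT COMMAND" rows on stdin and prints
# "PPID COUNT" for every parent that has at least one zombie child.
zombie_parents() {
    awk '$3 ~ /^Z/ { n[$2]++ } END { for (p in n) print p, n[p] }' | sort -n
}

# Live usage: the trailing "=" suppresses the header row.
if command -v ps >/dev/null 2>&1; then
    ps -eo pid=,ppid=,stat=,comm= | zombie_parents
fi
```

Inspect each reported parent with ps -p PPID -o pid,comm,args before deciding what to restart.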

6) Memory pressure

free -h
vmstat 1 5

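Sustained nonzero si/so in vmstat with little available RAM means the host is swapping heavily: add memory or fix the leak. A negative oom_score_adj (down to -1000) makes the OOM killer less likely to pick a critical daemon.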
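For a single number to alert on, the MemAvailable field in /proc/meminfo is usually a better guide than the free column, since it accounts for reclaimable cache. A minimal sketch; the helper name mem_available_pct is illustrative and no threshold is implied.

```shell
# mem_available_pct [FILE]
# Prints MemAvailable as a percentage of MemTotal. FILE defaults to
# /proc/meminfo; pass another path to analyse a saved copy.
mem_available_pct() {
    awk '/^MemTotal:/     { t = $2 }
         /^MemAvailable:/ { a = $2 }
         END { if (t > 0) printf "%.1f\n", 100 * a / t }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    mem_available_pct
fi
```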

7) IRQ saturation

cat /proc/interrupts
mpstat -I SUM 1 3

A 10 Gbit NIC with a single receive queue can pin one core in softirq. Consider multiple hardware queues (RSS) if the NIC supports them, RPS/RFS, or a better NIC driver.
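RPS is configured by writing a hexadecimal CPU bitmask to the queue's rps_cpus file in sysfs. The sketch below only computes the mask; the interface eth0 and queue rx-0 in the comment are placeholders, and the helper name rps_mask is illustrative. On hosts with more than 32 CPUs the kernel expects comma-separated 32-bit groups, which this sketch does not produce.

```shell
# rps_mask N: print a hex bitmask selecting CPUs 0..N-1 (N up to 32 here).
rps_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

rps_mask 4    # prints f
rps_mask 8    # prints ff

# To apply (placeholders: eth0, rx-0), run as root:
#   echo "$(rps_mask 4)" | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus
```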

When load is acceptable

Batch jobs intentionally peg CPU — load 32 on 32 cores during ffmpeg transcode is fine. Context matters more than absolute numbers.

8) Transparent huge pages and databases

Database tuning guides, for example for PostgreSQL and MongoDB, often recommend disabling THP, though the advice varies by version. Check the vendor tuning guide if the database is the CPU hog under load.
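Before changing anything, it helps to record the current THP setting. The kernel marks the active mode with square brackets in /sys/kernel/mm/transparent_hugepage/enabled. A small sketch; the helper name thp_mode is illustrative.

```shell
# thp_mode [FILE]: print the active THP mode, i.e. the bracketed word.
# FILE defaults to the kernel's sysfs entry.
thp_mode() {
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' \
        "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}

if [ -r /sys/kernel/mm/transparent_hugepage/enabled ]; then
    thp_mode
fi
```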

Prerequisites

SSH access during slowness, sysstat and htop installed, and baseline knowledge of normal load for this host role. A change window if a restart is required.

Save evidence before kill -9

ps aux --sort=-%cpu | head -20 > /tmp/top-cpu.txt
sudo perf top -d 5

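The saved snapshot supports a post-mortem after the runaway process is killed. perf top is interactive and saves nothing, so note the hot symbols before quitting.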
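A slightly broader capture can be scripted so that nothing is forgotten under pressure. This is a sketch, not a complete forensics kit: the capture helper and the /tmp/evidence.XXXXXX directory template are illustrative choices, and each command is allowed to fail so a missing tool does not stop the rest.

```shell
# capture DIR NAME CMD...: run CMD and save stdout and stderr to DIR/NAME.txt.
capture() {
    dir=$1; name=$2; shift 2
    "$@" > "$dir/$name.txt" 2>&1 || true   # keep going if a tool is missing
}

evidence=$(mktemp -d /tmp/evidence.XXXXXX)
capture "$evidence" uptime  uptime
capture "$evidence" top-cpu ps aux --sort=-%cpu
capture "$evidence" top-mem ps aux --sort=-%mem
capture "$evidence" loadavg cat /proc/loadavg
echo "evidence saved in $evidence"
```

Copy the directory off the host if a reboot is likely, since /tmp may be cleared.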

Blocked processes D state

ps aux | awk '$8 ~ /^D/'

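Uninterruptible sleep usually means the task is waiting on I/O, often a hung NFS mount or a dying disk. Signals, including kill -9, are not delivered until the I/O completes, so fix the storage instead.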
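The kernel wait channel often hints at what a blocked task is waiting for. The sketch below assumes procps ps, where wchan:32 widens that column. The helper name d_state is illustrative.

```shell
# d_state: keep the header row plus rows whose STAT (column 2) starts
# with D, so variants such as D+ or Ds are included.
d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

if command -v ps >/dev/null 2>&1; then
    ps -eo pid,stat,wchan:32,comm | d_state
fi
```

Tasks that stay in this list across several runs are the ones worth tracing back to a mount or device.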

cgroup v2 pressure

cat /proc/pressure/cpu
cat /proc/pressure/io

PSI metrics (Linux 4.20+ with PSI enabled in the kernel) quantify resource pressure better than load average alone. cgroup v2 exposes the same data per cgroup in cpu.pressure, io.pressure, and memory.pressure. Integrate them with monitoring before load spikes become outages. On Kubernetes nodes, high load may come from the kubelet or from eviction pressure, so check kubectl top nodes separately from host uptime.
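For monitoring, the 10-second "some" average is a convenient value to export. The sketch below assumes the usual PSI line format (some avg10=... avg60=... avg300=... total=...). The helper name psi_some_avg10 is illustrative and no alert threshold is implied.

```shell
# psi_some_avg10 FILE: print the avg10 value from the "some" line of a
# PSI file such as /proc/pressure/io.
psi_some_avg10() {
    awk '$1 == "some" { sub(/^avg10=/, "", $2); print $2 }' "$1"
}

for r in cpu io memory; do
    f=/proc/pressure/$r
    if [ -r "$f" ]; then
        echo "$r $(psi_some_avg10 "$f")"
    fi
done
```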