What you will achieve

Live server triage with htop for processes and iostat for disk throughput: a quick first look before Prometheus/Grafana is in place.

1) Install tools

sudo apt install htop sysstat
# Fedora: sudo dnf install htop sysstat

2) htop usage

htop

F6 sorts by a column such as CPU% or MEM%, F4 filters by name, and F9 sends a signal to the selected process (use with care). Press H to toggle the display of user threads.

3) iostat baseline

iostat -xz 1 5

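Watch %util: a value near 100% means the device was busy for almost the whole interval, which usually indicates saturation on a single spinning disk. On SSD, NVMe, or RAID devices that serve requests in parallel it can overstate the load. High await (shown as r_await and w_await in current sysstat) means requests are waiting longer, so latency is suffering.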
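
One way to apply this check without reading every row is to filter the iostat output on its %util column. The sketch below is only an illustration: flag_busy_devices is a hypothetical helper, not part of sysstat, and the default threshold of 80 is an assumption rather than a recommendation. It finds the column by its header name because column positions differ between sysstat versions.

```shell
# flag_busy_devices: hypothetical helper, not part of sysstat.
# Reads `iostat -x` text on stdin and prints "device util" for every
# row whose %util is at or above the threshold (default 80, an assumption).
flag_busy_devices() {
    awk -v limit="${1:-80}" '
        # Header row: remember which field holds %util.
        $1 ~ /^Device/ {
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        # A blank line ends the device table of one report.
        NF == 0 { col = 0; next }
        # Device rows: compare numerically against the threshold.
        col && NF >= col && ($col + 0) >= (limit + 0) { print $1, $col }
    '
}

# Example (commented out; needs sysstat). LC_ALL=C keeps the decimal
# separator a dot so the numeric comparison works:
# LC_ALL=C iostat -xz 1 5 | flag_busy_devices 90
```

The first report from iostat covers averages since boot, so rows from that report are best ignored when judging the current state.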

4) Combine with history

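sar -u 1 5   # CPU, five live 1-second samples
sar -d 1 5   # block devices, five live 1-second samples
sar -u       # today's recorded CPU history (needs collection from step 7)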

Verify

Correlate high load with specific PIDs in htop and with busy block devices in iostat, using the same time window for both.
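
To make sure both views cover the same window, the samplers can be started together and written to one directory. This is a rough sketch under stated assumptions: sample_window is a hypothetical helper, the log file names are made up for illustration, and top's batch mode stands in for htop because the output has to be plain text.

```shell
# sample_window: hypothetical helper. Runs iostat and top side by side for
# SECS one-second samples and writes both logs into DIR.
sample_window() {
    secs=$1
    dir=$2
    mkdir -p "$dir" || return 1
    # Each sampler is started only if it is installed.
    if command -v iostat >/dev/null 2>&1; then
        # -t adds a timestamp to every report.
        iostat -xzt 1 "$secs" > "$dir/iostat.log" 2>&1 &
    fi
    if command -v top >/dev/null 2>&1; then
        # -b is top's batch mode: plain text, one frame per second here.
        top -b -d 1 -n "$secs" > "$dir/top.log" 2>&1 &
    fi
    wait    # block until the background samplers have finished
}

# Example: sample_window 60 /tmp/incident-window
```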

5) Batch snapshots for logs

htop has no batch mode, so use top for plain-text snapshots:

top -b -d 5 -n 3 > /tmp/top-snapshot.txt

6) iotop permissions

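iotop is a separate package and needs root to read per-process I/O accounting:

sudo apt install iotop
sudo iotop -ao

-o shows only processes that are actually doing I/O and -a accumulates totals since iotop started, which quickly exposes a log spammer or a runaway database.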

7) Persistent sysstat history

# Debian/Ubuntu: turn on collection in /etc/default/sysstat
sudo sed -i 's/ENABLED="false"/ENABLED="true"/' /etc/default/sysstat
sudo systemctl enable --now sysstat

sar -q  # load average history
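
Once collection is running, sar can read a past day's data file with -f and narrow it to a time window with -s and -e. The sketch below assumes the Debian/Ubuntu default directory /var/log/sysstat and the default saDD file naming (Fedora and RHEL keep the files in /var/log/sa). Both function names are hypothetical helpers, not sysstat commands.

```shell
# sa_file_for_day: hypothetical helper that maps a YYYY-MM-DD date to the
# daily sysstat data file. SA_DIR defaults to the Debian/Ubuntu location
# (an assumption; override it on other distributions).
sa_file_for_day() {
    # ${1##*-} keeps only the DD part of YYYY-MM-DD.
    printf '%s/sa%s\n' "${SA_DIR:-/var/log/sysstat}" "${1##*-}"
}

# sar_load_window: hypothetical wrapper that shows load-average history
# for one day between a start and an end time (HH:MM:SS).
sar_load_window() {
    f=$(sa_file_for_day "$1")
    if [ ! -r "$f" ]; then
        echo "no sysstat data file: $f" >&2
        return 1
    fi
    sar -q -f "$f" -s "$2" -e "$3"
}

# Example (the date is a placeholder):
# sar_load_window 2024-05-17 14:00:00 15:00:00
```

Because the default files are named by day of month only, they are overwritten after roughly a month, so older incidents may no longer be available.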

Baseline before incident

Capture normal iostat and htop readings during peak hours. Without a baseline you cannot tell whether 40% disk util is normal or a crisis.
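
A baseline is easier to compare against when every capture lands in a predictable, timestamped file. The following is a minimal sketch: capture_to is a hypothetical helper, and the /var/tmp/baseline directory in the examples is an assumed location.

```shell
# capture_to: hypothetical helper. Runs a command, stores its output in
# DIR/LABEL-YYYYmmdd-HHMMSS.txt, and prints the file name.
# Usage: capture_to DIR LABEL COMMAND [ARGS...]
capture_to() {
    dir=$1
    label=$2
    shift 2
    mkdir -p "$dir" || return 1
    out="$dir/$label-$(date +%Y%m%d-%H%M%S).txt"
    status=0
    # Keep stderr too, so a failing tool still leaves a trace in the file.
    "$@" > "$out" 2>&1 || status=$?
    printf '%s\n' "$out"
    return "$status"
}

# Examples (commented out; need sysstat and top):
# capture_to /var/tmp/baseline iostat iostat -xz 1 5
# capture_to /var/tmp/baseline top top -b -n 1
```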

8) atop for historical replay

sudo apt install atop
sudo systemctl enable --now atop
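
After the service has been logging for a while, a past day can be replayed with atop -r. The sketch below assumes the Debian/Ubuntu log location /var/log/atop and the atop_YYYYMMDD file naming; atop_replay is a hypothetical wrapper, not an atop command.

```shell
# atop_replay: hypothetical wrapper around `atop -r`.
# Usage: atop_replay YYYYMMDD HH:MM
# ATOP_DIR defaults to /var/log/atop (an assumption; adjust per distribution).
atop_replay() {
    log="${ATOP_DIR:-/var/log/atop}/atop_$1"
    if [ ! -r "$log" ]; then
        echo "no atop log for $1 at $log" >&2
        return 1
    fi
    # -r reads the raw log, -b jumps to the given start time.
    atop -r "$log" -b "$2"
}

# Example (the date is a placeholder): atop_replay 20240517 14:00
```

Inside the replay, t steps forward one sample and T steps back.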

Prerequisites

SSH shell access. The htop and sysstat packages. Documented baseline metrics. Optional: regular sar collection enabled. Know the server's role (web, DB, batch) so you can interpret CPU versus I/O bottlenecks.

tmux for long captures

tmux new -s watch
htop

Detach with Ctrl-b d and reattach with tmux attach -t watch during long incidents. The session and its output survive an SSH drop.

glances alternative

sudo apt install glances
glances

A single TUI combining CPU, disk, and network, which gives a quicker overview than switching between htop and iostat manually.

Recording during incident

script -c 'iostat -xz 1 300' /tmp/iostat-incident.log

Capture five minutes for the post-mortem and attach the log to the ticket, together with a text snapshot of the process list in place of an htop screenshot.
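
Logs are easier to line up in a post-mortem when every line carries a timestamp. iostat can print one per report with -t; for tools that cannot, a small filter can add one. The stamp function below is a hypothetical helper shown only as a sketch.

```shell
# stamp: hypothetical helper that prefixes each input line with the
# current wall-clock time (HH:MM:SS) at the moment the line is read.
stamp() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%H:%M:%S)" "$line"
    done
}

# Example (commented out; needs sysstat):
# iostat -xz 1 300 | stamp > /tmp/iostat-incident.log
```

The timestamp reflects when the line reached the filter, so output that the tool buffers before writing may be stamped slightly late.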

node_exporter migration path

Manual htop/iostat triage graduates to Prometheus node_exporter, which collects the same metrics automatically. Until then, sar history on hosts with sysstat collection enabled provides the data for post-incident graphs.