Bootloader and partition changes can render a system unbootable. Keep a live USB handy and verify changes before rebooting production machines.
What you will achieve
Recognise kernel panics, capture evidence from journald and dmesg, and apply common fixes — bad modules, full disks, failing RAM.
1) What it looks like
A frozen console, a stack trace on screen, and a blinking Caps Lock LED. The system is unresponsive except, in some cases, to magic SysRq key combinations.
2) Capture logs after reboot
journalctl -k -b -1 --no-pager | tail -100
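# dmesg only covers the current boot, so search the previous boot's journal instead
journalctl -k -b -1 --no-pager | grep -iE 'panic|oops'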
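The log search above can be wrapped in a small filter so the same patterns apply to any saved kernel log. This is a minimal sketch: the function name panic_lines and the pattern list are assumptions, not an exhaustive set of kernel messages.

```shell
#!/bin/sh
# panic_lines: read a kernel log on stdin and print only the lines that
# commonly accompany a panic. The pattern list is an assumption; extend it
# for your own hardware and drivers.
panic_lines() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -iE 'kernel panic|oops|call trace' || true
}

# Usage (previous boot):
#   journalctl -k -b -1 --no-pager | panic_lines
```

An empty result does not prove there was no panic. A hard crash may never reach the on-disk journal.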
3) Common causes
- Recent kernel upgrade — boot previous entry from GRUB Advanced options.
- Bad DKMS module (NVIDIA, VirtualBox) — remove module or use LTS kernel.
- Filesystem corruption — boot live USB, run fsck.
- RAM — run memtest86+ from GRUB.
4) Blacklist module temporarily
echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
sudo update-initramfs -u
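Before rebooting, it can help to confirm that the blacklist entry actually landed in a file modprobe reads. The sketch below assumes the default /etc/modprobe.d directory; the function name is_blacklisted is hypothetical.

```shell
#!/bin/sh
# is_blacklisted MODULE [DIR]: succeed if a "blacklist MODULE" line exists
# in any *.conf file under DIR (default: /etc/modprobe.d).
is_blacklisted() {
    module=$1
    dir=${2:-/etc/modprobe.d}
    grep -qsE "^[[:space:]]*blacklist[[:space:]]+${module}[[:space:]]*$" "$dir"/*.conf
}

# Usage:
#   is_blacklisted nouveau && echo "blacklist entry present"
# After the reboot, confirm the module stayed unloaded:
#   lsmod | grep -w nouveau || echo "nouveau not loaded"
```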
Verify
After the fix, confirm stable uptime with uptime and check that journalctl -k shows no new panic lines.
5) kdump for post-mortem
sudo apt install linux-crashdump
# captures vmcore on next panic
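A crash kernel is normally only armed after a reboot, once memory has been reserved with a crashkernel= boot parameter. The following sketch checks both conditions. The function names are hypothetical; the default paths are the standard sysfs and procfs locations.

```shell
#!/bin/sh
# crash_kernel_loaded [SYSFS_FILE]: succeed if the kernel reports that a
# crash (kdump) kernel is currently loaded.
crash_kernel_loaded() {
    flag=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]
}

# has_crashkernel_param [CMDLINE_FILE]: succeed if the running kernel was
# booted with a crashkernel= memory reservation.
has_crashkernel_param() {
    cmdline=${1:-/proc/cmdline}
    grep -qE '(^| )crashkernel=' "$cmdline"
}

# Usage:
#   has_crashkernel_param || echo "no crashkernel= reservation; reboot first"
#   crash_kernel_loaded   && echo "kdump is armed"
```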
6) Kernel parameters to test
intel_idle.max_cstate=1
nouveau.modeset=0
iommu=soft
Add one at a time at the GRUB menu (press e on the boot entry and append the parameter to the linux line) to isolate which option stabilises boot.
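When testing parameters one by one, it is easy to lose track of which one the running kernel actually booted with. A minimal sketch of a check against /proc/cmdline follows; the function name has_boot_param is hypothetical.

```shell
#!/bin/sh
# has_boot_param PARAM [CMDLINE_FILE]: succeed if PARAM appears as a whole
# word on the kernel command line (default: /proc/cmdline).
has_boot_param() {
    param=$1
    cmdline=${2:-/proc/cmdline}
    # One parameter per line, then an exact fixed-string match.
    tr ' ' '\n' < "$cmdline" | grep -qxF -- "$param"
}

# Usage:
#   has_boot_param intel_idle.max_cstate=1 && echo "C-state limit active"
```

Once a parameter proves stable, it can be made permanent in GRUB_CMDLINE_LINUX_DEFAULT.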
7) Hardware checklist
- Reseat RAM and power cables.
- Test with minimal USB peripherals.
- Check CPU thermals: run sensors after a successful boot.
When to escalate
Repeated panics on a stock kernel with a clean memtest result point to a failing SSD or motherboard. At that point software fixes hit diminishing returns.
8) mce (machine check) errors
grep mce /var/log/kern.log
Hardware CPU or RAM errors often show up in the log before a panic. Replace the affected hardware when MCE entries keep accumulating.
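To judge whether MCE entries are accumulating, count them and compare the number over time. This is a sketch under assumptions: the function name count_mce is hypothetical, and the two patterns are common kernel message prefixes rather than a complete list.

```shell
#!/bin/sh
# count_mce [LOGFILE]: print the number of log lines that mention a
# machine check event (default log: /var/log/kern.log).
count_mce() {
    log=${1:-/var/log/kern.log}
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    # grep -c prints 0 and exits non-zero when nothing matches.
    grep -ciE 'mce:|machine check' "$log" || true
}

# Usage:
#   count_mce
# On systems without kern.log, feed the journal through the same patterns:
#   journalctl -k --no-pager | grep -ciE 'mce:|machine check'
```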
Prerequisites
Console or serial log capture. Previous kernel in GRUB. Live USB for fsck. memtest86+ on USB optional. Recent hardware/software change timeline (driver, RAM, kernel update).
Preserve panic log
journalctl -k -b -1 --no-pager > ~/last-boot-kernel.log
sysctl panic settings
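kernel.panic=10
kernel.panic_on_oops=1
These settings reboot the machine 10 seconds after a panic and treat an oops as a panic. They suit headless servers; pair them with kdump so the crash is still captured for analysis.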
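To make these settings survive a reboot, they can go in a sysctl drop-in file. The sketch below assumes a systemd-style /etc/sysctl.d directory; the file name 90-panic.conf and the function name are hypothetical.

```shell
#!/bin/sh
# write_panic_sysctl [FILE] [SECONDS]: write the two panic settings to a
# sysctl drop-in file. SECONDS is the delay before the automatic reboot.
write_panic_sysctl() {
    file=${1:-/etc/sysctl.d/90-panic.conf}
    seconds=${2:-10}
    printf 'kernel.panic = %s\nkernel.panic_on_oops = 1\n' "$seconds" > "$file"
}

# Usage (as root):
#   write_panic_sysctl
#   sysctl --system                             # reload all sysctl files
#   sysctl kernel.panic kernel.panic_on_oops    # confirm the live values
```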
Silent boot hides the panic
Remove quiet splash from GRUB_CMDLINE_LINUX_DEFAULT to see the full panic or oops on screen. Photograph or log it for a vendor bug report, including the exact symbol names in the call trace.
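The edit can be previewed before touching the real file. The sketch below prints a modified copy of the GRUB defaults file to stdout. It assumes GNU sed, a double-quoted value, and the Debian/Ubuntu path /etc/default/grub; the function name strip_quiet_splash is hypothetical.

```shell
#!/bin/sh
# strip_quiet_splash [FILE]: print FILE with the words "quiet" and "splash"
# removed from GRUB_CMDLINE_LINUX_DEFAULT. The input file is not modified.
strip_quiet_splash() {
    file=${1:-/etc/default/grub}
    sed -E '/^GRUB_CMDLINE_LINUX_DEFAULT=/ {
        s/\<(quiet|splash)\>//g
        s/ {2,}/ /g
        s/=" /="/
        s/ "$/"/
    }' "$file"
}

# Usage:
#   strip_quiet_splash > /tmp/grub.new
#   diff /etc/default/grub /tmp/grub.new     # review the change
#   sudo cp /tmp/grub.new /etc/default/grub && sudo update-grub
```

Printing to stdout rather than editing in place keeps the original file intact until the diff has been reviewed.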
kexec rapid reboot
kexec loads a new kernel without a full firmware initialisation. This gives a faster recovery loop when testing kernel parameters, but it does not fix panics caused by hardware.
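A dry-run helper can assemble the kexec load command so it can be inspected before it is run. This is a sketch under assumptions: the /boot/vmlinuz-VERSION and /boot/initrd.img-VERSION paths follow the Debian/Ubuntu layout, and the function name kexec_load_cmd is hypothetical.

```shell
#!/bin/sh
# kexec_load_cmd [VERSION] [EXTRA_PARAMS] [CMDLINE_FILE]: print (do not run)
# a kexec load command that reuses the current kernel command line and
# appends EXTRA_PARAMS to it.
kexec_load_cmd() {
    version=${1:-$(uname -r)}
    extra=${2:-}
    cmdline_file=${3:-/proc/cmdline}
    printf 'kexec -l /boot/vmlinuz-%s --initrd=/boot/initrd.img-%s --command-line="%s"\n' \
        "$version" "$version" "$(cat "$cmdline_file")${extra:+ $extra}"
}

# Usage:
#   kexec_load_cmd "$(uname -r)" "intel_idle.max_cstate=1"
# Review the printed command, run it as root, then switch kernels with:
#   sudo systemctl kexec
```

systemctl kexec stops services before jumping to the loaded kernel, which is gentler on filesystems than kexec -e.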
vendor support bundle
Before returning RAM or an SSD under RMA, collect a support bundle with sosreport on Ubuntu or redhat-support-tool on RHEL. Vendors typically want a full hardware inventory attached to the panic ticket.
netconsole remote panic log
Configure netconsole to send the panic or oops output to a syslog server on another host. This captures the trace even when the local disk is too corrupt for the journal to be written.
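The netconsole module takes a single parameter string describing the sender and the receiver. The sketch below builds that string. The function name netconsole_param is hypothetical, and the addresses in the usage note are documentation placeholders, not real hosts.

```shell
#!/bin/sh
# netconsole_param SRC_IP DEV TGT_IP TGT_MAC [SRC_PORT] [TGT_PORT]
# Print a parameter string in the form
#   netconsole=src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac
# Ports default to 6665 (source) and 6666 (target).
netconsole_param() {
    src_ip=$1
    dev=$2
    tgt_ip=$3
    tgt_mac=$4
    src_port=${5:-6665}
    tgt_port=${6:-6666}
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}

# Usage (placeholder addresses; substitute your own):
#   sudo modprobe netconsole "$(netconsole_param 192.0.2.10 eth0 192.0.2.20 00:00:5e:00:53:01)"
```

The receiving host needs a UDP listener or syslog daemon on the target port before the panic happens.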