BTRFS Read-Only Filesystem Diagnostics
Context
The system went read-only during heavy usage (4 concurrent Claude Code sessions) and recovered normally after a reboot. This reference captures the diagnostic workflow used to investigate it.
Quick Assessment
System State
# Current uptime and load
uptime
# Disk layout and state
lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT,STATE,MODEL
# Disk space
df -h
# Memory status
free -h
BTRFS Error Counters (Requires sudo)
# Check all BTRFS volumes for errors
sudo btrfs device stats /
sudo btrfs device stats /home
sudo btrfs device stats /var/lib/libvirt/images
# Filesystem usage details
sudo btrfs filesystem usage /
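A healthy device reports zero for every counter, so the stats output can be reduced to the counters that need attention. The following is a minimal sketch: `btrfs_nonzero_stats` is a hypothetical helper name (not part of btrfs-progs), and it assumes the usual two-column `[device].counter  value` layout.

```shell
# Hypothetical helper: print only the non-zero counters from
# `btrfs device stats` output supplied on stdin.
# Assumes each line looks like "[/dev/xyz].counter_name   <integer>".
btrfs_nonzero_stats() {
    awk '$2 ~ /^[0-9]+$/ && $2 + 0 > 0 { print }'
}
```

Possible usage: `sudo btrfs device stats / | btrfs_nonzero_stats`. Empty output means every counter on that volume is zero.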
Journal Analysis
Current Boot
# Error-level messages since boot
journalctl -b -p err..emerg --no-pager | head -50
# BTRFS-specific messages
journalctl -b | grep -iE 'btrfs|nvme|disk|i/o' | tail -30
Previous Boot (Pre-Crash)
# List recent boots
journalctl --list-boots | head -10
# Previous boot errors
journalctl -b -1 -p err..emerg --no-pager | head -50
# Search for read-only events
journalctl -b -1 | grep -iE 'read.?only|remount|btrfs.*error|corrupt' | tail -30
# Deep search for IO/disk errors
journalctl -b -1 --no-pager | grep -iE 'btrfs.*error|btrfs.*read.?only|write_io_error|blk_update_request|I/O error|EIO' | tail -30
# Last events before crash
journalctl -b -1 --no-pager | tail -100
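The journal searches in this section repeat similar patterns, so they could be collected into one reusable filter. This is only a sketch: `ro_events` is a hypothetical name, and the pattern list is an assumption that may need extending for other kernels or storage drivers.

```shell
# Hypothetical helper: keep only log lines that suggest a forced read-only
# transition or an I/O problem. Reads log text on stdin.
# "read.?only" matches "readonly", "read-only" and "read only".
ro_events() {
    grep -iE 'read.?only|remount|btrfs.*(error|corrupt)|I/O error|blk_update_request'
}
```

Possible usage: `journalctl -b -1 --no-pager | ro_events | tail -30`, or `journalctl -k -b -1 --no-pager | ro_events` to restrict the search to kernel messages from the previous boot.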
Kernel Messages (Requires sudo)
# Filtered kernel ring buffer (covers the current boot only)
sudo dmesg -T | grep -iE '(btrfs|error|fail|readonly|read-only|corrupt|i/o|nvme|remount)' | tail -50
# Most recent kernel messages, unfiltered
sudo dmesg -T | tail -100
Storage Health
SMART Data (Requires sudo)
# NVMe health check
sudo smartctl -H /dev/nvme0n1
sudo smartctl -H /dev/nvme1n1
# Full SMART info
sudo smartctl -a /dev/nvme0n1 | head -50
NVMe Specific
# NVMe controller info
sudo nvme list
sudo nvme smart-log /dev/nvme0n1
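The smart-log output is long, and only a few fields usually matter for a quick health check. The sketch below picks those out. `nvme_health_fields` is a hypothetical helper, and the field names assume nvme-cli's plain-text `key : value` output, which may differ between versions.

```shell
# Hypothetical helper: extract the health-related fields from
# `nvme smart-log` text on stdin and print them as key=value pairs.
nvme_health_fields() {
    awk -F':' '
        /^(critical_warning|temperature|available_spare|percentage_used|media_errors|num_err_log_entries)[ \t]*:/ {
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim padding around the key
            gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim padding around the value
            print key "=" val
        }'
}
```

Possible usage: `sudo nvme smart-log /dev/nvme0n1 | nvme_health_fields`. A non-zero `critical_warning` or `media_errors` value would point toward the drive rather than the filesystem.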
Thermal and Fan Analysis
# CPU temperatures
sensors | grep -E '(Core|temp|fan|RPM)'
# GPU status (NVIDIA)
nvidia-smi --query-gpu=temperature.gpu,utilization.gpu,power.draw --format=csv,noheader
# CPU frequency and governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
awk '{print $1/1000000 " GHz"}' /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
Process Analysis
# Top CPU consumers
ps aux --sort=-%cpu | awk 'NR<=15 {printf "%-12s %5s %5s %-s\n", $1, $3, $4, $11" "$12" "$13}'
# Check for heavy processes
pgrep -a -f 'claude|ollama|docker|node|python' | head -20
# Load average
cat /proc/loadavg
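A raw load average is easier to judge relative to the number of CPU cores. As a rough sketch, the hypothetical helper `load_per_core` divides the 1-minute load by a core count passed as its first argument.

```shell
# Hypothetical helper: 1-minute load average divided by the core count.
# Reads a /proc/loadavg-style line on stdin; $1 is the number of cores.
load_per_core() {
    awk -v cores="$1" 'cores > 0 { printf "%.2f\n", $1 / cores }'
}
```

Possible usage: `load_per_core "$(nproc)" < /proc/loadavg`. Values well above 1 suggest more runnable or I/O-blocked tasks than cores.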
Common Causes
| Cause | Symptoms | Resolution |
|---|---|---|
| BTRFS metadata exhaustion | Sudden read-only, no space errors despite free space | |
| Checksum mismatch | Kernel logs show corruption detected | |
| Heavy concurrent writes | Multiple processes writing simultaneously | Reduce parallelism, add swap |
| Thermal throttling | High temps, fans spinning before crash | Clean dust, check thermal paste |
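For the metadata exhaustion case, one way to get a quick indication is to compute how full the allocated metadata space is. This is a sketch under stated assumptions: `btrfs_metadata_pct` is a hypothetical helper, and it expects the raw-byte output of `btrfs filesystem df -b`, where the metadata line has the form `Metadata, <profile>: total=<bytes>, used=<bytes>`.

```shell
# Hypothetical helper: percentage of allocated metadata space in use.
# Expects `btrfs filesystem df -b` output on stdin, for example:
#   Metadata, DUP: total=2147483648, used=1288490188
btrfs_metadata_pct() {
    # Splitting on "=" and "," puts total in field 3 and used in field 5.
    awk -F'[=,]' '/^Metadata/ && $3 > 0 { printf "%.1f\n", 100 * $5 / $3 }'
}
```

Possible usage: `sudo btrfs filesystem df -b / | btrfs_metadata_pct`. A high percentage matters mainly when `btrfs filesystem usage` also shows little unallocated space, since BTRFS can otherwise allocate further metadata chunks.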
Recovery Commands
After Clean Boot
# Verify filesystem mounted RW
mount | grep btrfs
# Run scrub to verify integrity
sudo btrfs scrub start /
sudo btrfs scrub start /home
# Check scrub status
sudo btrfs scrub status /
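The read-write check at the start of this block relies on reading the mount options by eye. A scripted variant could look like the sketch below. `btrfs_ro_mounts` is a hypothetical helper that expects `/proc/mounts`-style lines (device, mount point, type, options).

```shell
# Hypothetical helper: list btrfs mount points that are mounted read-only.
# Reads /proc/mounts-style lines on stdin.
btrfs_ro_mounts() {
    awk '$3 == "btrfs" {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro") { print $2; break }
         }'
}
```

Possible usage: `btrfs_ro_mounts < /proc/mounts`. Empty output means every BTRFS mount is currently read-write.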
If Still Read-Only
# Try remount
sudo mount -o remount,rw /
# If that fails, boot from a live USB, unlock the encrypted volume, and check the unmounted filesystem
sudo btrfs check --readonly /dev/mapper/cryptroot
Immediate Quieting (Post-Boot)
# Stop background services if not needed
sudo systemctl stop ollama
docker stop principia-postgres
# System should settle in 10-15 minutes after BTRFS recovery
Findings (2026-02-21 Incident)
- Previous boot journal ends abruptly at 20:02 - journald couldn't write once FS went read-only
- /boot/efi (FAT32) shows "not properly unmounted" warning - consequence, not cause
- No OOM, no I/O errors captured in logs
- BTRFS btrfs-endio workers active post-reboot - normal recovery behavior
- Core 0 at 76 °C after reboot with minimal load - BTRFS verification overhead
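Likely cause: heavy concurrent writes from the 4 Claude Code sessions triggered BTRFS's protective read-only mode. This is an inference, since the journal stopped before the event could be logged. No corruption was detected after reboot.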