BTRFS Read-Only Filesystem Diagnostics

Context

System went read-only during heavy usage (4 concurrent Claude Code sessions). After reboot, system recovered normally. This reference captures the diagnostic workflow.

Quick Assessment

System State

# Current uptime and load
uptime

# Disk layout and state
lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT,STATE,MODEL

# Disk space
df -h

# Memory status
free -h

BTRFS Error Counters (Requires sudo)

# Check all BTRFS volumes for errors
sudo btrfs device stats /
sudo btrfs device stats /home
sudo btrfs device stats /var/lib/libvirt/images

# Filesystem usage details
sudo btrfs filesystem usage /

Journal Analysis

Current Boot

# Error-level messages since boot
journalctl -b -p err..emerg --no-pager | head -50

# BTRFS-specific messages
journalctl -b | grep -iE 'btrfs|nvme|disk|i/o' | tail -30

Previous Boot (Pre-Crash)

# List recent boots
journalctl --list-boots | head -10

# Previous boot errors
journalctl -b -1 -p err..emerg --no-pager | head -50

# Search for read-only events
journalctl -b -1 | grep -iE 'read.only|remount|btrfs.*error|corrupt' | tail -30

# Deep search for IO/disk errors
journalctl -b -1 | awk '/btrfs.*error|BTRFS.*read-only|write_io_error|blk_update_request|I\/O error|EIO/i' | tail -30

# Last events before crash
journalctl -b -1 --no-pager | tail -100

Kernel Messages (Requires sudo)

# Current kernel buffer
sudo dmesg -T | grep -iE '(btrfs|error|fail|readonly|read-only|corrupt|i/o|nvme|remount)' | tail -50

# Timestamped kernel messages
sudo dmesg -T | tail -100

Storage Health

SMART Data (Requires sudo)

# NVMe health check
sudo smartctl -H /dev/nvme0n1
sudo smartctl -H /dev/nvme1n1

# Full SMART info
sudo smartctl -a /dev/nvme0n1 | head -50

NVMe Specific

# NVMe controller info
sudo nvme list
sudo nvme smart-log /dev/nvme0n1

Thermal and Fan Analysis

# CPU temperatures
sensors | grep -E '(Core|temp|fan|RPM)'

# GPU status (NVIDIA)
nvidia-smi --query-gpu=temperature.gpu,utilization.gpu,power.draw --format=csv,noheader

# CPU frequency and governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
awk '{print $1/1000000 " GHz"}' /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq

Process Analysis

# Top CPU consumers
ps aux --sort=-%cpu | awk 'NR<=15 {printf "%-12s %5s %5s %-s\n", $1, $3, $4, $11" "$12" "$13}'

# Check for heavy processes
pgrep -a -f 'claude|ollama|docker|node|python' | head -20

# Load average
cat /proc/loadavg

Common Causes

Cause Symptoms Resolution

BTRFS metadata exhaustion

Sudden read-only, no space errors despite free space

btrfs balance start -dusage=50 /

Checksum mismatch

Kernel logs show corruption detected

btrfs scrub start / then check results

Heavy concurrent writes

Multiple processes writing simultaneously

Reduce parallelism, add swap

Thermal throttling

High temps, fans spinning before crash

Clean dust, check thermal paste

Recovery Commands

After Clean Boot

# Verify filesystem mounted RW
mount | grep btrfs

# Run scrub to verify integrity
sudo btrfs scrub start /
sudo btrfs scrub start /home

# Check scrub status
sudo btrfs scrub status /

If Still Read-Only

# Try remount
sudo mount -o remount,rw /

# If that fails, boot from live USB and run
btrfs check /dev/mapper/cryptroot

Immediate Quieting (Post-Boot)

# Stop background services if not needed
sudo systemctl stop ollama
docker stop principia-postgres

# System should settle in 10-15 minutes after BTRFS recovery

Findings (2026-02-21 Incident)

  • Previous boot journal ends abruptly at 20:02 - journald couldn’t write once FS went read-only

  • /boot/efi (FAT32) shows "not properly unmounted" warning - consequence, not cause

  • No OOM, no I/O errors captured in logs

  • BTRFS btrfs-endio workers active post-reboot - normal recovery behavior

  • Core 0 at 76C after reboot with minimal load - BTRFS verification overhead

Likely cause: Heavy concurrent writes from 4 Claude sessions triggered BTRFS protective read-only mode. No actual corruption detected.