INC-2026-04-04: Razer Blade 18 High Fan Noise & Elevated Temperature
Incident Summary
| Field | Value |
|---|---|
| Detected | 2026-04-04 ~18:50 PDT (user report - sudden fan noise) |
| Mitigated | 2026-04-04 ~19:15 PDT (killed runaway waybar process) |
| Resolved | 2026-04-04 ~19:44 PDT (CSS fixes applied, waybar stable at 0% CPU) |
| Duration | ~54 minutes (detection to resolution) |
| Severity | P3 (Medium) - System functional but thermally stressed |
| Impact | CPU at 86°C, fans at max RPM, CPU throttled to 800 MHz |
| Root Cause | Waybar CSS |
Timeline
| Time (PDT) | Event |
|---|---|
| ~18:50 | User noticed sudden high fan noise after plugging in AC power |
| 18:51 | Initial diagnostics: CPU at 86°C, GPU at 54°C idle (7W, 0% util). GPU ruled out. |
| 18:52 | Process analysis: waybar PID 2140 at 96.1% CPU identified as heat source |
| 18:53 | Killed waybar, temps began dropping (86°C → 79°C) |
| 18:54 | Restarted waybar, which returned to 52% CPU immediately. Not a one-off. |
| 19:00 | CSS animations commented out (glow-pulse, text-flicker, subtle-breathe). Still 53% CPU. |
| 19:02 | All remaining animations disabled. Still 52% CPU. Animations not sole cause. |
| 19:10 | Minimal config test (clock only): 0.0% CPU. Confirmed modules are fine. |
| 19:15 | Full modules + minimal CSS: 0.0% CPU. Confirmed CSS is the root cause. |
| 19:25 | Removed media module, still 57%. playerctl -F ruled out. |
| 19:29 | Disabled |
| 19:32 | Disabled |
| 19:40 | Read full |
| 19:44 | Waybar stable at 0.0% CPU. No CSS parse errors. Temps: 56°C and dropping. |
| 19:46 | Final thermal reading: CPU 56°C, ACPI 54°C. 30°C drop from peak. Incident resolved. |
Symptoms
- Sudden onset of high fan noise when AC power plugged in
- CPU package temperature at 86°C during light desktop use (terminal, browser)
- CPU throttled to 800 MHz (min frequency) and still couldn't cool down
- GPU completely idle (54°C, 7W, 0% utilization), so not the cause
- System uptime was 1 day 20 hours at time of incident
Investigation
Phase 1: Thermal Baseline
paste <(cat /sys/class/thermal/thermal_zone*/type) \
<(awk '{printf "%.1f°C\n", $1/1000}' /sys/class/thermal/thermal_zone*/temp)
| Sensor | Temperature |
|---|---|
| x86_pkg_temp (CPU) | 86.0°C |
| acpitz | 81.0°C |
| SEN1/SEN2 (VRM/SSD) | 53-54°C |
| NVIDIA RTX 5090 | 54°C (idle) |
| iwlwifi | 60°C |
Phase 2: Process Analysis
top -b -n1 | awk 'NR>=7 && NR<=12'
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2140 evanusm+ 20 0 2391024 93284 77544 R 96.1 0.1 11,21 waybar
2084 evanusm+ 20 0 1570524 203848 113288 S 4.8 0.3 35:45.76 Hyprland
waybar at 96.1% CPU — single core pinned. 11 minutes 21 seconds of accumulated CPU time.
Phase 3: GPU Verification
nvidia-smi
| 0 NVIDIA GeForce RTX 5090 Off | 00000000:01:00.0 Off | N/A |
| N/A 54C P8 7W / 95W | 17MiB / 24463MiB | 0% Default |
Processes:
| 0 N/A N/A 2084 G Hyprland 6MiB |
Only Hyprland on GPU at 6 MiB. Firefox and kitty NOT on dGPU (unlike INC-2026-03-27).
Phase 4: Isolation Testing
Systematic elimination to identify root cause:
| Test | Config | CPU Result |
|---|---|---|
| Minimal (clock only, minimal CSS) | 1 module, no styling | 0.0% |
| Full modules, minimal CSS | All modules, | 0.0% |
| Full modules, full CSS (no animations) | All modules, animations commented | 53% |
| Full modules, full CSS (no transitions) | All modules, transitions + animations off | 58% |
| Full modules, full CSS (no box-shadow) | All modules, box-shadow + animations + transitions off | CSS parse error (broken comments) |
| Full modules, properly cleaned CSS | Removed box-shadow blocks, fixed syntax, disabled last animation | 0.0% |
Conclusion: Full modules + minimal CSS = 0%. The CSS itself was the sole cause.
Phase 5: CSS Forensics
The style.css (605 lines, Catppuccin Mocha / Glass Island theme) contained:
| Property | Count |
|---|---|
| | ~80+ |
| | 8 (including multi-line) |
| | ~20 |
| | 4 |
| Infinite | 7 (pulse, glow-pulse, text-flicker, border-glow-cycle, subtle-breathe) |
| | 3 |
| | 2 |
Findings
- Primary: Multi-line box-shadow with alpha() compositing forced GTK3 into software rendering path for every widget repaint
- Contributing: Seven infinite CSS animations triggered constant repaints every frame, each requiring expensive shadow recalculation
- Trigger: The degradation was progressive over 1 day 20h uptime. GTK3's rendering pipeline accumulated overhead until it entered a CPU spin
- Correlation: Plugging in AC power likely triggered a battery state class change (.charging), causing a style recalculation cascade on all widgets simultaneously
Root Cause
Technical explanation: GTK3’s CSS engine performs box-shadow rendering in software (CPU), not GPU-accelerated. The waybar theme used 8 box-shadow declarations combined with alpha() transparency compositing and 7 infinite CSS animations. Each animation frame triggered a repaint cycle that required recalculating shadows and alpha blending for every affected widget. Over ~44 hours of uptime, GTK3’s rendering pipeline degraded until the repaint cost exceeded one frame budget, causing a sustained CPU spin at 50-96%.
Why it happened:
- Immediate cause: GTK3 CSS repaint loop driven by box-shadow + alpha() compositing
- Contributing factors: 7 infinite CSS animations forcing constant repaints, ~44h uptime without waybar restart
- Trigger: AC power plug-in caused battery class change (.charging), cascading style recalculation
- Systemic issues: No awareness that GTK3 CSS box-shadow is CPU-rendered, no performance testing of the theme
Resolution
CSS Changes Applied
Removed all box-shadow declarations (GTK3 renders these in software):
/* REMOVED - multi-line box-shadow from window#waybar */
/* REMOVED - box-shadow from tooltip */
/* REMOVED - box-shadow from #custom-power:hover */
/* REMOVED - box-shadow from @keyframes glow-pulse (replaced with opacity) */
/* REMOVED - box-shadow from @keyframes border-glow-cycle */
Disabled all infinite CSS animations:
/* DISABLED: glow-pulse, text-flicker, subtle-breathe, border-glow-cycle */
/* Conditional animations retained: pulse on .urgent and .critical (only fires on state) */
Disabled all transition: all properties:
/* DISABLED: transition: all 0.2s/0.3s ease on 4 selectors */
Verification
| Sensor | Before | After |
|---|---|---|
| x86_pkg_temp | 86°C | 56°C |
| acpitz | 81°C | 54°C |
| SEN1/SEN2 | 53-54°C | 46-48°C |
| waybar CPU | 96.1% | 0.0% |
| CPU frequency | 800 MHz (throttled) | Normal scaling |
- Waybar at 0.0% CPU after 30 seconds
- Temperatures within normal range (CPU 56°C idle)
- Fan noise returned to baseline
- No CSS parse errors in waybar log
- All modules functional (workspaces, clock, media, battery, network, etc.)
Impact Assessment
Systems Affected
| System | Status | Impact Duration |
|---|---|---|
| modestus-razer | Resolved | ~54 minutes |
| User comfort | Restored | Fan noise eliminated |
Business Impact
- Users affected: 1 (personal workstation)
- Data loss: No
- Hardware risk: Sustained 86°C for unknown duration (possibly hours before detection)
- Workaround: Killing waybar provided immediate relief
Prevention
Short-term (This Week)
- Remove box-shadow from waybar CSS - Evan
- Disable infinite CSS animations - Evan
- Disable transition: all properties - Evan
- Create missing ~/.local/bin/waybar-*-info.sh scripts (5 missing on-click handlers) - Evan
- Document normal thermal baseline (CPU 56°C idle, 85°C load) - Evan
Long-term (This Quarter)
- Set up thermal monitoring script (alert if CPU >80°C sustained) - Evan
- Add waybar restart to a daily systemd timer as a defensive measure - Evan
- Performance-test any future CSS theme changes before committing - Evan
- Evaluate whether alpha() can be replaced with solid colors for further CPU savings - Evan
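The thermal monitoring item could start from something like the sketch below. The 80°C limit comes from the action item and the x86_pkg_temp zone name from the Phase 1 readings; the function names (read_zone_temp, next_streak, monitor), the six-sample window, and the 10-second interval are illustrative assumptions, not the script that was deployed.

```shell
# Print the temperature in whole °C of the first thermal zone whose type
# equals $1. $2 overrides the sysfs root (handy for testing with fake zones).
read_zone_temp() {
    local want=$1 root=${2:-/sys/class/thermal} zone
    for zone in "$root"/thermal_zone*; do
        [ -r "$zone/type" ] || continue
        if [ "$(cat "$zone/type")" = "$want" ]; then
            # sysfs reports millidegrees; integer division gives whole °C
            echo $(( $(cat "$zone/temp") / 1000 ))
            return 0
        fi
    done
    return 1
}

# Print the new count of consecutive over-limit samples.
# Usage: next_streak CURRENT_STREAK TEMP LIMIT
next_streak() {
    if [ "$2" -gt "$3" ]; then
        echo $(( $1 + 1 ))
    else
        echo 0
    fi
}

# Poll until interrupted; warn once the limit has been exceeded for
# NEEDED consecutive samples ("sustained"), then start counting again.
monitor() {
    local limit=${1:-80} needed=${2:-6} interval=${3:-10} streak=0 temp
    while sleep "$interval"; do
        temp=$(read_zone_temp x86_pkg_temp) || continue
        streak=$(next_streak "$streak" "$temp" "$limit")
        if [ "$streak" -ge "$needed" ]; then
            printf 'ALERT: CPU at %s°C, above %s°C for %s samples\n' \
                "$temp" "$limit" "$needed" >&2
            streak=0
        fi
    done
}
```

Running `monitor 80 6 10` in a terminal or a user service would warn after roughly a minute above 80°C. Requiring a streak rather than a single reading keeps short load spikes from raising alerts; the printf could be swapped for a desktop notification if one is preferred.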
Lessons Learned
What Went Well
- Systematic isolation methodology narrowed from "fans are loud" to exact CSS root cause in <1 hour
- Minimal config test definitively separated modules from CSS
- Previous incident (INC-2026-03-27) provided context for GPU vs CPU thermal analysis
- GPU was quickly ruled out, preventing a repeat of the dGPU investigation
What Could Be Improved
- Initial hypothesis (CSS animations) was partially correct but led to 15 minutes of sed-based comment edits that introduced CSS parse errors. The file should have been read directly from the start
- Multiple incorrect assumptions tested sequentially (animations, transitions, playerctl, box-shadow) when a single minimal-CSS test would have isolated the class of problem immediately
- No performance baseline existed for waybar CPU usage
Key Takeaways
CLI Reference: Diagnostic Commands Used
This incident was diagnosed entirely from the terminal. The commands below are organized by technique, with explanations of the advanced patterns.
Process Substitution — <(command)
Process substitution creates a temporary file descriptor from a command’s output, allowing commands that expect filenames to read from pipelines instead.
# Merge two command outputs side-by-side with paste
# <(cmd1) creates fd for thermal zone names
# <(cmd2) creates fd for temperatures converted from millidegrees
paste <(cat /sys/class/thermal/thermal_zone*/type) \
<(awk '{printf "%.1f°C\n", $1/1000}' /sys/class/thermal/thermal_zone*/temp)
paste normally takes two files. Process substitution feeds it two "virtual files" — one with zone names, one with converted temperatures. The awk divides millidegree readings by 1000 and formats to one decimal place.
# Ephemeral waybar config via process substitution
# waybar -c expects a file path — <(echo '...') creates one on the fly
waybar -c <(echo '{"layer":"top","modules-center":["clock"],"clock":{"format":"%H:%M"}}') \
-s <(echo '* {font-size:14px; color:#cdd6f4; background:#1e1e2e;}') &disown 2>/dev/null
This runs waybar with an inline config and CSS without creating any temp files. The -c and -s flags both receive /proc/self/fd/NN paths pointing to the echo output. Critical for isolation testing — no files to clean up.
Command Substitution — $(command)
# Kill a process by name — $(pgrep) returns the PID inline
kill $(pgrep -x waybar)
# Use PID in ps — pgrep finds it, xargs passes it to ps
pgrep -x waybar | xargs ps -o pid,pcpu,etime --no-headers -p
Batch-Mode top with awk Filtering
# top -b = batch mode (non-interactive, stdout)
# -n1 = single snapshot
# awk keeps rows 7-12 (skips the summary block, keeps the column header and top processes)
top -b -n1 | awk 'NR>=7 && NR<=12'
# Filter top output for a specific process
top -b -n1 | awk '/waybar/ {print $1, $9, $10, $12}'
# $1=PID, $9=%CPU, $10=%MEM, $12=COMMAND
Thermal Zone Reading with awk Math
# Linux exposes temps in millidegrees (86000 = 86.0°C)
# awk converts inline with printf formatting
awk '{printf "%.1f°C\n", $1/1000}' /sys/class/thermal/thermal_zone*/temp
# CPU frequency in MHz (kernel reports in kHz)
awk '{printf "%.0f MHz\n", $1/1000}' /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq | sort -rn | head -5
# Power draw in watts (kernel reports in microwatts)
awk '{printf "%.1f W\n", $1/1000000}' /sys/class/power_supply/BAT*/power_now
nvidia-smi Queries
# Full dashboard view
nvidia-smi
# Structured CSV query — specific fields, no headers
nvidia-smi --query-gpu=temperature.gpu,power.draw,fan.speed,utilization.gpu \
--format=csv,noheader
# Per-process GPU monitoring (single snapshot)
nvidia-smi pmon -c 1
Process Hunting
# Top CPU consumers — command ps to bypass aliases
command ps aux --sort=-%cpu | awk 'NR<=15 {printf "%-8s %5s %5s %s\n", $1, $3, $4, $11}'
# Runaway processes (>20% CPU sustained)
command ps aux --sort=-%cpu | awk '$3>20'
# Check if a process is running
pgrep -x waybar && echo "running" || echo "not running"
sysfs Exploration
# CPU governor, max/current frequency
grep . /sys/devices/system/cpu/cpu0/cpufreq/scaling_{governor,max_freq,cur_freq}
# AC power status — awk ternary for human-readable output
awk '{print (($1==1) ? "AC Connected" : "On Battery")}' /sys/class/power_supply/AC*/online
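# Find fan RPM sensors across all hwmon devices
# -L follows the hwmonN symlinks (plain find does not descend into them);
# -maxdepth 2 stops it from looping through the device/subsystem links
find -L /sys/class/hwmon -maxdepth 2 -name 'fan*_input' -exec sh -c \
'echo "$(cat "$(dirname "$1")/name"): $(cat "$1") RPM"' _ {} \;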
CSS Forensics with grep
# Count lines that mention expensive CSS properties (grep -c counts matching lines, not individual matches)
grep -cE 'box-shadow|backdrop-filter|blur|opacity|border-radius|transition|alpha\(' ~/.config/waybar/style.css
# Find specific properties with line numbers
grep -nE 'transition:' ~/.config/waybar/style.css
# Find uncommented animation properties (exclude CSS comments)
grep -n 'animation:' ~/.config/waybar/style.css | grep -v '/\*'
# Find unclosed CSS comments (opened but not closed on same line)
awk '/\/\*/ && !/\*\// {print NR": "$0}' ~/.config/waybar/style.css
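A per-property breakdown like the Phase 5 table can be sketched with grep -o, which prints every match on its own line so that several matches on one line are all counted. The helper name css_property_counts is hypothetical, and the patterns passed to it are only examples.

```shell
# Count occurrences (not lines) of each pattern in a CSS file.
# Usage: css_property_counts FILE PATTERN...
# Patterns are extended regular expressions, so a literal "(" is written "\(".
css_property_counts() {
    local file=$1 pat count
    shift
    for pat in "$@"; do
        # grep -o emits one line per match; wc -l counts those lines
        count=$(grep -oE -e "$pat" "$file" | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$pat" "$count"
    done
}
```

For example, `css_property_counts ~/.config/waybar/style.css 'box-shadow' 'alpha\(' 'transition:'` prints one tab-separated line per pattern. Commented-out declarations are still counted, so the numbers are an upper bound on what GTK actually applies.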
sed for CSS Property Invalidation
# Prefix property name with underscore — GTK ignores unknown properties
# Safer than commenting (no nesting issues)
sed -i 's/box-shadow:/_box-shadow:/g' ~/.config/waybar/style.css
Attempting to wrap CSS properties in /* … */ comments with sed is fragile. Multi-line values and nested comments cause parse errors. Prefer the underscore-prefix technique or edit the file directly.
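The underscore-prefix technique can be wrapped in a pair of reversible helpers, sketched below. The names css_disable and css_enable are hypothetical, and GNU sed is assumed (for -i without a suffix and for -E).

```shell
# Disable a CSS property by renaming it to _PROPERTY.
# Usage: css_disable FILE PROPERTY
css_disable() {
    # Match the name only at the start of a declaration (line start, or after
    # whitespace, ';' or '{'), so an already-prefixed property is left alone.
    sed -i -E "s/(^|[[:space:];{])$2:/\1_$2:/g" "$1"
}

# Re-enable a property that css_disable renamed.
# Usage: css_enable FILE PROPERTY
css_enable() {
    sed -i -E "s/(^|[[:space:];{])_$2:/\1$2:/g" "$1"
}
```

Because the rename is anchored, running css_disable twice does not produce a double underscore, and css_enable restores the file exactly. Multi-line values need no special handling, since only the property name changes.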
Journal & Kernel Log Queries
# Kernel messages filtered for thermal events
journalctl -k --since "1 hour ago" --no-pager | grep -iE 'thermal|throttl|trip'
# dmesg with ISO timestamps, filtered for hardware warnings
dmesg --time-format iso | tail -50 | grep -iE 'thermal|fan|throttl|error|warn'
Background Process Management
# Start waybar in background, detach from terminal
waybar &disown 2>/dev/null
# Chain: start waybar, wait 30s, then check CPU
waybar &disown 2>/dev/null && sleep 30 && top -b -n1 | awk '/waybar/ {print $1, $9, $10, $12}'
# Kill + restart in one line (waybar only restarts if the kill succeeded)
kill $(pgrep -x waybar) 2>/dev/null && waybar &disown 2>/dev/null
Pattern: Isolation Testing
The most valuable technique from this incident — binary search by swapping components:
# Test 1: Minimal config (is waybar itself the problem?)
waybar -c <(echo '{"layer":"top","modules-center":["clock"],"clock":{"format":"%H:%M"}}') \
-s <(echo '* {font-size:14px;}') &disown 2>/dev/null
# Test 2: Full modules + minimal CSS (modules or CSS?)
waybar -s <(echo '* {font-size:14px; color:#cdd6f4; background:#1e1e2e;}') &disown 2>/dev/null
# Test 3: Full modules + full CSS minus one module (which module?)
sed -i '11s/"custom\/media", //' ~/.config/waybar/config
waybar &disown 2>/dev/null
Each test changes exactly one variable. 30 seconds of runtime with top -b -n1 gives a definitive CPU reading. This methodology identified the CSS as the sole culprit in 3 tests.
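For a reading over a fixed window rather than a single top snapshot, the measurement step could be sketched as below. It diffs the utime and stime tick counters in /proc/PID/stat across the window. The function names proc_ticks and cpu_percent and the 5-second default are assumptions for illustration; this is not the command used during the incident.

```shell
# Print the total CPU ticks (user + system) consumed so far by PID.
proc_ticks() {
    [ -r "/proc/$1/stat" ] || return 1
    # Drop everything up to the last ')' so a command name containing spaces
    # cannot shift the fields; utime and stime are then fields 12 and 13.
    sed 's/.*) //' "/proc/$1/stat" | awk '{ print $12 + $13 }'
}

# Print the average CPU percentage used by PID over SECONDS (default 5).
# 100.0 means one fully busy core.
# Usage: cpu_percent PID [SECONDS]
cpu_percent() {
    local pid=$1 secs=${2:-5} hz t0 t1
    hz=$(getconf CLK_TCK)            # kernel ticks per second
    t0=$(proc_ticks "$pid") || return 1
    sleep "$secs"
    t1=$(proc_ticks "$pid") || return 1
    awk -v d="$((t1 - t0))" -v hz="$hz" -v s="$secs" \
        'BEGIN { printf "%.1f\n", 100 * d / (hz * s) }'
}
```

After starting one of the tests above and letting it settle, `cpu_percent "$(pgrep -x waybar)" 30` would report the average over the next 30 seconds, which is less sensitive to a momentary repaint burst than one snapshot.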
Related
- INC: Razer Battery Drain - Previous thermal/power incident on same device (dGPU cause)
- RCA: Kroki Orphan Containers - Pattern of orphaned processes causing heat
Metadata
| Field | Value |
|---|---|
| Incident ID | INC-2026-04-04-001 |
| Author | Evan Rosado |
| Created | 2026-04-04 |
| Last Updated | 2026-04-04 |
| Status | Resolved |
| Post-Incident Review | 2026-04-04 (same session) |