INC-2026-03-10 vault-backup.service SELinux Failure
Incident Summary
| Field | Value |
|---|---|
| Started | 2026-03-10 ~02:21 UTC (timer run) |
| Detected | 2026-03-10 14:41 UTC (worklog review) |
| Resolved | 2026-03-10 15:33 UTC |
| Duration | ~13 hours (overnight, fixed in the morning) |
| Severity | P3 (Medium): backups failing, no data loss |
| Impact | Automated Vault backups to NAS not running |
| Root Cause | SELinux denied the confined `rsync_t` domain permission to execute `ssh` |
Timeline
| Time (UTC) | Event |
|---|---|
| 02:21 | `vault-backup.timer` triggered; service failed with exit code 14 |
| 14:41 | Investigation started during worklog review |
| 15:19 | Root cause identified: SELinux AVC denial |
| 15:22 | First fix attempt (`audit2allow`): partial, a new denial appeared |
| 15:27 | Second denial (…) |
| 15:29 | Third denial (…) |
| 15:32 | Set `rsync_t` to permissive, ran the service, captured all denials |
| 15:33 | Installed the complete policy module, tested in enforcing mode: success |
Symptoms
- `systemctl status vault-backup.service` showed `failed (Result: exit-code)`
- Exit code 14 (rsync IPC error)
- Error message: `rsync: [sender] Failed to exec ssh: Permission denied (13)`
- Manual execution as root worked fine
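rsync uses exit code 14 when it cannot start a helper process such as the remote shell. This matches the `Failed to exec ssh` message. The lookup helper below is a hypothetical sketch: the name `rsync_exit_meaning` is made up. Its strings come from the EXIT VALUES section of rsync(1), and it lists only a subset of codes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate an rsync exit status into the wording of
# the EXIT VALUES section of rsync(1). Only a subset of codes is listed.
rsync_exit_meaning() {
  case "$1" in
    0)  echo "Success" ;;
    1)  echo "Syntax or usage error" ;;
    5)  echo "Error starting client-server protocol" ;;
    10) echo "Error in socket I/O" ;;
    11) echo "Error in file I/O" ;;
    12) echo "Error in rsync protocol data stream" ;;
    14) echo "Error in IPC code" ;;
    23) echo "Partial transfer due to error" ;;
    24) echo "Partial transfer due to vanished source files" ;;
    30) echo "Timeout in data send/receive" ;;
    *)  echo "Unlisted code $1; see rsync(1)" ;;
  esac
}
```

Example: `rsync_exit_meaning "$(systemctl show -p ExecMainStatus --value vault-backup.service)"`. This assumes the unit's main process exits with rsync's own status.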
Investigation
Initial Triage
```bash
# Service status
systemctl status vault-backup.service
# Output: failed, exit code 14

# Logs
journalctl -u vault-backup.service --no-pager -n 50
# Output: "Failed to exec ssh: Permission denied (13)"

# Manual test (worked)
sudo bash -c 'tar -czf /tmp/test.tar.gz ... && rsync ...'
# Output: SUCCESS
```
SELinux Analysis
```bash
# Check for AVC denials
sudo ausearch -m avc --start today | grep rsync
# Output:
# type=AVC msg=audit(...): avc: denied { execute_no_trans } for pid=3144
#   comm="rsync" path="/usr/bin/ssh"
#   scontext=system_u:system_r:rsync_t:s0
#   tcontext=system_u:object_r:ssh_exec_t:s0
```
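Each AVC record carries four fields that matter here: source type, target type, object class and denied permissions. Reducing records to those fields makes one run's denials easy to compare with the next, and shows when the list stops growing. The function below is a hypothetical sketch (`avc_tuples` is not a standard tool). It assumes the single-line raw AVC format that `ausearch` prints without `-i`; the output above appears wrapped. Non-AVC records are ignored.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reduce raw AVC records on stdin to unique
# "source_type target_type:class permission" lines.
avc_tuples() {
  awk '
    /avc: +denied/ {
      s = ""; t = ""; c = ""; np = 0; inperm = 0
      for (i = 1; i <= NF; i++) {
        if ($i == "{") { inperm = 1; continue }   # start of permission set
        if ($i == "}") { inperm = 0; continue }   # end of permission set
        if (inperm) { perm[++np] = $i; continue }
        # Contexts are user:role:type:level; field 3 is the type.
        if ($i ~ /^scontext=/)      { split($i, a, ":"); s = a[3] }
        else if ($i ~ /^tcontext=/) { split($i, a, ":"); t = a[3] }
        else if ($i ~ /^tclass=/)   { c = substr($i, 8) }
      }
      for (j = 1; j <= np; j++) print s, t ":" c, perm[j]
    }' | sort -u
}
```

Example: `sudo ausearch -m avc --start today | avc_tuples`. Saving the output after each attempt and diffing the files shows which denial each fix uncovered.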
Findings
- The service runs as `User=root` in systemd
- A manual `sudo` run executes in the `unconfined_t` context, which is effectively unrestricted by SELinux
- The systemd service runs rsync in the confined `rsync_t` domain
- By default, `rsync_t` may not execute `ssh_exec_t` files (`execute_no_trans` is denied)
- Each denial only appears after the previous one is fixed (whack-a-mole)
Root Cause
SELinux policy restriction: the `rsync_t` SELinux domain does not have permission to execute binaries labeled `ssh_exec_t`.
When rsync runs under systemd, it operates in the confined `rsync_t` domain. When it tries to spawn `ssh` for the remote transfer, SELinux blocks the execution.
Why manual worked: running `sudo bash -c '…'` from an interactive shell executes in `unconfined_t`, which SELinux policy leaves effectively unrestricted.
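The difference is visible in the process labels. A context has the form `user:role:type:level`, and the type field is the domain that policy rules are written against. The helpers below are a hypothetical sketch; `ctx_type` and its siblings are not standard tools. They rely on `id -Z` and `ps -eo label,comm`, which print SELinux contexts on SELinux-enabled hosts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the type (domain) field of an SELinux context.
# MLS/MCS levels may contain further colons, so only field 3 is used.
ctx_type() {
  printf '%s\n' "$1" | cut -d: -f3
}

# Domain of the current shell (the incident notes show unconfined_t for a
# manual root shell).
current_domain() {
  ctx_type "$(id -Z)"
}

# Domains of any running rsync processes; run while the service is active.
rsync_domains() {
  ps -eo label=,comm= | while read -r label comm; do
    if [ "$comm" = rsync ]; then ctx_type "$label"; fi
  done
}
```

Comparing `current_domain` from an interactive shell with `rsync_domains` during a service run should show `unconfined_t` against `rsync_t`.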
Required allow rules (five in total):
```
allow rsync_t ssh_exec_t:file { execute_no_trans map };
allow rsync_t ssh_home_t:dir search;
allow rsync_t ssh_home_t:file { getattr open read };
allow rsync_t systemd_conf_t:file { getattr open read };
allow rsync_t initrc_tmp_t:file open;
```
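The five rules can also be kept as a reviewable `.te` source, which is easier to diff and to ship to other hosts than a binary `.pp`. The sketch below writes a type-enforcement file containing exactly these rules and compiles it with `checkmodule` and `semodule_package`. The `require` block and the version string `1.0` are assumptions modelled on what `audit2allow -M` typically emits. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a type-enforcement (.te) source containing the
# five allow rules from this incident.
write_vault_backup_te() {
  cat <<'EOF'
module vault-backup 1.0;

require {
    type rsync_t;
    type ssh_exec_t;
    type ssh_home_t;
    type systemd_conf_t;
    type initrc_tmp_t;
    class file { execute_no_trans map getattr open read };
    class dir search;
}

allow rsync_t ssh_exec_t:file { execute_no_trans map };
allow rsync_t ssh_home_t:dir search;
allow rsync_t ssh_home_t:file { getattr open read };
allow rsync_t systemd_conf_t:file { getattr open read };
allow rsync_t initrc_tmp_t:file open;
EOF
}

# Compile and package; writes vault-backup.te/.mod/.pp to the current directory.
build_vault_backup_pp() {
  write_vault_backup_te > vault-backup.te
  checkmodule -M -m -o vault-backup.mod vault-backup.te
  semodule_package -o vault-backup.pp -m vault-backup.mod
}
```

The resulting `vault-backup.pp` is loaded with `sudo semodule -i vault-backup.pp`, as in the resolution steps.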
Resolution
Immediate Fix
Approach: put `rsync_t` in permissive mode to capture all denials in one run, then install one comprehensive policy module.
```bash
# Step 1: Set domain to permissive (logs but allows)
sudo semanage permissive -a rsync_t

# Step 2: Run service to capture all denials
sudo systemctl start vault-backup.service
# SUCCESS (in permissive mode)

# Step 3: Generate comprehensive policy
sudo ausearch -m avc --start today | grep rsync | audit2allow -M vault-backup

# Step 4: Install policy module
sudo semodule -i vault-backup.pp

# Step 5: Remove permissive mode
sudo semanage permissive -d rsync_t

# Step 6: Test in enforcing mode
sudo systemctl start vault-backup.service
# SUCCESS
```
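Steps 1 to 6 can be wrapped in one script so that a failure part-way through never leaves `rsync_t` permissive. The sketch below is a hypothetical wrapper, not the sequence actually run during the incident. Its assumptions:

- it runs as root, so `sudo` is dropped;
- `vault-backup.service` is a oneshot unit, so `systemctl start` returns only after the backup finishes;
- denials are filtered on `rsync_t` rather than `rsync`.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around Steps 1-6. Run as root.
# DOMAIN, MODULE and UNIT may be overridden from the environment.
install_policy() (
  set -euo pipefail
  domain="${DOMAIN:-rsync_t}"
  module="${MODULE:-vault-backup}"
  unit="${UNIT:-vault-backup.service}"

  # Step 1: log denials for this domain without blocking them.
  semanage permissive -a "$domain"
  # If any later step fails, remove the permissive exception on exit.
  trap 'semanage permissive -d "$domain"' EXIT

  # Step 2: exercise the service so every denial is logged in one run.
  systemctl start "$unit"

  # Steps 3-4: build and load a module from this domain's denials.
  # With pipefail, finding no denials aborts the run (the trap still fires).
  ausearch -m avc --start today | grep "$domain" | audit2allow -M "$module"
  semodule -i "$module.pp"

  # Step 5: back to enforcing for this domain; disarm the cleanup trap.
  semanage permissive -d "$domain"
  trap - EXIT

  # Step 6: the real test, now under enforcement.
  systemctl start "$unit"
)
```

Invoke it as a standalone command, for example `install_policy; echo "exit: $?"`. Avoid `install_policy || …` and `if install_policy`: bash ignores `set -e` inside a function called in those contexts, which would defeat the early exit.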
Verification
- Service completed successfully (exit code 0)
- Backup file transferred to NAS
- Timer scheduled for next run (02:29 UTC)
- SELinux in enforcing mode (`getenforce` = `Enforcing`)
- No new AVC denials
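Most of these checks can be scripted for reuse after future policy changes. The function below is a hypothetical sketch. It makes three checks:

- the unit's `Result` property, via `systemctl show`;
- the enforcement mode, via `getenforce`;
- `rsync_t` denials logged by `ausearch` in the last ten minutes (`--start recent`), so it assumes it runs shortly after the test run.

It does not check the NAS copy, which depends on details not in this report.

```bash
#!/usr/bin/env bash
# Hypothetical post-fix check for the verification items above.
verify_backup() {
  local unit="${1:-vault-backup.service}" mode result denials

  mode=$(getenforce)
  if [ "$mode" != Enforcing ]; then
    echo "SELinux mode is $mode, expected Enforcing" >&2
    return 1
  fi

  # "success" means the unit's last run finished cleanly.
  result=$(systemctl show -p Result --value "$unit")
  if [ "$result" != success ]; then
    echo "$unit last result: $result" >&2
    return 1
  fi

  # Count rsync_t AVC denials from the last 10 minutes. ausearch exits
  # non-zero when nothing matches, so its errors are discarded.
  denials=$(ausearch -m avc --start recent 2>/dev/null |
    grep -cE 'avc: +denied .*scontext=[^ ]*:rsync_t:' || true)
  if [ "$denials" -ne 0 ]; then
    echo "$denials new rsync_t AVC denial(s)" >&2
    return 1
  fi

  echo "OK: $unit succeeded, SELinux enforcing, no new rsync_t denials"
}
```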
Prevention
Short-term
- Runbook updated with proper SELinux fix procedure
- Policy module installed on vault-01
Long-term
- Deploy the same policy module to vault-02/vault-03 when backups are enabled there
- Document in the standard VM provisioning checklist
- Consider packaging the policy module in an Ansible role
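Until the module is packaged in Ansible, the vault-02/vault-03 rollout could copy and load the module built on vault-01, one host at a time. The function below is a hypothetical sketch. It assumes key-based SSH to each host and passwordless `sudo` there. It also assumes each target's policy defines the same types; otherwise `semodule -i` rejects the module.

```bash
#!/usr/bin/env bash
# Hypothetical rollout helper: copy a compiled module to each host and load it.
# Usage: deploy_module vault-backup.pp vault-02 vault-03
deploy_module() {
  local pp="$1" name host
  shift
  name=$(basename "$pp")
  for host in "$@"; do
    echo "== $host"
    scp "$pp" "$host:/tmp/$name" || return 1
    # Loading a module with the same name replaces the installed copy.
    ssh "$host" sudo semodule -i "/tmp/$name" || return 1
  done
}
```

The function stops at the first host that fails, so a partial rollout is visible from the last `== host` line printed.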
Lessons Learned
What went well
- SELinux audit logs (`ausearch -m avc`) provided a clear diagnosis
- The permissive-domain approach captured all denials in one pass
- The manual test confirmed the issue was SELinux, not SSH keys
What could be improved
- The initial fix attempt (a single `audit2allow` pass) didn't capture all permissions
- Should have used the permissive-domain approach from the start
- Backup monitoring should alert on failed systemd units
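One way to make a failed run visible is systemd's `OnFailure=` dependency, which starts another unit when the backup unit enters the failed state. The sketch below writes such a drop-in. The handler `notify-failure@.service` is hypothetical and would have to exist, for example as a template unit that sends mail or posts to the alerting system. The optional root argument only exists so the function can be tried against a scratch directory.

```bash
#!/usr/bin/env bash
# Hypothetical: add an OnFailure= drop-in to a unit.
# Args: unit name, handler unit, optional root directory (default /).
add_on_failure() {
  local unit="${1:-vault-backup.service}"
  local handler="${2:-notify-failure@%n.service}"
  local root="${3:-}"
  local dir="$root/etc/systemd/system/$unit.d"

  mkdir -p "$dir"
  # %n is expanded by systemd (not the shell) to the failed unit's full name.
  printf '[Unit]\nOnFailure=%s\n' "$handler" > "$dir/10-on-failure.conf"
}
```

After running it as root, `systemctl daemon-reload` picks up the drop-in, and `systemctl show -p OnFailure vault-backup.service` should list the handler.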
Key Takeaways
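- Denial whack-a-mole: in enforcing mode the blocked operation fails at the first denial, so later accesses are never attempted or logged. Use a permissive domain to capture all of them at once.
- Manual vs systemd: manual root commands run in `unconfined_t`; services started by systemd can transition into confined domains such as `rsync_t`.
- `audit2allow -M`: creates a loadable policy module from audit denials.
- Domain permissive vs system permissive: `semanage permissive -a <domain>` is surgical. `setenforce 0` switches the whole system to permissive mode: SELinux stays enabled and keeps logging, but enforces nothing.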