INC-2026-03-10 vault-backup.service SELinux Failure

Incident Summary

Field Value

Detected

2026-03-10 ~02:21 UTC (timer run)

Resolved

2026-03-10 15:33 UTC

Duration

~13 hours (overnight, fixed in morning)

Severity

P3 (Medium) - Backups failing, no data loss

Impact

Automated Vault backups to NAS not running

Root Cause

SELinux rsync_t domain cannot execute ssh_exec_t by default

Timeline

Time (UTC) Event

02:21

vault-backup.timer triggered, service failed with exit code 14

14:41

Investigation started during worklog review

15:19

Root cause identified: SELinux AVC denial

15:22

First fix attempt (audit2allow) - partial, new denial appeared

15:27

Second denial (map) - regenerated policy

15:29

Third denial (search, read) - decided on proper approach

15:32

Set rsync_t to permissive, ran service, captured ALL denials

15:33

Installed complete policy module, tested in enforcing mode - SUCCESS

Symptoms

  • systemctl status vault-backup.service showed failed (Result: exit-code)

  • Exit code 14 (rsync IPC error)

  • Error message: rsync: [sender] Failed to exec ssh: Permission denied (13)

  • Manual execution as root worked fine

Investigation

Initial Triage

# Service status
systemctl status vault-backup.service
# Output: failed, exit code 14

# Logs
journalctl -u vault-backup.service --no-pager -n 50
# Output: "Failed to exec ssh: Permission denied (13)"

# Manual test (worked)
sudo bash -c 'tar -czf /tmp/test.tar.gz ... && rsync ...'
# Output: SUCCESS

SELinux Analysis

# Check for AVC denials
sudo ausearch -m avc --start today | grep rsync

# Output:
type=AVC msg=audit(...): avc: denied { execute_no_trans } for pid=3144
  comm="rsync" path="/usr/bin/ssh"
  scontext=system_u:system_r:rsync_t:s0
  tcontext=system_u:object_r:ssh_exec_t:s0

Findings

  1. Service runs as User=root in systemd

  2. Manual sudo runs in unconfined_t context - no SELinux restrictions

  3. systemd service runs rsync in rsync_t domain

  4. rsync_t cannot transition to ssh_exec_t by default

  5. Each denial only appears after previous one is fixed (whack-a-mole)

Root Cause

SELinux policy restriction: The rsync_t SELinux domain does not have permission to execute binaries labeled ssh_exec_t.

When rsync runs under systemd, it operates in the confined rsync_t domain. When it tries to spawn ssh for the remote transfer, SELinux blocks the execution.

Why manual worked: Running sudo bash -c '…​' from an interactive shell runs in unconfined_t, which has no restrictions.

Required permissions (5 total):

allow rsync_t ssh_exec_t:file { execute_no_trans map };
allow rsync_t ssh_home_t:dir search;
allow rsync_t ssh_home_t:file { getattr open read };
allow rsync_t systemd_conf_t:file { getattr open read };
allow rsync_t initrc_tmp_t:file open;

Resolution

Immediate Fix

Approach: Permissive domain to capture ALL denials, then comprehensive policy module.

# Step 1: Set domain to permissive (logs but allows)
sudo semanage permissive -a rsync_t

# Step 2: Run service to capture all denials
sudo systemctl start vault-backup.service
# SUCCESS (in permissive mode)

# Step 3: Generate comprehensive policy
sudo ausearch -m avc --start today | grep rsync | audit2allow -M vault-backup

# Step 4: Install policy module
sudo semodule -i vault-backup.pp

# Step 5: Remove permissive mode
sudo semanage permissive -d rsync_t

# Step 6: Test in enforcing mode
sudo systemctl start vault-backup.service
# SUCCESS

Verification

  • Service completed successfully (exit code 0)

  • Backup file transferred to NAS

  • Timer scheduled for next run (02:29 UTC)

  • SELinux in enforcing mode (getenforce = Enforcing)

  • No new AVC denials

Prevention

Short-term

  • Runbook updated with proper SELinux fix procedure

  • Policy module installed on vault-01

Long-term

  • Deploy same policy module to vault-02/vault-03 when backup enabled

  • Document in standard VM provisioning checklist

  • Consider packaging policy module in Ansible role

Lessons Learned

What went well

  • SELinux audit logs (ausearch -m avc) provided clear diagnosis

  • Permissive domain approach captured all denials in one pass

  • Manual test confirmed issue was SELinux, not SSH keys

What could be improved

  • Initial fix attempt (single audit2allow) didn’t capture all permissions

  • Should have used permissive domain approach from the start

  • Backup monitoring should alert on failed systemd units

Key Takeaways

  1. Denial whack-a-mole: SELinux stops at first denial. Use permissive domain to capture ALL at once.

  2. Manual vs systemd: Manual root commands run unconfined_t, systemd runs confined domains.

  3. audit2allow -M: Creates loadable policy module from audit denials.

  4. Domain permissive vs system permissive: semanage permissive -a <domain> is surgical, setenforce 0 disables all SELinux.