INC-2026-03-10: Prevention

Prevention

Short-term

  • Runbook updated with proper SELinux fix procedure

  • Policy module installed on vault-01

Long-term

  • Deploy the same policy module to vault-02 and vault-03 when backup is enabled on them

  • Document the fix in the standard VM provisioning checklist

  • Consider packaging the policy module in an Ansible role

Lessons Learned

What went well

  • SELinux audit logs (ausearch -m avc) provided clear diagnosis

  • Permissive domain approach captured all denials in one pass

  • Manual test confirmed issue was SELinux, not SSH keys
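As an illustration of how the audit log can be summarised, the sketch below counts AVC denials per source domain. The helper name denied_domains is hypothetical, and it assumes the usual scontext=user:role:type:level layout in ausearch output.

```shell
# Hypothetical helper: read AVC records on stdin and count denials per
# source domain (the third colon-separated field of scontext=...).
denied_domains() {
    grep -o 'scontext=[^ ]*' | cut -d: -f3 | sort | uniq -c
}

# Example use on a live system (needs root and auditd):
# ausearch -m avc -ts recent | denied_domains
```

A single confined domain dominating the counts points to SELinux policy rather than to credentials such as SSH keys.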

What could be improved

  • Initial fix attempt (a single audit2allow pass) didn’t capture all required permissions

  • Should have used permissive domain approach from the start

  • Backup monitoring should alert on failed systemd units
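One possible shape for the failed-unit check is sketched below. The function name check_unit and the unit name backup.service are placeholders, not the names used in this incident, and how the alert is delivered is left to the monitoring system.

```shell
# Hypothetical check: report when a systemd unit is in the failed state.
# "systemctl is-failed --quiet" exits 0 when the unit HAS failed.
check_unit() {
    unit="$1"
    if systemctl is-failed --quiet "$unit"; then
        echo "ALERT: $unit is in the failed state"
        return 1
    fi
    echo "OK: $unit"
}

# Example call with a placeholder unit name:
# check_unit backup.service
```

The non-zero return code lets a cron job or monitoring agent treat the check as failed without parsing the message.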

Key Takeaways

  1. Denial whack-a-mole: In enforcing mode the process typically fails at its first denial, so each run reveals only one missing permission. Use a permissive domain to capture all denials at once.

  2. Manual vs systemd: Commands run manually as root typically run in unconfined_t, while services started by systemd run in confined domains.

  3. audit2allow -M: Creates a loadable policy module from audit denials; install it with semodule -i.

  4. Domain permissive vs system permissive: semanage permissive -a <domain> is surgical and affects one domain only; setenforce 0 switches the whole system to permissive mode, so nothing is enforced.
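The takeaways above can be combined into one workflow, sketched here as a shell function. The function name capture_denials and the example arguments (backup_agent_t, backup.service, local_backup) are placeholders rather than the values from this incident, and the commands need root on a host with the SELinux policy tools installed.

```shell
# Hypothetical wrapper around the permissive-domain workflow.
# Arguments: <domain> <systemd unit> <policy module name>
capture_denials() {
    local domain="$1" unit="$2" module="$3"

    # Make only this domain permissive; the rest of the system stays enforcing.
    semanage permissive -a "$domain"

    # Run the failing job once so every denial is logged, not just the first.
    systemctl start "$unit"

    # Build <module>.te and <module>.pp from AVC denials of the last 10 minutes.
    # This picks up ALL recent denials, so review <module>.te before installing.
    ausearch -m avc -ts recent | audit2allow -M "$module"

    # Install the module, then return the domain to enforcing.
    semodule -i "${module}.pp"
    semanage permissive -d "$domain"
}

# Example call with placeholder names:
# capture_denials backup_agent_t backup.service local_backup
```

In practice it is safer to pause after the audit2allow step and read the generated .te file, since unrelated denials logged in the same window would otherwise be allowed as well.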