Runbook: Disaster Recovery
- Last Updated: 2026-01-26
- Owner: evanusmodestus
- Frequency: As Needed / Annual Test
Purpose
Procedures for recovering critical infrastructure from backup after a disaster.
| Recovery order matters. You cannot decrypt secrets without the age key. You cannot access systems without SSH keys. Follow the priority order exactly. |
Recovery Priority Order
| Priority | Item | Why | Source |
|---|---|---|---|
| 1 | age master.key | Decrypts ALL encrypted secrets | LUKS USB or M-Disc |
| 2 | SSH private keys | Access to all systems via SSH | LUKS USB |
| 3 | dsec secrets | Credentials for ISE, WLC, NAS, etc. | Git repo (encrypted) |
| 4 | YubiKey access | Hardware-bound authentication | Backup YubiKey |
| 5 | home-dc01 (AD/CA) | Root of trust for PKI | Windows Backup |
| 6 | certmgr-01 | Certificate management | NAS backup |
| 7 | ISE | Network access control | NAS backup |
| 8 | Everything else | WLC, pfSense, Keycloak, etc. | NAS backup |
Scenario: Lost Workstation (Complete Recovery)
This is the most common scenario: your workstation dies, is stolen, or is compromised.
Step 2: Set Up New Workstation
Install your OS and essential packages:
# Arch Linux example
sudo pacman -S age git openssh gnupg
# Create directories
mkdir -p ~/.secrets/.metadata/keys
mkdir -p ~/.ssh
Step 3: Recover age Key (CRITICAL - DO THIS FIRST)
| Without the age key, ALL encrypted secrets are permanently lost. |
# Mount LUKS backup drive
sudo cryptsetup luksOpen /dev/sdX1 backup-usb
sudo mkdir -p /mnt/backup
sudo mount /dev/mapper/backup-usb /mnt/backup
# Restore age key
cp /mnt/backup/keys/master.age.key ~/.secrets/.metadata/keys/
chmod 600 ~/.secrets/.metadata/keys/master.age.key
# Verify the key parses; this prints the public key, not the secret
age-keygen -y ~/.secrets/.metadata/keys/master.age.key
# Should show: age1...
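The verification step can also be scripted so that later steps refuse to run against a bad key file. The following is a minimal sketch, not part of the original procedure: `verify_age_key` is a hypothetical helper name, and it assumes GNU `stat` (as on Arch Linux) and a native age identity whose secret line starts with `AGE-SECRET-KEY-1`. It checks the file without echoing the secret to the terminal.

```shell
# Hypothetical helper: sanity-check a restored age identity file without
# printing the secret to the terminal.
verify_age_key() {
    key_file=$1
    # The file must exist and be non-empty
    [ -s "$key_file" ] || { echo "missing or empty: $key_file" >&2; return 1; }
    # Permissions should be 600, as set in the restore step (GNU stat syntax)
    perms=$(stat -c '%a' "$key_file")
    [ "$perms" = "600" ] || { echo "permissions are $perms, expected 600" >&2; return 1; }
    # Native age identity lines start with AGE-SECRET-KEY-1
    grep -q '^AGE-SECRET-KEY-1' "$key_file" || { echo "no age identity found" >&2; return 1; }
    echo "age key looks valid: $key_file"
}
```

For example, `verify_age_key ~/.secrets/.metadata/keys/master.age.key` returns non-zero if the file is missing, has loose permissions, or does not contain an identity line.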
Step 4: Recover SSH Keys
# Restore SSH keys
cp /mnt/backup/ssh/id_* ~/.ssh/
chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_*
# Test SSH access
ssh nas-01 "hostname && echo 'SSH working'"
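Testing a single host leaves the others unverified. As a rough sketch, the loop below tries a non-interactive login on each host; the host list is copied from the YubiKey deployment loop later in this runbook, while `check_ssh_hosts` and the `SSH_BIN` override are hypothetical names introduced here. `BatchMode=yes` disables interactive prompts, so keys that need a passphrase or PIN may report a failure even though an interactive login would succeed.

```shell
# Hosts taken from the YubiKey deployment loop in this runbook; adjust as needed.
CRITICAL_HOSTS="nas-01 pfsense certmgr-01 ipsk-manager kvm-host"

# Try a non-interactive login on each host and report the result.
# SSH_BIN is an assumption added here so the command can be swapped out.
check_ssh_hosts() {
    failed=0
    # Intentionally unquoted so the host list splits on whitespace
    for host in ${1:-$CRITICAL_HOSTS}; do
        if "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every host answered
    [ "$failed" -eq 0 ]
}
```

Calling `check_ssh_hosts` with no argument checks the default list; passing a quoted, space-separated list such as `check_ssh_hosts "nas-01 kvm-host"` checks only those hosts.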
Step 5: Clone Secrets Repository
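# ~/.secrets already holds the restored key and git clone refuses a
# non-empty target, so clone to a temporary path first
git clone git@github.com:yourusername/secrets.git ~/.secrets.clone
# Or from NAS
git clone ssh://nas-01/volume1/git/secrets.git ~/.secrets.clone
# Move the repository into place alongside the key
cp -a ~/.secrets.clone/. ~/.secrets/ && rm -rf ~/.secrets.clone
# Test decryption
age -d -i ~/.secrets/.metadata/keys/master.age.key \
  ~/.secrets/environments/domains/d000/dev/network.env.age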
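Decrypting one file shows the key works, but not that every file in the repository was encrypted to it. The sketch below walks a directory and attempts to decrypt each `.age` file, discarding the plaintext. `check_age_files` and the `AGE_BIN` override are hypothetical names, and the sketch assumes all secrets use the `.age` suffix as in the example path above.

```shell
# Hypothetical helper: try to decrypt every .age file under a directory and
# list the ones that fail. Plaintext is discarded, never written to disk.
check_age_files() {
    key_file=$1
    root_dir=$2
    # find output is read line by line; paths containing newlines are not supported
    find "$root_dir" -type f -name '*.age' | {
        bad=0
        while IFS= read -r f; do
            # stdin is redirected so the decrypt command cannot consume the file list
            if ! "${AGE_BIN:-age}" -d -i "$key_file" "$f" >/dev/null 2>&1 </dev/null; then
                echo "cannot decrypt: $f"
                bad=1
            fi
        done
        # Exit status of the function: zero only if every file decrypted
        [ "$bad" -eq 0 ]
    }
}
```

A typical call would be `check_age_files ~/.secrets/.metadata/keys/master.age.key ~/.secrets`, which prints nothing and returns zero when every file decrypts.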
Scenario: Lost YubiKey
If Primary Key Lost
- Use backup YubiKey immediately
- Revoke lost key from all systems
- Generate new resident key on replacement YubiKey
# Generate new key on replacement YubiKey
ssh-keygen -t ed25519-sk -O resident -O verify-required \
  -f ~/.ssh/id_ed25519_sk_rk_d000_new \
  -C "evanusmodestus@d000-yubikey-replacement"
# Deploy to all hosts
for host in nas-01 pfsense certmgr-01 ipsk-manager kvm-host; do
  ssh-copy-id -i ~/.ssh/id_ed25519_sk_rk_d000_new.pub "$host"
done
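The revocation step in the list above has no commands of its own. One possible approach, sketched here under assumptions, is to strip the lost key's public blob from `authorized_keys` on each host. `revoke_pubkey` is a hypothetical helper, and the sketch assumes you still have the `.pub` file of the lost key and that hosts authorize keys through a plain `authorized_keys` file rather than a central directory.

```shell
# Hypothetical helper: remove every authorized_keys entry that contains the
# public key blob of the lost YubiKey.
revoke_pubkey() {
    lost_pub=$1          # .pub file of the lost key
    auth_keys=$2         # authorized_keys file to clean
    # Field 2 of a .pub file is the base64 key blob
    blob=$(awk '{print $2; exit}' "$lost_pub")
    [ -n "$blob" ] || { echo "no key blob in $lost_pub" >&2; return 1; }
    # Keep a copy of the original file next to it
    cp -p "$auth_keys" "$auth_keys.bak"
    # grep -v exits 1 when nothing is left to print, which is fine here
    grep -vF -- "$blob" "$auth_keys.bak" > "$auth_keys" || true
}
```

Run it on each host after logging in with the backup YubiKey, for example `revoke_pubkey <lost-key>.pub ~/.ssh/authorized_keys`, then confirm the backup key still logs in before closing the session.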
Scenario: Lost age Key (Worst Case)
| If the age key is lost and no backup exists, ALL encrypted data is permanently unrecoverable. |
Scenario: Corrupted LUKS Header
If your LUKS-encrypted volume won’t open:
# Restore header from backup
sudo cryptsetup luksHeaderRestore /dev/nvme0n1p2 \
--header-backup-file /mnt/backup/luks/workstation-header.img
# Try opening again
sudo cryptsetup luksOpen /dev/nvme0n1p2 encrypted-root
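Restoring a header overwrites the one on disk, so it is worth confirming that the backup file is actually a LUKS header before running the restore. The check below is a minimal sketch: `is_luks_header_image` is a hypothetical helper that only inspects the leading magic bytes, which are the ASCII characters `LUKS` for both LUKS1 and LUKS2. It does not prove the header belongs to this particular volume.

```shell
# Hypothetical helper: confirm a file starts with the LUKS magic bytes
# before using it as a header backup.
is_luks_header_image() {
    img=$1
    [ -r "$img" ] || { echo "cannot read: $img" >&2; return 1; }
    # LUKS1 and LUKS2 headers both begin with the ASCII bytes "LUKS"
    magic=$(head -c 4 "$img")
    [ "$magic" = "LUKS" ] || { echo "not a LUKS header: $img" >&2; return 1; }
    echo "LUKS header image: $img"
}
```

For example, `is_luks_header_image /mnt/backup/luks/workstation-header.img` should succeed before `luksHeaderRestore` is attempted. For a fuller inspection, `cryptsetup luksDump` can usually be pointed at the image file as well.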
Infrastructure Recovery
home-dc01 (AD/CA) Recovery
| Loss of the root CA compromises the entire PKI. An air-gapped backup is essential. |
ISE Recovery
# List available backups
netapi synology backup-list ise
# Download backup
netapi synology download /ise_backups/<backup-file> /tmp/
# Restore via ISE GUI
# Administration > System > Backup & Restore
KVM VMs Recovery
# List VM backups
netapi synology backup-list kvm
# Download XML from NAS
netapi synology download /kvm_backups/<vm-name>.xml /tmp/
# Define VM from XML
virsh define /tmp/<vm-name>.xml
# Start VM (assumes disk images intact)
virsh start <vm-name>
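The start step assumes the disk images are intact. A quick way to test that assumption is to read the image paths out of the domain XML and check that each exists. This is a sketch only: `missing_vm_disks` is a hypothetical helper, and it assumes file-backed disks declared as `<source file='...'/>`, which is how libvirt normally writes them. Block devices and network disks are not covered.

```shell
# Hypothetical helper: print disk image paths referenced by a libvirt domain
# XML that do not exist on this host.
missing_vm_disks() {
    xml=$1
    # Match <source file='...'/> entries with either quote style
    grep -o "<source file=['\"][^'\"]*['\"]" "$xml" |
        sed "s/^<source file=['\"]//; s/['\"]\$//" |
        while IFS= read -r disk; do
            # Print only the paths that are absent
            [ -e "$disk" ] || echo "$disk"
        done
}
```

If `missing_vm_disks /tmp/<vm-name>.xml` prints nothing, every referenced image is present; any path it prints needs to be restored before the VM is started.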
Keycloak Recovery
# List backups
netapi synology backup-list keycloak
# Download backup
netapi synology download /Backups/keycloak/<realm>.json /tmp/
# Import realm
netapi keycloak import-realm /tmp/<realm>.json
Network Config Recovery
pfSense
# Download backup from NAS
netapi synology download /firewall_backups/config-<date>.xml /tmp/
# Restore via pfSense GUI
# Diagnostics > Backup & Restore > Restore
Post-Recovery Checklist
- age key decrypts test file
- SSH access to all critical hosts
- dsec secrets load correctly
- YubiKey(s) working
- AD replication healthy (if applicable)
- Certificate chain validates
- ISE authentication working
- Network access functional
- All services accessible
- Backup jobs re-enabled and running
- Update recovery documentation with lessons learned
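Several checklist items can be expressed as commands and tallied automatically, which also gives the annual drill a repeatable record. The following is a small sketch of such a runner; `run_check` and `check_summary` are hypothetical names, and the commented usage lines reuse only commands that already appear in this runbook.

```shell
PASS=0
FAIL=0

# Run one check: the first argument is a label, the rest is the command.
run_check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        PASS=$((PASS + 1))
        echo "[ok]   $label"
    else
        FAIL=$((FAIL + 1))
        echo "[FAIL] $label"
    fi
}

# Print a summary and return non-zero if anything failed.
check_summary() {
    echo "$PASS passed, $FAIL failed"
    [ "$FAIL" -eq 0 ]
}

# Example usage (commands taken from earlier steps):
# run_check "age key decrypts test file" age -d -i ~/.secrets/.metadata/keys/master.age.key \
#     ~/.secrets/environments/domains/d000/dev/network.env.age
# run_check "SSH access to nas-01" ssh nas-01 hostname
# check_summary
```

Items that need human judgment, such as ISE authentication or lessons learned, still have to be ticked off by hand.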
Annual Recovery Drill
Perform a full recovery drill annually to verify:
- LUKS USB drives are readable
- M-Disc is readable
- All keys decrypt correctly
- Full recovery procedure works
- Documentation is accurate