Runbook: Disaster Recovery

Last Updated: 2026-01-26
Owner: evanusmodestus
Frequency: As Needed / Annual Test


Purpose

Procedures for recovering critical infrastructure from backup after a disaster.

Recovery order matters. You cannot decrypt secrets without the age key. You cannot access systems without SSH keys. Follow the priority order exactly.

Recovery Priority Order

Priority  Item               Why                                                                   Source
--------  -----------------  --------------------------------------------------------------------  --------------------
1         age master.key     Decrypts ALL .age files - without this, everything is unrecoverable  LUKS USB or M-Disc
2         SSH private keys   Access to all systems via SSH                                         LUKS USB
3         dsec secrets       Credentials for ISE, WLC, NAS, etc.                                   Git repo (encrypted)
4         YubiKey access     Hardware-bound authentication                                         Backup YubiKey
5         home-dc01 (AD/CA)  Root of trust for PKI                                                 Windows Backup
6         certmgr-01         Certificate management                                                NAS backup
7         ISE                Network access control                                                NAS backup
8         Everything else    WLC, pfSense, Keycloak, etc.                                          NAS backup


Scenario: Lost Workstation (Complete Recovery)

This is the most common scenario - your workstation dies, gets stolen, or is compromised.

Step 1: Obtain Recovery Media

Locate your LUKS USB backup drive (home safe or offsite).

Step 2: Set Up New Workstation

Install your OS and essential packages:

# Arch Linux example
sudo pacman -S age git openssh gnupg

# Create directories
mkdir -p ~/.secrets/.metadata/keys
mkdir -p ~/.ssh

Step 3: Recover age Key (CRITICAL - DO THIS FIRST)

Without the age key, ALL encrypted secrets are permanently lost.

# Mount LUKS backup drive
sudo cryptsetup luksOpen /dev/sdX1 backup-usb
sudo mkdir -p /mnt/backup
sudo mount /dev/mapper/backup-usb /mnt/backup

# Restore age key
cp /mnt/backup/keys/master.age.key ~/.secrets/.metadata/keys/
chmod 600 ~/.secrets/.metadata/keys/master.age.key

# Verify key works without printing the secret:
# age-keygen -y derives the public key from the identity file
age-keygen -y ~/.secrets/.metadata/keys/master.age.key
# Should show: age1...
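
The restored key file can also be checked for presence, permissions, and format without echoing the secret to the terminal. The following is a minimal sketch, not part of the original procedure; the function name check_age_key_file is hypothetical, and it assumes GNU stat (as on Arch Linux).

```shell
# Hypothetical helper: sanity-check a restored age identity file
# without ever printing the secret key.
check_age_key_file() {
    key_file=$1

    # The file must exist and be a regular file
    [ -f "$key_file" ] || { echo "missing: $key_file" >&2; return 1; }

    # Permissions must be exactly 600 (GNU stat syntax, an assumption)
    mode=$(stat -c '%a' "$key_file")
    [ "$mode" = "600" ] || { echo "bad mode $mode: $key_file" >&2; return 1; }

    # An age identity file contains a line starting with AGE-SECRET-KEY-1
    grep -q '^AGE-SECRET-KEY-1' "$key_file" || {
        echo "no age secret key found in $key_file" >&2
        return 1
    }
    echo "ok: $key_file"
}
```

Example call: check_age_key_file ~/.secrets/.metadata/keys/master.age.key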

Step 4: Recover SSH Keys

# Restore SSH keys
cp /mnt/backup/ssh/id_* ~/.ssh/
chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_*

# Test SSH access
ssh nas-01 "hostname && echo 'SSH working'"
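
To test more than one host in a single pass, a loop along these lines may help. It is a sketch under assumptions: check_ssh_hosts is a hypothetical name, and BatchMode=yes makes ssh fail rather than prompt, so keys that need a passphrase or PIN will be reported as FAIL unless an agent already holds them.

```shell
# Hypothetical helper: report which hosts accept key-based SSH logins.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_ssh_hosts() {
    unreachable=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            unreachable=$((unreachable + 1))
        fi
    done
    # Succeed only if every host was reachable
    [ "$unreachable" -eq 0 ]
}
```

Example call: check_ssh_hosts nas-01 pfsense certmgr-01 ipsk-manager kvm-host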

Step 5: Clone Secrets Repository

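# git clone refuses a non-empty target, and ~/.secrets already holds the
# restored key, so move that directory aside first
mv ~/.secrets ~/.secrets.restored

# Clone encrypted secrets (use ONE of the two sources)
git clone git@github.com:yourusername/secrets.git ~/.secrets

# Or from NAS
# git clone ssh://nas-01/volume1/git/secrets.git ~/.secrets

# Put the key back (assumes the repository does not track the key itself)
mkdir -p ~/.secrets/.metadata/keys
mv ~/.secrets.restored/.metadata/keys/master.age.key ~/.secrets/.metadata/keys/
rmdir ~/.secrets.restored/.metadata/keys ~/.secrets.restored/.metadata ~/.secrets.restored

# Test decryption
age -d -i ~/.secrets/.metadata/keys/master.age.key \
    ~/.secrets/environments/domains/d000/dev/network.env.age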
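
Decrypting a single file shows the key works for that file only. To confirm it decrypts every .age file in the repository, a loop like the one below could be used. This is a sketch: verify_all_age is a hypothetical name, plaintext is discarded rather than displayed, and only failures are printed.

```shell
# Hypothetical helper: try to decrypt every *.age file under a directory
# and list the ones that fail. Plaintext is sent to /dev/null.
# Usage: verify_all_age KEY_FILE SECRETS_DIR
verify_all_age() {
    key=$1
    root=$2
    failed=0
    list=$(mktemp)

    # Collect the encrypted files first so the loop runs in this shell
    find "$root" -type f -name '*.age' > "$list"

    while IFS= read -r f; do
        if ! age -d -i "$key" "$f" > /dev/null 2>&1 < /dev/null; then
            echo "FAILED: $f"
            failed=$((failed + 1))
        fi
    done < "$list"

    rm -f "$list"
    # Succeed only if nothing failed to decrypt
    [ "$failed" -eq 0 ]
}
```

Example call: verify_all_age ~/.secrets/.metadata/keys/master.age.key ~/.secrets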

Step 6: Install dsec and Verify

# Link dsec to path (assumes ~/.local/bin is on your PATH)
mkdir -p ~/.local/bin
ln -sf ~/.secrets/bin/dsec ~/.local/bin/dsec

# Test
dsec show d000 dev/network

Step 7: Unmount Backup Drive

sudo umount /mnt/backup
sudo cryptsetup luksClose backup-usb

Scenario: Lost YubiKey

If Primary Key Lost

  1. Use backup YubiKey immediately

  2. Revoke lost key from all systems

  3. Generate new resident key on replacement YubiKey

# Generate new key on replacement YubiKey
ssh-keygen -t ed25519-sk -O resident -O verify-required \
    -f ~/.ssh/id_ed25519_sk_rk_d000_new \
    -C "evanusmodestus@d000-yubikey-replacement"

# Deploy to all hosts
for host in nas-01 pfsense certmgr-01 ipsk-manager kvm-host; do
    ssh-copy-id -i ~/.ssh/id_ed25519_sk_rk_d000_new.pub "$host"
done
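
Step 2 in the list above (revoking the lost key) is not shown as a command. One way to do it on each host is to strip the lost key's line from authorized_keys. This is a sketch under assumptions: revoke_pubkey is a hypothetical name, and it expects a copy of the lost key's .pub file to still be available.

```shell
# Hypothetical helper: remove one public key from an authorized_keys file.
# Usage: revoke_pubkey LOST_KEY.pub AUTHORIZED_KEYS_FILE
revoke_pubkey() {
    pub_file=$1
    auth_file=$2

    # Field 2 of a .pub file is the base64 key blob, which identifies the key
    blob=$(awk '{print $2; exit}' "$pub_file")
    [ -n "$blob" ] || { echo "no key found in $pub_file" >&2; return 1; }

    # Keep a copy so the change can be rolled back
    cp -p "$auth_file" "$auth_file.bak"

    # Drop every line containing the blob. grep exits 1 when no lines remain,
    # which is acceptable here, so its status is ignored.
    grep -vF -- "$blob" "$auth_file.bak" > "$auth_file" || true
}
```

Matching on the key blob rather than the comment avoids removing the wrong entry when several keys share a similar label.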

If Both Keys Lost

CRITICAL: This requires physical/console access to systems.

  1. Locate software fallback key on LUKS backup USB

  2. Use console access for systems without software key

  3. Reset credentials as needed

  4. Generate new YubiKey credentials

  5. Deploy to all systems


Scenario: Lost age Key (Worst Case)

If the age key is lost and no backup exists, ALL encrypted data is permanently unrecoverable.

If M-Disc Backup Exists

# Mount M-Disc read-only
sudo mkdir -p /mnt/cdrom
sudo mount -o ro /dev/sr0 /mnt/cdrom

# Recover age key
mkdir -p ~/.secrets/.metadata/keys
cp /mnt/cdrom/master.age.key ~/.secrets/.metadata/keys/
chmod 600 ~/.secrets/.metadata/keys/master.age.key

# Verify
age-keygen -y ~/.secrets/.metadata/keys/master.age.key

sudo umount /mnt/cdrom

If No Backup Exists

You must:

  1. Generate new age keypair

  2. Re-create ALL secrets from source services

  3. Re-encrypt everything

  4. Update all documentation

This is catastrophic. Always maintain multiple backups of the age key.
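
Multiple backups only help if they all hold the same key. A checksum comparison such as the following could confirm that. It is a sketch: compare_key_copies is a hypothetical name, and it assumes sha256sum from GNU coreutils.

```shell
# Hypothetical helper: confirm that every backup copy of a key is
# byte-identical to the first one given.
# Usage: compare_key_copies REFERENCE COPY [COPY ...]
compare_key_copies() {
    reference=$1
    [ -f "$reference" ] || { echo "missing: $reference" >&2; return 1; }
    ref_sum=$(sha256sum < "$reference" | cut -d' ' -f1)
    shift

    mismatches=0
    for copy in "$@"; do
        # A missing copy counts as a mismatch
        if [ -f "$copy" ] && [ "$(sha256sum < "$copy" | cut -d' ' -f1)" = "$ref_sum" ]; then
            echo "MATCH    $copy"
        else
            echo "MISMATCH $copy"
            mismatches=$((mismatches + 1))
        fi
    done
    [ "$mismatches" -eq 0 ]
}
```

Example call, with both backup media mounted: compare_key_copies ~/.secrets/.metadata/keys/master.age.key /mnt/backup/keys/master.age.key /mnt/cdrom/master.age.key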


Scenario: Corrupted LUKS Header

If your LUKS-encrypted volume won’t open:

# Restore header from backup
sudo cryptsetup luksHeaderRestore /dev/nvme0n1p2 \
    --header-backup-file /mnt/backup/luks/workstation-header.img

# Try opening again
sudo cryptsetup luksOpen /dev/nvme0n1p2 encrypted-root
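
luksHeaderRestore overwrites the header and keyslot area on the device, so it may be worth saving the damaged region first in case the restore needs to be undone. The following is a sketch, not part of the original procedure: save_header_region is a hypothetical name, and the 16 MiB default is an assumption that matches the default LUKS2 header area (raise it for volumes created with a larger header).

```shell
# Hypothetical helper: copy the first N MiB of a device (or image file) to a
# safe location before the header is overwritten.
# Usage: save_header_region DEVICE OUTPUT_FILE [SIZE_MIB]
save_header_region() {
    device=$1
    out_file=$2
    # Default of 16 MiB is an assumption (default LUKS2 header area)
    size_mib=${3:-16}
    dd if="$device" of="$out_file" bs=1M count="$size_mib" status=none
}
```

Run it as root against the real device before the restore, for example with /dev/nvme0n1p2 as DEVICE and an output file on the mounted backup drive.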

Infrastructure Recovery

home-dc01 (AD/CA) Recovery

Loss of the root CA compromises the entire PKI. An air-gapped backup is essential.

From Windows Backup

  1. Restore AD from Windows Server Backup

  2. Verify AD replication

  3. Verify Certificate Services

  4. Re-issue compromised certificates if needed

Full Rebuild (Worst Case)

  1. Rebuild AD from scratch

  2. Create new root CA

  3. Re-sign ALL intermediate CAs

  4. Re-issue ALL certificates

  5. Update all trust stores


ISE Recovery

# List available backups
netapi synology backup-list ise

# Download backup
netapi synology download /ise_backups/<backup-file> /tmp/

# Restore via ISE GUI
# Administration > System > Backup & Restore

KVM VMs Recovery

# List VM backups
netapi synology backup-list kvm

# Download XML from NAS
netapi synology download /kvm_backups/<vm-name>.xml /tmp/

# Define VM from XML
virsh define /tmp/<vm-name>.xml

# Start VM (assumes disk images intact)
virsh start <vm-name>
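
Because virsh start assumes the disk images are intact, it can help to confirm that every image referenced in the XML exists before starting the VM. This is a sketch under assumptions: check_vm_disks is a hypothetical name, and it expects one <source file=...> element per line, as libvirt normally writes them.

```shell
# Hypothetical helper: list disk image paths referenced by a libvirt domain
# XML and report any that are missing on this host.
# Usage: check_vm_disks DOMAIN_XML
check_vm_disks() {
    xml_file=$1
    missing=0
    paths=$(mktemp)

    # Extract the path from file='...' or file="..." in <source> elements
    sed -n "s/.*<source[^>]* file=['\"]\([^'\"]*\)['\"].*/\1/p" "$xml_file" > "$paths"

    while IFS= read -r image; do
        if [ -e "$image" ]; then
            echo "present $image"
        else
            echo "MISSING $image"
            missing=$((missing + 1))
        fi
    done < "$paths"

    rm -f "$paths"
    # Succeed only if every referenced image exists
    [ "$missing" -eq 0 ]
}
```

If any image is reported MISSING, restore it from backup before running virsh start.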

Keycloak Recovery

# List backups
netapi synology backup-list keycloak

# Download backup
netapi synology download /Backups/keycloak/<realm>.json /tmp/

# Import realm
netapi keycloak import-realm /tmp/<realm>.json

Network Config Recovery

pfSense

# Download backup from NAS
netapi synology download /firewall_backups/config-<date>.xml /tmp/

# Restore via pfSense GUI
# Diagnostics > Backup & Restore > Restore

WLC

# Download backup from NAS
netapi synology download /wlc_backups/config-<date>.tar /tmp/

# Restore via WLC GUI or CLI

IOS Switches

# Download config from NAS
netapi synology download /switch_backups/<switch>-config.txt /tmp/

# Apply via console (serve the downloaded file from a TFTP server the switch can reach)
copy tftp://server/config startup-config
reload

Post-Recovery Checklist

  • age key decrypts test file

  • SSH access to all critical hosts

  • dsec secrets load correctly

  • YubiKey(s) working

  • AD replication healthy (if applicable)

  • Certificate chain validates

  • ISE authentication working

  • Network access functional

  • All services accessible

  • Backup jobs re-enabled and running

  • Update recovery documentation with lessons learned
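
Several of the checklist items above can be run as commands. A small harness like this one prints a PASS or FAIL line per item so the results can be kept with the recovery notes. It is a sketch: run_check and the counters are hypothetical names, and the commented example calls reuse commands from earlier in this runbook.

```shell
# Hypothetical harness: run each checklist item as a command and print a
# PASS/FAIL line. Command output is discarded; only the exit status counts.
passed=0
failed=0

run_check() {
    description=$1
    shift
    if "$@" > /dev/null 2>&1; then
        echo "PASS  $description"
        passed=$((passed + 1))
    else
        echo "FAIL  $description"
        failed=$((failed + 1))
    fi
}

# Example calls (uncomment on a recovered workstation):
# run_check "age key present"   test -s ~/.secrets/.metadata/keys/master.age.key
# run_check "SSH to nas-01"     ssh -o BatchMode=yes nas-01 true
# run_check "dsec secrets load" dsec show d000 dev/network
# echo "passed=$passed failed=$failed"
```

Items that need human judgment, such as updating documentation, still have to be ticked off by hand.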


Annual Recovery Drill

Perform a full recovery drill annually to verify:

  1. LUKS USB drives are readable

  2. M-Disc is readable

  3. All keys decrypt correctly

  4. Full recovery procedure works

  5. Documentation is accurate
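
For the first drill item, readability can be checked file by file once the LUKS USB is mounted. This sketch looks for the files this runbook restores from the drive; check_backup_media is a hypothetical name, and the relative paths are taken from the commands used elsewhere in this runbook.

```shell
# Hypothetical helper: confirm a mounted backup drive still holds the files
# this runbook restores from it.
# Usage: check_backup_media MOUNT_POINT
check_backup_media() {
    mount_point=$1
    missing=0

    # Files that must exist and be non-empty
    for rel in keys/master.age.key luks/workstation-header.img; do
        if [ -s "$mount_point/$rel" ]; then
            echo "found   $rel"
        else
            echo "MISSING $rel"
            missing=$((missing + 1))
        fi
    done

    # At least one SSH private key (id_* without the .pub suffix) must exist
    if find "$mount_point/ssh" -maxdepth 1 -type f -name 'id_*' ! -name '*.pub' 2>/dev/null | grep -q .; then
        echo "found   ssh/id_*"
    else
        echo "MISSING ssh/id_*"
        missing=$((missing + 1))
    fi

    [ "$missing" -eq 0 ]
}
```

Example call: check_backup_media /mnt/backup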

Test Procedure

  1. Create a test VM or use spare hardware

  2. Perform a full recovery following this runbook

  3. Document any issues

  4. Update the runbook as needed