Vault HA Cluster Deployment

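Deploy a 3-node Vault Raft cluster with vault-01 on kvm-01 and vault-02/vault-03 on kvm-02, so the cluster spans both hypervisors. Raft needs 2 of the 3 voters, so the cluster survives the loss of kvm-01 (or of any single node), but not the loss of kvm-02.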

Prerequisites

Complete these before starting:

  • Phase 0: NAS NFS access for kvm-02

  • kvm-02 operational (22/22 health checks)

  • DNS records for vault-02 (10.50.1.61) and vault-03 (10.50.1.62)

  • Firewall rules: 8200/tcp (API), 8201/tcp (Raft cluster)

Create DNS Records (bind-01)

Validate existing DNS records first:

ssh bind-01 "sudo grep -n 'vault-0' /var/named/inside.domusdigitalis.dev.zone"

Add each record, update the SOA serial, and reload the zone in one remote command (a higher serial triggers AXFR to bind-02):

# vault-02 (skipped if the record already exists)
ssh bind-01 "sudo bash -c '
  ZONE=/var/named/inside.domusdigitalis.dev.zone
  grep -q \"^vault-02[[:space:]]\" \$ZONE && exit 0
  SERIAL=\$(date +%Y%m%d%H)
  sed -i \"s/[0-9]\\{10\\}.*Serial/\$SERIAL  ; Serial/\" \$ZONE
  echo \"vault-02      IN  A       10.50.1.61\" >> \$ZONE
  rndc reload inside.domusdigitalis.dev
'"
# vault-03 (skipped if the record already exists)
ssh bind-01 "sudo bash -c '
  ZONE=/var/named/inside.domusdigitalis.dev.zone
  grep -q \"^vault-03[[:space:]]\" \$ZONE && exit 0
  SERIAL=\$(date +%Y%m%d%H)
  sed -i \"s/[0-9]\\{10\\}.*Serial/\$SERIAL  ; Serial/\" \$ZONE
  echo \"vault-03      IN  A       10.50.1.62\" >> \$ZONE
  rndc reload inside.domusdigitalis.dev
'"

Verify resolution on both DNS servers:

dig +short vault-02.inside.domusdigitalis.dev @10.50.1.90
dig +short vault-02.inside.domusdigitalis.dev @10.50.1.91
dig +short vault-03.inside.domusdigitalis.dev @10.50.1.90
dig +short vault-03.inside.domusdigitalis.dev @10.50.1.91

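Each command sets the SOA serial to the current date +%Y%m%d%H stamp. When that value is higher than the previous serial, the reload sends NOTIFY and bind-02 pulls the zone via AXFR. The stamp changes only once per hour, so a second edit made within the same hour leaves the serial unchanged and bind-02 will not transfer the zone on its own.

If bind-02 is still not syncing, force a transfer: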

ssh bind-02 "sudo rndc retransfer inside.domusdigitalis.dev"
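
A serial derived from date +%Y%m%d%H repeats for any edits made within the same hour. The following is a minimal sketch of a helper that always moves the serial forward; next_serial is a hypothetical name, not part of the existing tooling, and it assumes the current serial has already been read from the zone file.

```shell
# next_serial CURRENT [STAMP] -> prints a serial strictly greater than CURRENT.
# Uses the date stamp (YYYYMMDDHH) when it is ahead of CURRENT, else CURRENT + 1.
# Hypothetical helper for illustration only.
next_serial() {
  current=$1
  stamp=${2:-$(date +%Y%m%d%H)}   # optional second argument pins the stamp
  if [ "$stamp" -gt "$current" ]; then
    echo "$stamp"
  else
    echo $((current + 1))
  fi
}
```

With this approach a second edit in the same hour yields the previous serial plus one, which is still enough for bind-02 to notice the change.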

Cluster Architecture

Node        IP            Hypervisor    Status
----        --            ----------    ------
vault-01    10.50.1.60    kvm-01        Active (Leader)
vault-02    10.50.1.61    kvm-02        Planned (Raft follower)
vault-03    10.50.1.62    kvm-02        Planned (Raft follower)

kvm-01                          kvm-02
┌─────────────────┐    ┌─────────────────────────────┐
│  vault-01       │    │  vault-02      vault-03    │
│  10.50.1.60     │◄──►│  10.50.1.61    10.50.1.62  │
│  LEADER         │    │  FOLLOWER      FOLLOWER    │
│  (file→raft)    │    │  (new deploy)  (new deploy)│
└─────────────────┘    └─────────────────────────────┘
         │                       │
         └───────────────────────┘
              Raft Consensus (port 8201)
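
Raft consensus needs a majority of voters to stay available. The arithmetic for this layout can be sketched as follows; quorum and survives_loss are hypothetical helper names used only for illustration.

```shell
# quorum N -> smallest majority of N voters
quorum() { echo $(( $1 / 2 + 1 )); }

# survives_loss N LOST -> succeeds if the remaining voters still form a majority
survives_loss() { [ $(( $1 - $2 )) -ge "$(quorum "$1")" ]; }

quorum 3                                              # prints 2
survives_loss 3 1 && echo "kvm-01 down: quorum kept"  # vault-02 + vault-03 remain
survives_loss 3 2 || echo "kvm-02 down: quorum lost"  # only vault-01 remains
```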

Phase 1: Migrate vault-01 from File to Raft Storage ✅ COMPLETE (2026-03-09)

vault-01 migrated from storage "file" to storage "raft" on 2026-03-09. Backup: /tmp/vault-file-backup-20260310.tar.gz on vault-01.

Phase 1 Progress Checklist
  • 1.1 Verify current storage backend (storage "file")

  • 1.1.1 Set VAULT_ADDR and verify vault status

  • 1.2 Stop Vault and create backup

  • 1.3 Create migration config (with cluster_addr!)

  • 1.4 Create raft directory and run migration

  • 1.5 Update vault.hcl for Raft

  • 1.6 Start Vault and unseal

  • 1.7 Verify migration (Storage Type: raft, HA Enabled: true)

  • 1.8 Cleanup migrate.hcl

1.1 Verify Current Storage Backend

ssh vault-01 "grep -A3 'storage' /etc/vault.d/vault.hcl"

If output shows storage "file", proceed with migration. If storage "raft", skip to Phase 2.

1.1.1 Set VAULT_ADDR on vault-01

The Vault CLI needs VAULT_ADDR. Without it the CLI falls back to https://127.0.0.1:8200 and commands fail, typically with "connection refused" or a TLS verification error.

ssh vault-01
export VAULT_ADDR="https://vault-01.inside.domusdigitalis.dev:8200"

Verify Vault is running:

vault status

Expected: Sealed: false, Storage Type: file, HA Enabled: false

1.2 Stop Vault and Backup

ssh vault-01 "sudo systemctl stop vault"
ssh vault-01 "sudo tar -czvf /tmp/vault-file-backup-$(date +%Y%m%d).tar.gz /opt/vault/data"
# Copy to NAS for belt-and-suspenders backup
ssh vault-01 "sudo cp /tmp/vault-file-backup-*.tar.gz /mnt/nas/backups/vault/"
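
Before continuing it is worth confirming that the archive is readable and that the NAS copy matches it. A minimal sketch, assuming sha256sum and tar are available on the host; verify_backup is a hypothetical helper name.

```shell
# verify_backup ORIGINAL COPY -> succeeds if ORIGINAL is a readable gzip tarball
# and COPY has the same SHA-256 checksum. Hypothetical helper for illustration.
verify_backup() {
  tar -tzf "$1" > /dev/null || return 1
  [ "$(sha256sum < "$1")" = "$(sha256sum < "$2")" ]
}

# Example (run on vault-01; substitute the real archive name):
#   verify_backup /tmp/vault-file-backup-YYYYMMDD.tar.gz \
#     /mnt/nas/backups/vault/vault-file-backup-YYYYMMDD.tar.gz && echo "backup OK"
```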

1.3 Create Migration Configuration

The migration config MUST include cluster_addr or migration will fail with: error mounting 'storage_destination': cluster_addr config not set

ssh vault-01 "sudo tee /etc/vault.d/migrate.hcl << 'EOF'
storage_source \"file\" {
  path = \"/opt/vault/data\"
}

storage_destination \"raft\" {
  path    = \"/opt/vault/raft\"
  node_id = \"vault-01\"
}

cluster_addr = \"https://vault-01.inside.domusdigitalis.dev:8201\"
EOF"
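
Since a missing cluster_addr only surfaces once the migration runs, a quick pre-flight check on the file can catch it earlier. This is a rough sketch that only looks for the three top-level keys and does not parse HCL; check_migrate_config is a hypothetical helper name.

```shell
# check_migrate_config FILE -> fails unless FILE mentions storage_source,
# storage_destination and cluster_addr at the start of a line.
# Hypothetical helper; a text match, not a real HCL parser.
check_migrate_config() {
  for key in storage_source storage_destination cluster_addr; do
    if ! grep -Eq "^[[:space:]]*$key([[:space:]]|=|\")" "$1"; then
      echo "missing $key in $1" >&2
      return 1
    fi
  done
}

# Example (run on vault-01):
#   check_migrate_config /etc/vault.d/migrate.hcl && echo "migrate.hcl looks complete"
```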

1.4 Prepare Raft Directory and Run Migration

ssh vault-01 "sudo mkdir -p /opt/vault/raft && sudo chown vault:vault /opt/vault/raft"
ssh vault-01 "sudo -u vault vault operator migrate -config=/etc/vault.d/migrate.hcl"

Expected output:

2026/03/01 14:30:00 [INFO] copied key: core/...
2026/03/01 14:30:00 [INFO] copied key: logical/...
...
Success! All data migrated.

1.5 Update vault.hcl for Raft

ssh vault-01 "sudo cp /etc/vault.d/vault.hcl /etc/vault.d/vault.hcl.file-backup"

vault-01 has ca-chain.crt (from original setup), vault-02/03 have ca.crt (from Phase 4). Both work - they’re the same DOMUS-ISSUING-CA.

ssh vault-01 "sudo tee /etc/vault.d/vault.hcl << 'EOF'
# Vault Configuration - Raft HA Cluster
# Migrated from file storage on $(date +%Y-%m-%d)

ui = true
disable_mlock = true

# Raft Integrated Storage (HA-ready)
storage \"raft\" {
  path    = \"/opt/vault/raft\"
  node_id = \"vault-01\"

  retry_join {
    leader_api_addr     = \"https://vault-02.inside.domusdigitalis.dev:8200\"
    leader_ca_cert_file = \"/opt/vault/tls/ca-chain.crt\"
  }
  retry_join {
    leader_api_addr     = \"https://vault-03.inside.domusdigitalis.dev:8200\"
    leader_ca_cert_file = \"/opt/vault/tls/ca-chain.crt\"
  }
}

# HTTPS listener
listener \"tcp\" {
  address       = \"0.0.0.0:8200\"
  tls_cert_file = \"/opt/vault/tls/vault.crt\"
  tls_key_file  = \"/opt/vault/tls/vault.key\"
}

# Cluster communication
cluster_addr = \"https://vault-01.inside.domusdigitalis.dev:8201\"
api_addr     = \"https://vault-01.inside.domusdigitalis.dev:8200\"
EOF"

1.6 Start Vault and Unseal

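ssh vault-01 "sudo systemctl start vault"
ssh vault-01 "VAULT_ADDR=https://127.0.0.1:8200 VAULT_SKIP_VERIFY=1 vault status"
# Unseal (retrieve keys from dsec: dsec show d000/dev/vault); -t allocates the TTY the prompt needs
ssh -t vault-01 "VAULT_ADDR=https://127.0.0.1:8200 VAULT_SKIP_VERIFY=1 vault operator unseal"  # Enter key 1
ssh -t vault-01 "VAULT_ADDR=https://127.0.0.1:8200 VAULT_SKIP_VERIFY=1 vault operator unseal"  # Enter key 2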

1.7 Verify Migration

1.7.1 Check Raft Peers

vault operator raft list-peers

Expected output:

Node        Address                                    State     Voter
----        -------                                    -----     -----
vault-01    vault-01.inside.domusdigitalis.dev:8201    leader    true

1.7.2 Check Vault Status

vault status

Expected output:

Storage Type            raft
HA Enabled              true
HA Cluster              https://vault-01.inside.domusdigitalis.dev:8201
HA Mode                 active

1.7.3 Verify Data Integrity

vault list pki_int/certs | head -5
vault read ssh/config/ca

Expected: PKI certificates listed, SSH CA public key visible

1.8 Cleanup

ssh vault-01 "sudo rm /etc/vault.d/migrate.hcl"

Phase 2: Deploy vault-02 on kvm-02

2.1 Create VM Disk

Copy the base image and resize it. Do NOT use qemu-img create, which produces an empty disk. If the base image is not on kvm-02 yet, download it first (step 2.4).

ssh kvm-02 "sudo cp /var/lib/libvirt/images/Rocky-9-GenericCloud.qcow2 /var/lib/libvirt/images/vault-02.qcow2"
ssh kvm-02 "sudo qemu-img resize /var/lib/libvirt/images/vault-02.qcow2 20G"
# PRE validation
ssh kvm-02 "qemu-img info /var/lib/libvirt/images/vault-02.qcow2 | grep -E 'virtual|actual'"

Expected: virtual size: 20 GiB

2.2 Create Cloud-Init Configuration

ssh kvm-02 "cat > /tmp/vault-02-cloud-init.yml << 'EOF'
#cloud-config
hostname: vault-02
fqdn: vault-02.inside.domusdigitalis.dev
manage_etc_hosts: true

users:
  - name: evanusmodestus
    groups: wheel
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    lock_passwd: false
    plain_text_passwd: changeme123
    ssh_authorized_keys:
      # Vault SSH CA signed key (8h TTL)
      - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIrgE9z8gkQVRVkkdbc1ejdth7vJkqpY35FrIUv8L6JB vault-signed
      # YubiKey nano
      - sk-ssh-ed25519@openssh.com AAAAGnNrLXNzaC1lZDI1NTE5QG9wZW5zc2guY29tAAAAIG/EGu00HuV3jnisul7DUBuk9jLtrE3yR4BZCwGb2YpCAAAABHNzaDo= d000-nano-35641207
      # YubiKey primary
      - sk-ssh-ed25519@openssh.com AAAAGnNrLXNzaC1lZDI1NTE5QG9wZW5zc2guY29tAAAAIFHfsGSAFAkqwYj6EGS9sA2MROjs28zM6LJds3gagsCkAAAACHNzaDpkMDAw evanusmodestus@d000-yubikey
      # YubiKey secondary
      - sk-ssh-ed25519@openssh.com AAAAGnNrLXNzaC1lZDI1NTE5QG9wZW5zc2guY29tAAAAIEBZ+kus4aTHzQt1zNnEnGxJs+Lf56vrCdcyvqLhpp9hAAAACHNzaDpkMDAw ssh:d000
      # Fallback ed25519
      - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIL3vaIABqHOwy88p/5GcX3ZNU044GAz/3T5dH8GIU7DS evanusmodestus@d000

write_files:
  - path: /etc/sysctl.d/99-vault.conf
    content: |
      vm.swappiness = 1

runcmd:
  # Configure static IP
  - nmcli connection delete 'Wired connection 1' 2>/dev/null || true
  - nmcli con add type ethernet ifname eth0 con-name mgmt ipv4.addresses 10.50.1.61/24 ipv4.gateway 10.50.1.1 ipv4.dns 10.50.1.90 ipv4.method manual
  - nmcli con up mgmt
  # Install Vault
  - dnf install -y dnf-plugins-core
  - dnf config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
  - dnf install -y vault firewalld
  - mkdir -p /opt/vault/{raft,tls}
  - chown -R vault:vault /opt/vault
  - sysctl -p /etc/sysctl.d/99-vault.conf
  # Firewall
  - systemctl enable --now firewalld
  - firewall-cmd --permanent --add-port=8200/tcp
  - firewall-cmd --permanent --add-port=8201/tcp
  - firewall-cmd --reload
EOF"

2.3 Create Cloud-Init ISO

Create meta-data file:

ssh kvm-02 "cat > /tmp/vault-02-meta-data << 'EOF'
instance-id: vault-02
local-hostname: vault-02
EOF"

Cloud-init NoCloud datasource requires files named EXACTLY user-data and meta-data.

Copy the files to the names cloud-init expects:

ssh kvm-02 "cp /tmp/vault-02-cloud-init.yml /tmp/user-data && cp /tmp/vault-02-meta-data /tmp/meta-data"

Create ISO:

ssh kvm-02 "sudo genisoimage -output /var/lib/libvirt/images/vault-02-cidata.iso \
  -volid cidata -joliet -rock \
  /tmp/user-data /tmp/meta-data"
# POST validation
ssh kvm-02 "isoinfo -i /var/lib/libvirt/images/vault-02-cidata.iso -l | grep -E 'USER|META'"

Expected: USER_DATA and META_DATA visible

2.4 Download Rocky Linux Cloud Image (if not cached)

ssh kvm-02 "ls /var/lib/libvirt/images/Rocky-9-GenericCloud*.qcow2 2>/dev/null || \
  sudo curl -Lo /var/lib/libvirt/images/Rocky-9-GenericCloud.qcow2 \
  https://download.rockylinux.org/pub/rocky/9/images/x86_64/Rocky-9-GenericCloud.latest.x86_64.qcow2"

2.5 Create VM with virt-install

Rocky cloud images do NOT have serial console enabled. Use VNC graphics.

ssh kvm-02 "sudo virt-install \
  --name vault-02 \
  --memory 2048 \
  --vcpus 2 \
  --disk path=/var/lib/libvirt/images/vault-02.qcow2 \
  --disk path=/var/lib/libvirt/images/vault-02-cidata.iso,device=cdrom \
  --os-variant rocky9 \
  --network bridge=br-mgmt \
  --graphics vnc,listen=0.0.0.0 \
  --import \
  --noautoconsole"
# POST validation
ssh kvm-02 "sudo virsh list | grep vault-02"

Expected: vault-02 running

2.5.1 Add VLANs to Bridge Interface

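kvm-02 uses bridge VLAN filtering, and a new VM's vnet interface defaults to VLAN 1 only. Add the management VLANs so the interface matches the other working VMs: VLAN 100 as the untagged PVID, the remaining VLANs tagged, and VLAN 1 removed.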

ssh kvm-02 "VNET=\$(sudo virsh domiflist vault-02 | awk '/br-mgmt/{print \$1}') && \
for vid in 10 20 30 40 110 120; do sudo bridge vlan add vid \$vid dev \$VNET; done && \
sudo bridge vlan add vid 100 dev \$VNET pvid untagged && \
sudo bridge vlan del vid 1 dev \$VNET"
# POST validation
ssh kvm-02 "VNET=\$(sudo virsh domiflist vault-02 | awk '/br-mgmt/{print \$1}') && bridge vlan show dev \$VNET"

Expected: VLANs 10, 20, 30, 40, 100 (PVID), 110, 120 - NO VLAN 1

2.6 Wait for Cloud-Init and Verify

Cloud-init configures static IP, installs Vault, and enables firewall automatically.

Wait ~2-3 minutes for cloud-init to complete, then verify:

# Wait for VM to get static IP
sleep 120
ping -c2 10.50.1.61
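
A fixed sleep either wastes time or ends too early. One alternative is to poll until the check passes or a timeout expires. A minimal sketch follows; wait_for is a hypothetical helper name, and the timeout and interval values in the example are assumptions to adjust.

```shell
# wait_for TIMEOUT INTERVAL CMD... -> retries CMD every INTERVAL seconds until it
# succeeds, or returns 1 once TIMEOUT seconds of waiting have accumulated.
# Hypothetical helper for illustration only.
wait_for() {
  timeout=$1 interval=$2
  shift 2
  elapsed=0
  until "$@"; do
    elapsed=$((elapsed + interval))
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep "$interval"
  done
}

# Example: wait up to 5 minutes for the static IP to answer
#   wait_for 300 5 ping -c1 10.50.1.61 && echo "vault-02 is reachable"
```

A successful ping only shows that the static IP is up. Once SSH works, cloud-init status --wait on the VM blocks until cloud-init itself has finished.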

2.7 Verify SSH Access

# PRE validation (from workstation)
ping -c2 10.50.1.61
ssh vault-02 "hostname && ip -4 -o addr show eth0 | awk '{print \$4}'"

Expected: vault-02 and 10.50.1.61/24


Phase 3: Deploy vault-03 on kvm-02

3.1 Create VM Disk

ssh kvm-02 "sudo cp /var/lib/libvirt/images/Rocky-9-GenericCloud.qcow2 /var/lib/libvirt/images/vault-03.qcow2"
ssh kvm-02 "sudo qemu-img resize /var/lib/libvirt/images/vault-03.qcow2 20G"

3.2 Create Cloud-Init Configuration

ssh kvm-02 "cat > /tmp/vault-03-cloud-init.yml << 'EOF'
#cloud-config
hostname: vault-03
fqdn: vault-03.inside.domusdigitalis.dev
manage_etc_hosts: true

users:
  - name: evanusmodestus
    groups: wheel
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    lock_passwd: false
    plain_text_passwd: changeme123
    ssh_authorized_keys:
      # Vault SSH CA signed key (8h TTL)
      - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIrgE9z8gkQVRVkkdbc1ejdth7vJkqpY35FrIUv8L6JB vault-signed
      # YubiKey nano
      - sk-ssh-ed25519@openssh.com AAAAGnNrLXNzaC1lZDI1NTE5QG9wZW5zc2guY29tAAAAIG/EGu00HuV3jnisul7DUBuk9jLtrE3yR4BZCwGb2YpCAAAABHNzaDo= d000-nano-35641207
      # YubiKey primary
      - sk-ssh-ed25519@openssh.com AAAAGnNrLXNzaC1lZDI1NTE5QG9wZW5zc2guY29tAAAAIFHfsGSAFAkqwYj6EGS9sA2MROjs28zM6LJds3gagsCkAAAACHNzaDpkMDAw evanusmodestus@d000-yubikey
      # YubiKey secondary
      - sk-ssh-ed25519@openssh.com AAAAGnNrLXNzaC1lZDI1NTE5QG9wZW5zc2guY29tAAAAIEBZ+kus4aTHzQt1zNnEnGxJs+Lf56vrCdcyvqLhpp9hAAAACHNzaDpkMDAw ssh:d000
      # Fallback ed25519
      - ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIL3vaIABqHOwy88p/5GcX3ZNU044GAz/3T5dH8GIU7DS evanusmodestus@d000

write_files:
  - path: /etc/sysctl.d/99-vault.conf
    content: |
      vm.swappiness = 1

runcmd:
  # Configure static IP
  - nmcli connection delete 'Wired connection 1' 2>/dev/null || true
  - nmcli con add type ethernet ifname eth0 con-name mgmt ipv4.addresses 10.50.1.62/24 ipv4.gateway 10.50.1.1 ipv4.dns 10.50.1.90 ipv4.method manual
  - nmcli con up mgmt
  # Install Vault
  - dnf install -y dnf-plugins-core
  - dnf config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
  - dnf install -y vault firewalld
  - mkdir -p /opt/vault/{raft,tls}
  - chown -R vault:vault /opt/vault
  - sysctl -p /etc/sysctl.d/99-vault.conf
  # Firewall
  - systemctl enable --now firewalld
  - firewall-cmd --permanent --add-port=8200/tcp
  - firewall-cmd --permanent --add-port=8201/tcp
  - firewall-cmd --reload
EOF"

3.3 Create Cloud-Init ISO

ssh kvm-02 "cat > /tmp/vault-03-meta-data << 'EOF'
instance-id: vault-03
local-hostname: vault-03
EOF"
ssh kvm-02 "cp /tmp/vault-03-cloud-init.yml /tmp/user-data && cp /tmp/vault-03-meta-data /tmp/meta-data"
ssh kvm-02 "sudo genisoimage -output /var/lib/libvirt/images/vault-03-cidata.iso \
  -volid cidata -joliet -rock \
  /tmp/user-data /tmp/meta-data"

3.4 Create VM

ssh kvm-02 "sudo virt-install \
  --name vault-03 \
  --memory 2048 \
  --vcpus 2 \
  --disk path=/var/lib/libvirt/images/vault-03.qcow2 \
  --disk path=/var/lib/libvirt/images/vault-03-cidata.iso,device=cdrom \
  --os-variant rocky9 \
  --network bridge=br-mgmt \
  --graphics vnc,listen=0.0.0.0 \
  --import \
  --noautoconsole"

3.4.1 Add VLANs to Bridge Interface

ssh kvm-02 "VNET=\$(sudo virsh domiflist vault-03 | awk '/br-mgmt/{print \$1}') && \
for vid in 10 20 30 40 110 120; do sudo bridge vlan add vid \$vid dev \$VNET; done && \
sudo bridge vlan add vid 100 dev \$VNET pvid untagged && \
sudo bridge vlan del vid 1 dev \$VNET"
# POST validation
ssh kvm-02 "VNET=\$(sudo virsh domiflist vault-03 | awk '/br-mgmt/{print \$1}') && bridge vlan show dev \$VNET"

Expected: VLANs 10, 20, 30, 40, 100 (PVID), 110, 120 - NO VLAN 1

3.5 Wait for Cloud-Init and Verify

Cloud-init configures static IP, installs Vault, and enables firewall automatically.

Wait ~2-3 minutes for cloud-init to complete, then verify:

sleep 120
ping -c2 10.50.1.62

3.6 Verify SSH Access

ssh vault-03 "hostname && ip -4 -o addr show eth0 | awk '{print \$4}'"

Expected: vault-03 and 10.50.1.62/24


Phase 4: Issue TLS Certificates

tee pattern: Using | tee file shows output AND saves it - no silent failures.
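
One caveat: in a plain pipeline the exit status is that of the last command, so cmd | tee file reports tee's status even when cmd failed. The sketch below keeps both the saved output and the real exit status; run_and_save is a hypothetical helper name. It captures stdout only and shows the output after the command has finished.

```shell
# run_and_save FILE CMD... -> prints CMD's stdout, saves it to FILE,
# and returns CMD's own exit status. Hypothetical helper for illustration.
run_and_save() {
  file=$1
  shift
  output=$("$@")
  status=$?
  printf '%s\n' "$output" | tee "$file"
  return "$status"
}

# Example:
#   run_and_save /tmp/role.txt vault read pki_int/roles/domus-server || echo "read failed"
```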

4.0 Configure PKI Role for Vault HA Certificates

Enterprise-grade certificates require SANs for all access patterns:

  • FQDN (CN): vault-02.inside.domusdigitalis.dev - DNS resolution

  • Short hostname (alt_names): vault-02 - internal cluster communication

  • IP address (ip_sans): 10.50.1.61 - direct IP access

4.0.1 Check Current Role Configuration

vault read pki_int/roles/domus-server | grep -E "allow_bare|allowed_domains|allow_sub"

4.0.2 Update Role with Vault Hostnames

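The allowed_domains list must include short hostnames explicitly. Setting allow_bare_domains=true alone is NOT sufficient - it only allows issuing for the domains themselves (e.g., inside.domusdigitalis.dev), not arbitrary short names. Note that vault write replaces the whole role definition, so repeat any non-default settings the role already has (for example max_ttl) in the same command.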

vault write pki_int/roles/domus-server \
  allowed_domains="inside.domusdigitalis.dev,domusdigitalis.dev,vault-01,vault-02,vault-03" \
  allow_bare_domains=true \
  allow_subdomains=true

4.0.3 Verify Role Configuration

vault read pki_int/roles/domus-server | grep -E "allow_bare|allowed_domains"

Expected output:

allow_bare_domains    true
allowed_domains       [inside.domusdigitalis.dev domusdigitalis.dev vault-01 vault-02 vault-03]

Why add hostnames to allowed_domains?

  • allow_subdomains=true → allows *.inside.domusdigitalis.dev

  • allow_bare_domains=true → allows inside.domusdigitalis.dev itself

  • Neither allows vault-02 as a SAN - it must be in allowed_domains

Without this, cert issuance fails: subject alternate name vault-02 not allowed by this role
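
The three rules above can be modelled in a few lines of shell. This is a simplified sketch of the behaviour described in this section, not Vault's actual implementation; name_allowed is a hypothetical helper name.

```shell
# name_allowed NAME DOMAINS BARE SUBS
#   DOMAINS: comma-separated allowed_domains, BARE/SUBS: "true" or "false"
# Succeeds if NAME would be accepted under the simplified rules:
#   - NAME equals an allowed domain and BARE is true
#   - NAME is a subdomain of an allowed domain and SUBS is true
name_allowed() {
  name=$1 bare=$3 subs=$4
  old_ifs=$IFS; IFS=,
  set -- $2            # split the domain list on commas
  IFS=$old_ifs
  for domain in "$@"; do
    [ "$bare" = true ] && [ "$name" = "$domain" ] && return 0
    case $name in
      *."$domain") [ "$subs" = true ] && return 0 ;;
    esac
  done
  return 1
}

D="inside.domusdigitalis.dev,domusdigitalis.dev"
name_allowed vault-02 "$D" true true || echo "vault-02 rejected"
name_allowed vault-02 "$D,vault-01,vault-02,vault-03" true true && echo "vault-02 accepted"
```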

4.1 Issue Certs from Vault PKI (vault-02)

Issue certificate with enterprise-grade SANs:

  • CN (FQDN): DNS resolution, primary access

  • alt_names (short): Local hostname resolution via search domain

  • ip_sans: Direct IP access, cluster communication

vault write -format=json pki_int/issue/domus-server \
  common_name="vault-02.inside.domusdigitalis.dev" \
  alt_names="vault-02" \
  ip_sans="10.50.1.61" \
  ttl="8760h" | tee /tmp/vault-02-cert.json

Extract cert, key, CA (tee shows each file content):

jq -r '.data.certificate' /tmp/vault-02-cert.json | tee /tmp/vault-02.crt
jq -r '.data.private_key' /tmp/vault-02-cert.json | tee /tmp/vault-02.key
jq -r '.data.issuing_ca' /tmp/vault-02-cert.json | tee /tmp/vault-ca.crt

4.2 Issue Certs from Vault PKI (vault-03)

vault write -format=json pki_int/issue/domus-server \
  common_name="vault-03.inside.domusdigitalis.dev" \
  alt_names="vault-03" \
  ip_sans="10.50.1.62" \
  ttl="8760h" | tee /tmp/vault-03-cert.json
jq -r '.data.certificate' /tmp/vault-03-cert.json | tee /tmp/vault-03.crt
jq -r '.data.private_key' /tmp/vault-03-cert.json | tee /tmp/vault-03.key

4.3 Verify Certs Issued

openssl x509 -in /tmp/vault-02.crt -noout -subject -dates
openssl x509 -in /tmp/vault-03.crt -noout -subject -dates

Expected: Subject shows correct hostname, expiry ~1 year out
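
The subject and dates do not show the SANs, which are what Phase 4.0 set up. A minimal sketch for checking a single SAN entry in the text output of openssl x509 -noout -text; has_san is a hypothetical helper name, and it assumes the SAN entries sit on the line after the "Subject Alternative Name" header, as openssl normally prints them.

```shell
# has_san ENTRY -> reads `openssl x509 -noout -text` output on stdin and succeeds
# if the Subject Alternative Name line contains exactly ENTRY
# (for example "DNS:vault-02" or "IP Address:10.50.1.61"). Hypothetical helper.
has_san() {
  awk -v want="$1" '
    /Subject Alternative Name/ {
      getline                      # the SAN entries are on the following line
      sub(/^[ \t]+/, "")
      n = split($0, sans, /, */)
      for (i = 1; i <= n; i++) if (sans[i] == want) found = 1
    }
    END { exit !found }
  '
}

# Example:
#   openssl x509 -in /tmp/vault-02.crt -noout -text | has_san "DNS:vault-02" && echo "short name present"
```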

4.4 Deploy Certs to vault-02

4.4.1 Verify Vault is Installed

ssh vault-02 "which vault && id vault"

If Vault not installed:

ssh vault-02 "sudo dnf install -y dnf-plugins-core && \
  sudo dnf config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo && \
  sudo dnf install -y vault && \
  sudo mkdir -p /opt/vault/{raft,tls} && \
  sudo chown -R vault:vault /opt/vault"

4.4.2 Copy and Deploy Certs

scp /tmp/vault-02.crt /tmp/vault-02.key /tmp/vault-ca.crt vault-02:/tmp/
ssh vault-02 "sudo mv /tmp/vault-02.crt /opt/vault/tls/vault.crt && \
  sudo mv /tmp/vault-02.key /opt/vault/tls/vault.key && \
  sudo mv /tmp/vault-ca.crt /opt/vault/tls/ca.crt"

Do NOT use a glob with chown over SSH - sudo chown vault:vault /opt/vault/tls/* is unreliable because the glob is expanded by your unprivileged shell, not by sudo, and can expand BEFORE the files exist. Use explicit paths:

ssh vault-02 "sudo chown vault:vault /opt/vault/tls/vault.crt /opt/vault/tls/vault.key /opt/vault/tls/ca.crt && \
  sudo chmod 600 /opt/vault/tls/vault.key && \
  sudo ls -la /opt/vault/tls/"
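
The glob behaviour can be reproduced locally. When a pattern matches nothing, the shell passes it through as a literal string, and that literal is what chown would receive. A small sketch; glob_args is a hypothetical helper name.

```shell
# glob_args DIR -> prints, one per line, the arguments a command would receive
# for DIR/*. Hypothetical helper for illustration only.
glob_args() {
  for arg in "$1"/*; do
    printf '%s\n' "$arg"
  done
}

dir=$(mktemp -d)
glob_args "$dir"            # empty directory: prints the literal pattern ending in /*
touch "$dir/vault.crt" "$dir/vault.key"
glob_args "$dir"            # now prints the two real file paths
rm -rf "$dir"
```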

4.5 Deploy Certs to vault-03

4.5.1 Verify Vault is Installed

ssh vault-03 "which vault && id vault"

If Vault not installed, run the same dnf commands as vault-02.

4.5.2 Copy and Deploy Certs

scp /tmp/vault-03.crt /tmp/vault-03.key /tmp/vault-ca.crt vault-03:/tmp/
ssh vault-03 "sudo mv /tmp/vault-03.crt /opt/vault/tls/vault.crt && \
  sudo mv /tmp/vault-03.key /opt/vault/tls/vault.key && \
  sudo mv /tmp/vault-ca.crt /opt/vault/tls/ca.crt && \
  sudo chown vault:vault /opt/vault/tls/vault.crt /opt/vault/tls/vault.key /opt/vault/tls/ca.crt && \
  sudo chmod 600 /opt/vault/tls/vault.key && \
  sudo ls -la /opt/vault/tls/"

Expected: 3 files with vault:vault ownership, key with 600 permissions


Phase 5: Configure and Join Cluster

leader_ca_cert_file is REQUIRED - Without it, nodes cannot verify each other’s TLS certificates and you get failed to get raft challenge errors.

5.1 Configure vault-02

ssh vault-02 "sudo tee /etc/vault.d/vault.hcl << 'EOF'
ui = true
disable_mlock = true

storage \"raft\" {
  path    = \"/opt/vault/raft\"
  node_id = \"vault-02\"

  retry_join {
    leader_api_addr     = \"https://vault-01.inside.domusdigitalis.dev:8200\"
    leader_ca_cert_file = \"/opt/vault/tls/ca.crt\"
  }
  retry_join {
    leader_api_addr     = \"https://vault-03.inside.domusdigitalis.dev:8200\"
    leader_ca_cert_file = \"/opt/vault/tls/ca.crt\"
  }
}

listener \"tcp\" {
  address       = \"0.0.0.0:8200\"
  tls_cert_file = \"/opt/vault/tls/vault.crt\"
  tls_key_file  = \"/opt/vault/tls/vault.key\"
}

cluster_addr = \"https://vault-02.inside.domusdigitalis.dev:8201\"
api_addr     = \"https://vault-02.inside.domusdigitalis.dev:8200\"
EOF"

5.2 Configure vault-03

ssh vault-03 "sudo tee /etc/vault.d/vault.hcl << 'EOF'
ui = true
disable_mlock = true

storage \"raft\" {
  path    = \"/opt/vault/raft\"
  node_id = \"vault-03\"

  retry_join {
    leader_api_addr     = \"https://vault-01.inside.domusdigitalis.dev:8200\"
    leader_ca_cert_file = \"/opt/vault/tls/ca.crt\"
  }
  retry_join {
    leader_api_addr     = \"https://vault-02.inside.domusdigitalis.dev:8200\"
    leader_ca_cert_file = \"/opt/vault/tls/ca.crt\"
  }
}

listener \"tcp\" {
  address       = \"0.0.0.0:8200\"
  tls_cert_file = \"/opt/vault/tls/vault.crt\"
  tls_key_file  = \"/opt/vault/tls/vault.key\"
}

cluster_addr = \"https://vault-03.inside.domusdigitalis.dev:8201\"
api_addr     = \"https://vault-03.inside.domusdigitalis.dev:8200\"
EOF"

5.3 Start and Join vault-02

ssh vault-02 "sudo systemctl enable --now vault"
# PRE validation (check service started)
ssh vault-02 "systemctl is-active vault"

Expected: active

ssh vault-02 "VAULT_ADDR=https://127.0.0.1:8200 VAULT_SKIP_VERIFY=1 vault operator raft join https://vault-01.inside.domusdigitalis.dev:8200"

Unseal keys are the SAME as vault-01. Retrieve from dsec show d000/dev/vault.

Use -t for TTY - Without TTY allocation, the unseal prompt fails with "file descriptor 0 is not a terminal".

ssh -t vault-02 "VAULT_ADDR=https://127.0.0.1:8200 VAULT_SKIP_VERIFY=1 vault operator unseal"

Run unseal twice (threshold=2).

# POST validation
ssh vault-02 "VAULT_ADDR=https://127.0.0.1:8200 VAULT_SKIP_VERIFY=1 vault status | grep -E 'Sealed|HA'"

Expected: Sealed: false, HA Mode: standby

5.4 Start and Join vault-03

ssh vault-03 "sudo systemctl enable --now vault"
# PRE validation
ssh vault-03 "systemctl is-active vault"
ssh vault-03 "VAULT_ADDR=https://127.0.0.1:8200 VAULT_SKIP_VERIFY=1 vault operator raft join https://vault-01.inside.domusdigitalis.dev:8200"
ssh -t vault-03 "VAULT_ADDR=https://127.0.0.1:8200 VAULT_SKIP_VERIFY=1 vault operator unseal"

Run unseal twice (threshold=2).

# POST validation
ssh vault-03 "VAULT_ADDR=https://127.0.0.1:8200 VAULT_SKIP_VERIFY=1 vault status | grep -E 'Sealed|HA'"

Expected: Sealed: false, HA Mode: standby


Phase 6: Verify HA Cluster

6.1 Check Cluster Status

export VAULT_ADDR="https://vault-01.inside.domusdigitalis.dev:8200"
vault operator raft list-peers

Expected output:

Node       Address                                                   State       Voter
----       -------                                                   -----       -----
vault-01   vault-01.inside.domusdigitalis.dev:8201                   leader      true
vault-02   vault-02.inside.domusdigitalis.dev:8201                   follower    true
vault-03   vault-03.inside.domusdigitalis.dev:8201                   follower    true
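
The same check can be scripted instead of read by eye. A minimal sketch that assumes the table layout shown above (two header lines, then Node, Address, State, Voter columns); check_peers is a hypothetical helper name.

```shell
# check_peers EXPECTED -> reads `vault operator raft list-peers` text on stdin and
# succeeds only if there is exactly one leader and EXPECTED voters in total.
# Hypothetical helper; relies on the column layout shown in this section.
check_peers() {
  awk -v want="$1" '
    NR > 2 && $3 == "leader" { leaders++ }
    NR > 2 && $4 == "true"   { voters++ }
    END { exit !(leaders == 1 && voters == want) }
  '
}

# Example:
#   vault operator raft list-peers | check_peers 3 && echo "cluster healthy"
```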

6.2 Test Failover

# Stop the leader
ssh vault-01 "sudo systemctl stop vault"

# Wait 10 seconds for election
sleep 10

# Check new leader (connect to vault-02 or vault-03)
export VAULT_ADDR="https://vault-02.inside.domusdigitalis.dev:8200"
vault operator raft list-peers
# Restart vault-01
ssh vault-01 "sudo systemctl start vault"

# Unseal (requires TTY for interactive prompt; threshold=2, so two keys)
ssh -t vault-01 "VAULT_ADDR=https://127.0.0.1:8200 VAULT_SKIP_VERIFY=1 vault operator unseal"  # Key 1
ssh -t vault-01 "VAULT_ADDR=https://127.0.0.1:8200 VAULT_SKIP_VERIFY=1 vault operator unseal"  # Key 2

# Verify it rejoined as follower
vault operator raft list-peers

Validation Checklist

echo "=== Vault HA Cluster Validation ===" && \
echo -e "\n--- Cluster Peers ---" && \
vault operator raft list-peers && \
echo -e "\n--- vault-01 Status ---" && \
ssh vault-01 "VAULT_ADDR=https://127.0.0.1:8200 VAULT_SKIP_VERIFY=1 vault status | grep -E 'Sealed|HA'" && \
echo -e "\n--- vault-02 Status ---" && \
ssh vault-02 "VAULT_ADDR=https://127.0.0.1:8200 VAULT_SKIP_VERIFY=1 vault status | grep -E 'Sealed|HA'" && \
echo -e "\n--- vault-03 Status ---" && \
ssh vault-03 "VAULT_ADDR=https://127.0.0.1:8200 VAULT_SKIP_VERIFY=1 vault status | grep -E 'Sealed|HA'" && \
echo -e "\n--- PKI Test ---" && \
vault list pki_int/certs | head -5

Rollback Plan

If Migration Fails (Phase 1)

# Restore file storage backup
ssh vault-01 "sudo systemctl stop vault"
ssh vault-01 "sudo rm -rf /opt/vault/raft"
ssh vault-01 "sudo tar -xzvf /tmp/vault-file-backup-*.tar.gz -C /"
ssh vault-01 "sudo cp /etc/vault.d/vault.hcl.file-backup /etc/vault.d/vault.hcl"
ssh vault-01 "sudo systemctl start vault"

If Cluster Join Fails (Phase 5)

# vault-01 continues as single node (raft, no peers)
# Delete failed VMs
ssh kvm-02 "sudo virsh destroy vault-02; sudo virsh undefine vault-02 --remove-all-storage"
ssh kvm-02 "sudo virsh destroy vault-03; sudo virsh undefine vault-03 --remove-all-storage"