Vault Enterprise Hardening Roadmap

🎯 PRIORITY: P1 (HIGH)

Current State: Single-node Vault on vault-01 with enterprise PKI (DOMUS-ROOT-CA + DOMUS-ISSUING-CA)

Problem: Single point of failure. No audit logging. Using root token.

Target: 3-node HA cluster with Raft, audit logging, least-privilege policies, multiple auth methods

Complete Infrastructure Overview

Vault is the secrets backbone for this entire ecosystem. Every service shown here will integrate with Vault for certificates, secrets, and dynamic credentials.

Domus Digitalis Complete Infrastructure - Hub and Spoke Architecture

Architecture Vision

Vault Enterprise Architecture - Current vs Target State

Current State Assessment

✅  Two-Tier PKI          DOMUS-ROOT-CA (offline) + DOMUS-ISSUING-CA (online)
✅  Certificate Issuance  EAP-TLS certs for 802.1X working
❌  High Availability     Single node = single point of failure
✅  Audit Logging         /var/log/vault/audit.log enabled (2026-02-21)
✅  Policies              pki-issuer, kv-reader, admin, ssh-client
✅  Auth Methods          AppRole (netapi, ssh-user); LDAP/OIDC planned

Target Topology

Vault HA Cluster Topology with Raft
Host          Current VMs                Target VMs
Supermicro A  vault-01, bind-01, k3s-01  vault-01 (leader), bind-01, k3s-01
Supermicro B  (planned)                  vault-02, vault-03, bind-02, k3s-02

Raft Quorum: 3 nodes = survives 1 node failure
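
The arithmetic behind this line can be sketched in shell. Raft needs a majority of voters, floor(n/2) + 1, so the number of failures a cluster absorbs is n minus that majority. The function names below are illustrative and not part of Vault.

```shell
# Majority of voters an N-node Raft cluster needs to elect a leader.
raft_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of node failures an N-node cluster can absorb while keeping quorum.
raft_tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 5; do
  echo "nodes=$n quorum=$(raft_quorum "$n") tolerated=$(raft_tolerated_failures "$n")"
done
```

Two nodes tolerate zero failures, the same as one node, which is why the target goes straight from one node to three.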

Policy & Auth Flow

Vault Authentication and Policy Flow

Kubernetes Identity Integration

Once Vault HA is operational, it becomes the secrets backbone for container workloads. This diagram shows how your existing infrastructure (AD, Keycloak, Vault, ISE) powers Kubernetes:

Kubernetes Identity Integration - Zero-Trust Container Security

Key Integration Points:

  • Active Directory → Source of truth for users and groups

  • Keycloak → OIDC broker for k8s API and application SSO

  • Vault → Secrets injection via Vault Agent sidecar (no hardcoded credentials)

  • ISE → 802.1X authentication for k3s nodes at the network layer


1
Audit Logging
TODAY • Required for compliance

Once an audit device is enabled, Vault will STOP serving requests if it cannot write to at least one enabled audit device. Ensure the log path is reliable.

# Create audit directory
sudo mkdir -p /var/log/vault
sudo chown vault:vault /var/log/vault
# Enable file audit
vault audit enable file file_path=/var/log/vault/audit.log
# Verify
vault audit list
# Test audit logging
vault secrets list
sudo tail -1 /var/log/vault/audit.log | jq '.request.path'
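
Because an unwritable audit path blocks requests, a preflight check before enabling the device may be worthwhile. The following is a minimal sketch, assuming it runs as the vault user; the function name and the 100 MiB free-space threshold are arbitrary choices for illustration.

```shell
# Return 0 only if DIR exists, is writable by the current user, and has at
# least MIN_KB KiB free (default 102400 = 100 MiB, an illustrative threshold).
audit_path_ok() {
  dir=$1
  min_kb=${2:-102400}
  [ -d "$dir" ] || { echo "missing: $dir" >&2; return 1; }
  [ -w "$dir" ] || { echo "not writable: $dir" >&2; return 1; }
  # df -Pk prints POSIX-format output in 1 KiB blocks; column 4 is "Available".
  free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$free_kb" -ge "$min_kb" ] || { echo "low disk: ${free_kb} KiB free" >&2; return 1; }
}
```

For example, `audit_path_ok /var/log/vault && echo "audit path looks healthy"`.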

2
Least-Privilege Policies
TODAY • Stop using root token

2.1 PKI Issuer Policy

For netapi and automation to issue certificates:

cat > /tmp/pki-issuer.hcl << 'EOF'
path "pki_int/issue/*" {
  capabilities = ["create", "update"]
}

path "pki_int/certs" {
  capabilities = ["list"]
}

path "pki_int/cert/*" {
  capabilities = ["read"]
}

path "pki_int/ca/pem" {
  capabilities = ["read"]
}

path "pki_int/ca_chain" {
  capabilities = ["read"]
}
EOF

vault policy write pki-issuer /tmp/pki-issuer.hcl

2.2 KV Reader Policy

Read-only secrets access:

cat > /tmp/kv-reader.hcl << 'EOF'
path "kv/data/domus/*" {
  capabilities = ["read", "list"]
}

path "kv/metadata/domus/*" {
  capabilities = ["list"]
}
EOF

vault policy write kv-reader /tmp/kv-reader.hcl

2.3 Admin Policy

Full access (use sparingly):

cat > /tmp/admin.hcl << 'EOF'
path "*" {
  capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}
EOF

vault policy write admin /tmp/admin.hcl

2.4 Verify Policies

vault policy list
Expected Output
admin
default
kv-reader
pki-issuer
root
ssh-client
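
Listing policies shows that they exist, not that they restrict anything. One way to spot-check least privilege is `vault token capabilities`, which prints a comma-separated capability list (or `deny`) for a token and path. The helper below is a sketch; its name is hypothetical.

```shell
# Succeed if TOKEN holds capability CAP on PATH. Relies on the output of
# `vault token capabilities TOKEN PATH`, e.g. "create, update" or "deny".
token_has_capability() {
  token=$1; path=$2; cap=$3
  vault token capabilities "$token" "$path" | tr -d ' ' | tr ',' '\n' | grep -qx "$cap"
}

# Example (needs a token that is allowed to create child tokens):
#   t=$(vault token create -policy=pki-issuer -ttl=5m -field=token)
#   token_has_capability "$t" pki_int/issue/domus-server update && echo "can issue"
#   token_has_capability "$t" kv/data/domus/automation/placeholder read \
#     || echo "kv read denied, as intended"
```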

3
KV Secrets Engine
THIS WEEK • Complements gopass/dsec

Use Cases:

  • API keys that automation needs

  • Service account passwords

  • Shared team secrets

  • Secrets that need versioning/rotation

# Enable KV v2
vault secrets enable -path=kv kv-v2
# Create namespace structure
vault kv put kv/domus/infrastructure/placeholder initialized=true
vault kv put kv/domus/automation/placeholder initialized=true
vault kv put kv/domus/certificates/placeholder initialized=true
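
The `vault kv` commands above use paths like kv/domus/..., while the kv-reader policy names kv/data/domus/* and kv/metadata/domus/*. KV v2 inserts that extra segment after the mount name. A small sketch of the mapping, with a hypothetical helper name:

```shell
# Map a `vault kv` CLI path to the API path a policy must reference.
#   $1 = mount (here "kv"), $2 = secret path under the mount,
#   $3 = "data" (default, used by get/put) or "metadata" (used by list)
kv2_api_path() {
  mount=${1%/}
  secret=${2#/}
  kind=${3:-data}
  echo "$mount/$kind/$secret"
}

kv2_api_path kv domus/infrastructure/placeholder
kv2_api_path kv domus/infrastructure metadata
```

The two calls print kv/data/domus/infrastructure/placeholder and kv/metadata/domus/infrastructure, which are the paths the policy in 2.2 has to cover.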

4
Auth Methods
THIS WEEK • Multiple auth for different use cases

4.1 AppRole for Automation

For netapi, dsec, CI/CD pipelines:

vault auth enable approle
vault write auth/approle/role/netapi \
    token_policies="pki-issuer" \
    token_ttl=1h \
    token_max_ttl=4h \
    secret_id_ttl=0
# Get role ID
vault read auth/approle/role/netapi/role-id
# Generate secret ID (store securely!)
vault write -f auth/approle/role/netapi/secret-id
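
On the client side, automation exchanges the RoleID and SecretID for a token before doing anything else. A minimal sketch of that step follows; the function name is hypothetical, and how the two values reach the script (dsec, environment, file) is left open.

```shell
# Exchange a RoleID/SecretID pair for a token and export it for later
# `vault` calls in the same shell. Returns non-zero if the login fails.
approle_login() {
  token=$(vault write -field=token auth/approle/login \
    role_id="$1" secret_id="$2") || return 1
  [ -n "$token" ] || return 1
  export VAULT_TOKEN="$token"
}
```

Usage might look like `approle_login "$ROLE_ID" "$SECRET_ID" && vault token lookup`, where both variables are assumed to be set by the caller.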

4.2 LDAP Auth (Future)

vault auth enable ldap
vault write auth/ldap/config \
    url="ldaps://home-dc01.inside.domusdigitalis.dev" \
    userdn="CN=Users,DC=inside,DC=domusdigitalis,DC=dev" \
    groupdn="CN=Users,DC=inside,DC=domusdigitalis,DC=dev" \
    binddn="CN=vault-ldap,CN=Users,DC=inside,DC=domusdigitalis,DC=dev" \
    bindpass="<ldap-bind-password>" \
    certificate=@/etc/ssl/certs/ca-certificates.crt

4.3 OIDC via Keycloak (Future)

vault auth enable oidc
vault write auth/oidc/config \
    oidc_discovery_url="https://keycloak-01.inside.domusdigitalis.dev:8443/realms/domusdigitalis" \
    oidc_client_id="vault" \
    oidc_client_secret="<client-secret>" \
    default_role="reader"

5
High Availability Cluster
WEEKEND • 3-node Raft cluster

Requires planning and downtime. Back up Vault data directory first!

5.1 Prerequisites

  • Vault External TLS configured on current node

  • Supermicro Host B operational (or Synology NFS for initial deployment)

  • 3 VMs: vault-01, vault-02, vault-03

  • DNS records configured

  • Firewall: 8200/tcp (API), 8201/tcp (Raft)

  • TLS certs from DOMUS-ISSUING-CA for each node

  • File→Raft migration completed (see 5.2 below)

5.2 File to Raft Storage Migration

If vault-01 currently uses storage "file", you MUST migrate to Raft before adding nodes. A 2-node Raft cluster is WORSE than 1 node (requires both up for quorum).

5.2.1 Check Current Storage Backend

ssh vault-01 "grep -A3 'storage' /etc/vault.d/vault.hcl"

If output shows storage "file", proceed with migration. If storage "raft", skip to 5.3.
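
If the check needs to be scripted rather than read by eye, the backend name can be extracted from the config. This is a sketch with a hypothetical function name; it looks only at uncommented `storage "NAME"` lines.

```shell
# Print the storage backend named in a Vault config ("file", "raft", ...).
# Reads the files given as arguments, or stdin when none are given.
storage_backend() {
  sed -n 's/^[[:space:]]*storage[[:space:]]*"\([^"]*\)".*/\1/p' "$@" | head -n 1
}
```

For example, `ssh vault-01 "sudo cat /etc/vault.d/vault.hcl" | storage_backend` prints the backend currently configured.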

5.2.2 Backup Current Vault Data

# Stop Vault service
ssh vault-01 "sudo systemctl stop vault"
# Create backup of file storage
ssh vault-01 "sudo tar -czvf /tmp/vault-file-backup-$(date +%Y%m%d).tar.gz /opt/vault/data"
# Copy backup to NAS (belt and suspenders)
ssh vault-01 "sudo cp /tmp/vault-file-backup-*.tar.gz /mnt/nas/backups/vault/"
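
Before migrating, it may be worth confirming that the archive can actually be read back. The sketch below lists the archive end to end and records a checksum next to it; the function name is illustrative.

```shell
# Confirm a gzip'd tar archive is readable end to end, then write
# ARCHIVE.sha256 beside it so copies can be compared later.
verify_backup() {
  archive=$1
  tar -tzf "$archive" > /dev/null || { echo "corrupt archive: $archive" >&2; return 1; }
  sha256sum "$archive" > "$archive.sha256"
}
```

Run it on vault-01 against the file created above, then use `sha256sum -c` on the NAS copy to confirm both copies match.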

5.2.3 Create Migration Configuration

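ssh vault-01 "sudo tee /etc/vault.d/migrate.hcl > /dev/null << 'EOF'
storage_source \"file\" {
  path = \"/opt/vault/data\"
}

storage_destination \"raft\" {
  path    = \"/opt/vault/raft\"
  node_id = \"vault-01\"
}

# Required when the destination is raft
cluster_addr = \"https://vault-01.inside.domusdigitalis.dev:8201\"
EOF"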

5.2.4 Prepare Raft Directory

ssh vault-01 "sudo mkdir -p /opt/vault/raft && sudo chown vault:vault /opt/vault/raft"

5.2.5 Run Migration

ssh vault-01 "sudo -u vault vault operator migrate -config=/etc/vault.d/migrate.hcl"
Expected Output
2026/02/24 14:30:00 [INFO] copied key: core/...
2026/02/24 14:30:00 [INFO] copied key: logical/...
...
Success! All data migrated.

5.2.6 Update vault.hcl for Raft

Backup existing config:

ssh vault-01 "sudo cp /etc/vault.d/vault.hcl /etc/vault.d/vault.hcl.file-backup"

Create new Raft-enabled config:

ssh vault-01 "sudo tee /etc/vault.d/vault.hcl > /dev/null << 'EOF'
# Vault Configuration - Raft HA Cluster
# Migrated from file storage on $(date +%Y-%m-%d)

ui = true
disable_mlock = true

# Raft Integrated Storage (HA-ready)
storage \"raft\" {
  path    = \"/opt/vault/raft\"
  node_id = \"vault-01\"

  # Retry join for HA (vault-02/03 will join this node)
  retry_join {
    leader_api_addr = \"https://vault-02.inside.domusdigitalis.dev:8200\"
  }
  retry_join {
    leader_api_addr = \"https://vault-03.inside.domusdigitalis.dev:8200\"
  }
}

# HTTPS listener
listener \"tcp\" {
  address       = \"0.0.0.0:8200\"
  tls_cert_file = \"/opt/vault/tls/vault.crt\"
  tls_key_file  = \"/opt/vault/tls/vault.key\"
}

# Cluster communication
cluster_addr = \"https://vault-01.inside.domusdigitalis.dev:8201\"
api_addr     = \"https://vault-01.inside.domusdigitalis.dev:8200\"

# Audit logging
# Enable with: vault audit enable file file_path=/var/log/vault/audit.log
EOF"

5.2.7 Start Vault and Unseal

ssh vault-01 "sudo systemctl start vault"
# Check status (will be sealed)
ssh vault-01 "vault status"
# Unseal (requires 2 of 3 keys from dsec d000 dev/vault)
ssh -t vault-01 "vault operator unseal"  # Enter key 1 (-t gives the key prompt a TTY)
ssh -t vault-01 "vault operator unseal"  # Enter key 2

5.2.8 Verify Migration

# Check storage backend
ssh vault-01 "vault status | grep -E 'Storage|HA'"
Expected Output
Storage Type          raft
HA Enabled            true
HA Cluster            https://vault-01.inside.domusdigitalis.dev:8201
HA Mode               active
# Verify data integrity - list PKI certs
ssh vault-01 "vault list pki_int/certs"
# Verify SSH CA
ssh vault-01 "vault read ssh/config/ca"

5.2.9 Cleanup

# Remove migration config
ssh vault-01 "sudo rm /etc/vault.d/migrate.hcl"
# Keep old file storage for 7 days, then remove
# ssh vault-01 "sudo rm -rf /opt/vault/data"  # After verification period

5.3 Deploy vault-02 and vault-03

Deploy two new VMs for HA. Use Rocky Linux 9 (same as vault-01).

5.3.1 VM Deployment Options

Option           Storage Location                                                 Failure Domain
Recommended      vault-01 on kvm-01 SSD, vault-02 on kvm-02 SSD, vault-03 on NAS  Survives single host or NAS failure
Initial (today)  All 3 on Synology NAS NFS                                        NAS = SPOF, but gets HA running
Future           Move vault-02 to kvm-02 local SSD when available                 Proper failure domains

5.3.2 Create vault-02 and vault-03 VMs

Use same cloud-init pattern as k3s-master-01 (see k3s Deployment).

# On kvm-01 or kvm-02
for NODE in vault-02 vault-03; do
  IP_SUFFIX=$([[ "$NODE" == "vault-02" ]] && echo "61" || echo "62")

  # Create cloud-init
  cat > /tmp/${NODE}-cloud-init.yml << EOF
#cloud-config
hostname: ${NODE}
fqdn: ${NODE}.inside.domusdigitalis.dev
users:
  - name: ansible
    groups: wheel
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - $(cat ~/.ssh/id_ed25519.pub)
runcmd:
  - dnf install -y vault
EOF

  # Create VM (adjust paths for your storage)
  # ... (use virt-install pattern from k3s-deployment)
done

5.3.3 Install Vault on New Nodes

for NODE in vault-02 vault-03; do
  ssh $NODE "sudo dnf install -y dnf-plugins-core && \
    sudo dnf config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo && \
    sudo dnf install -y vault"
done

5.3.4 Configure vault-02.hcl

ssh vault-02 "sudo tee /etc/vault.d/vault.hcl > /dev/null << 'EOF'
ui = true
disable_mlock = true

storage \"raft\" {
  path    = \"/opt/vault/raft\"
  node_id = \"vault-02\"

  retry_join {
    leader_api_addr = \"https://vault-01.inside.domusdigitalis.dev:8200\"
  }
  retry_join {
    leader_api_addr = \"https://vault-03.inside.domusdigitalis.dev:8200\"
  }
}

listener \"tcp\" {
  address       = \"0.0.0.0:8200\"
  tls_cert_file = \"/opt/vault/tls/vault.crt\"
  tls_key_file  = \"/opt/vault/tls/vault.key\"
}

cluster_addr = \"https://vault-02.inside.domusdigitalis.dev:8201\"
api_addr     = \"https://vault-02.inside.domusdigitalis.dev:8200\"
EOF"

5.3.5 Configure vault-03.hcl

ssh vault-03 "sudo tee /etc/vault.d/vault.hcl > /dev/null << 'EOF'
ui = true
disable_mlock = true

storage \"raft\" {
  path    = \"/opt/vault/raft\"
  node_id = \"vault-03\"

  retry_join {
    leader_api_addr = \"https://vault-01.inside.domusdigitalis.dev:8200\"
  }
  retry_join {
    leader_api_addr = \"https://vault-02.inside.domusdigitalis.dev:8200\"
  }
}

listener \"tcp\" {
  address       = \"0.0.0.0:8200\"
  tls_cert_file = \"/opt/vault/tls/vault.crt\"
  tls_key_file  = \"/opt/vault/tls/vault.key\"
}

cluster_addr = \"https://vault-03.inside.domusdigitalis.dev:8201\"
api_addr     = \"https://vault-03.inside.domusdigitalis.dev:8200\"
EOF"
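
The vault-02 and vault-03 configs differ only in node_id, the retry_join peers, and the two addresses. One way to avoid copy-paste drift is to render them from a single function. This is a sketch under that assumption; the function name is hypothetical and the values mirror the configs above.

```shell
# Print a vault.hcl for one Raft node. $1 is the node's short name; the
# remaining arguments are the peers it should retry_join.
render_vault_hcl() {
  node=$1; shift
  domain="inside.domusdigitalis.dev"
  printf 'ui = true\ndisable_mlock = true\n\n'
  printf 'storage "raft" {\n  path    = "/opt/vault/raft"\n  node_id = "%s"\n\n' "$node"
  for peer in "$@"; do
    printf '  retry_join {\n    leader_api_addr = "https://%s.%s:8200"\n  }\n' "$peer" "$domain"
  done
  printf '}\n\n'
  printf 'listener "tcp" {\n  address       = "0.0.0.0:8200"\n'
  printf '  tls_cert_file = "/opt/vault/tls/vault.crt"\n'
  printf '  tls_key_file  = "/opt/vault/tls/vault.key"\n}\n\n'
  printf 'cluster_addr = "https://%s.%s:8201"\n' "$node" "$domain"
  printf 'api_addr     = "https://%s.%s:8200"\n' "$node" "$domain"
}
```

For example, `render_vault_hcl vault-02 vault-01 vault-03 | ssh vault-02 "sudo tee /etc/vault.d/vault.hcl > /dev/null"` would write the same file as 5.3.4.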

5.4 TLS Certificates for New Nodes

Issue certs from Vault PKI for vault-02 and vault-03.

for NODE in vault-02 vault-03; do
  # Issue cert
  vault write -format=json pki_int/issue/domus-server \
    common_name="${NODE}.inside.domusdigitalis.dev" \
    alt_names="${NODE}" \
    ttl="8760h" > /tmp/${NODE}-cert.json

  # Extract
  jq -r '.data.certificate' /tmp/${NODE}-cert.json > /tmp/${NODE}.crt
  jq -r '.data.private_key' /tmp/${NODE}-cert.json > /tmp/${NODE}.key
  jq -r '.data.ca_chain[]' /tmp/${NODE}-cert.json > /tmp/${NODE}-chain.crt

  # Deploy
  ssh $NODE "sudo mkdir -p /opt/vault/tls"
  scp /tmp/${NODE}.crt ${NODE}:/tmp/
  scp /tmp/${NODE}.key ${NODE}:/tmp/
  scp /tmp/${NODE}-chain.crt ${NODE}:/tmp/

  ssh $NODE "sudo mv /tmp/${NODE}.crt /opt/vault/tls/vault.crt && \
    sudo mv /tmp/${NODE}.key /opt/vault/tls/vault.key && \
    sudo tee -a /opt/vault/tls/vault.crt < /tmp/${NODE}-chain.crt > /dev/null && \
    sudo chown vault:vault /opt/vault/tls/* && \
    sudo chmod 600 /opt/vault/tls/vault.key"
done

5.5 Join Cluster

5.5.1 Start Vault on New Nodes

for NODE in vault-02 vault-03; do
  ssh $NODE "sudo mkdir -p /opt/vault/raft && sudo chown vault:vault /opt/vault/raft"
  ssh $NODE "sudo systemctl enable --now vault"
done

5.5.2 Join to Leader (vault-01)

# On vault-02
ssh vault-02 "VAULT_ADDR='https://vault-02.inside.domusdigitalis.dev:8200' vault operator raft join https://vault-01.inside.domusdigitalis.dev:8200"
# On vault-03
ssh vault-03 "VAULT_ADDR='https://vault-03.inside.domusdigitalis.dev:8200' vault operator raft join https://vault-01.inside.domusdigitalis.dev:8200"

5.5.3 Unseal New Nodes

# Unseal vault-02 (same keys as vault-01; -t gives the key prompt a TTY)
ssh -t vault-02 "vault operator unseal"  # Key 1
ssh -t vault-02 "vault operator unseal"  # Key 2
# Unseal vault-03
ssh -t vault-03 "vault operator unseal"  # Key 1
ssh -t vault-03 "vault operator unseal"  # Key 2

5.6 Verify HA Cluster

# List Raft peers (run from any node)
vault operator raft list-peers
Expected Output
Node        Address                                          State       Voter
----        -------                                          -----       -----
vault-01    vault-01.inside.domusdigitalis.dev:8201         leader      true
vault-02    vault-02.inside.domusdigitalis.dev:8201         follower    true
vault-03    vault-03.inside.domusdigitalis.dev:8201         follower    true
# Check HA status from each node
for NODE in vault-01 vault-02 vault-03; do
  echo "=== $NODE ==="
  ssh $NODE "vault status | grep -E 'HA|Storage'"
done
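
The peer list can also be checked mechanically, for example from a cron job or a post-deploy script. The sketch below reads `vault operator raft list-peers` output on stdin and assumes the column layout shown in the expected output above; the function name is hypothetical.

```shell
# Succeed only if the peer list shows exactly one leader and at least
# WANT voters (default 3). Column 3 is State, column 4 is Voter.
raft_peers_healthy() {
  awk -v want="${1:-3}" '
    $3 == "leader" { leaders++ }
    $4 == "true"   { voters++ }
    END { exit !(leaders == 1 && voters >= want) }
  '
}
```

Usage: `vault operator raft list-peers | raft_peers_healthy 3 && echo "cluster healthy"`.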

5.7 DNS Load Balancing (Optional)

For client HA, add DNS round-robin or use a load balancer.

# Add vault.inside.domusdigitalis.dev with round-robin A records via BIND
ssh bind-01 "sudo nsupdate -l << 'EOF'
zone inside.domusdigitalis.dev
update add vault.inside.domusdigitalis.dev. 300 A 10.50.1.60
update add vault.inside.domusdigitalis.dev. 300 A 10.50.1.61
update add vault.inside.domusdigitalis.dev. 300 A 10.50.1.62
send
EOF"
# Low TTL (300s) for faster failover
# For true LB, use HAProxy or keepalived VIP

5.8 Verify Failover

Test that cluster survives node failure:

# Stop the leader
ssh vault-01 "sudo systemctl stop vault"
# Check new leader election (wait 10s)
sleep 10
ssh vault-02 "vault operator raft list-peers"
# Verify operations still work
vault list pki_int/certs
# Restart vault-01
ssh vault-01 "sudo systemctl start vault"
ssh -t vault-01 "vault operator unseal"  # Key 1 (-t gives the key prompt a TTY)
ssh -t vault-01 "vault operator unseal"  # Key 2
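
A fixed 10-second sleep can be too short or needlessly long. As an alternative, the election can be polled until a leader appears. This is a sketch with a hypothetical function name; it assumes VAULT_ADDR points at a surviving node such as vault-02, not at the stopped leader.

```shell
# Poll until some node is reported as Raft leader, or give up.
#   $1 = max attempts (default 15), $2 = seconds between attempts (default 2)
wait_for_leader() {
  attempts=${1:-15}
  delay=${2:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if vault operator raft list-peers 2>/dev/null | grep -qw leader; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no leader after $attempts attempts" >&2
  return 1
}
```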

6
Auto-Unseal
FUTURE • No manual unseal after restart

Current State: Shamir Seal (Manual)

Vault uses Shamir's Secret Sharing (published by Adi Shamir in 1979; he is the "S" in RSA). The master key is split into N shares, and any K of them (the threshold) are required to reconstruct it.

Parameter     Value
Total shares  3
Threshold     2
Storage       dsec (d000/dev/vault)

Risk: After a VM restart, power outage, or Vault service restart, Vault remains SEALED until 2 unseal keys are provided. Until then, all PKI, secrets, and SSH CA operations fail.

Monitoring Sealed State

Add to monitoring stack (Prometheus/Zabbix):

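# Check seal status (prints false when unsealed, true when sealed)
# The listener is TLS-only, so use https and the node's FQDN
curl -s https://vault-01.inside.domusdigitalis.dev:8200/v1/sys/health | jq '.sealed'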
# One-liner for cron alerting
vault status -format=json | jq -e '.sealed == false' > /dev/null || echo "VAULT SEALED" | mail -s "ALERT: Vault Sealed" admin@example.com
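
In an HA cluster, the HTTP status of /v1/sys/health carries more information than the `sealed` field alone: by default a healthy standby answers 429, not 200, which a naive monitor would flag as an error. The sketch below maps Vault's documented default status codes to a state; the function names are hypothetical.

```shell
# Translate the HTTP status of GET /v1/sys/health into a node state,
# using Vault's default status codes.
vault_health_state() {
  case "$1" in
    200) echo "active" ;;
    429) echo "standby" ;;
    472) echo "dr-secondary" ;;
    473) echo "performance-standby" ;;
    501) echo "uninitialized" ;;
    503) echo "sealed" ;;
    *)   echo "unreachable-or-unknown" ;;
  esac
}

# Fetch only the status code; curl prints 000 when the connection fails.
probe_vault() {
  code=$(curl -s -o /dev/null -w '%{http_code}' "$1/v1/sys/health")
  vault_health_state "$code"
}
```

For example, `probe_vault https://vault-02.inside.domusdigitalis.dev:8200` should report standby on a healthy follower.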

Auto-Unseal Options

Method           Use Case                     Cost
AWS KMS          Cloud deployments            ~$1/month
Azure Key Vault  Azure deployments            ~$0.03/10k ops
GCP Cloud KMS    GCP deployments              ~$0.03/10k ops
Transit Engine   Self-hosted (another Vault)  Infrastructure only
HSM (PKCS#11)    Hardware Security Module     $$$

For homelab/enterprise: Deploy a minimal "seal Vault" on separate infrastructure:

┌─────────────────────┐      ┌─────────────────────┐
│   Primary Vault     │      │    Seal Vault       │
│   (vault-01)        │◄────►│  (nas-01 container) │
│                     │      │                     │
│  - PKI              │      │  - Transit only     │
│  - KV secrets       │      │  - Auto-unseal key  │
│  - SSH CA           │      │  - Minimal surface  │
│  - Auto-unseals     │      │  - Different host   │
└─────────────────────┘      └─────────────────────┘

Implementation: Phase 6 of this roadmap (future work).
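
For orientation, the primary Vault's side of a Transit auto-unseal is a `seal "transit"` stanza in vault.hcl. The sketch below only prints such a stanza; the seal Vault URL and key name passed in are placeholders, since neither exists in this environment yet.

```shell
# Print a seal "transit" stanza for the primary Vault's config.
#   $1 = seal Vault URL, $2 = transit key name, $3 = transit mount path
# The token is left out on purpose: Vault can also read it from the
# VAULT_TOKEN environment variable, which keeps it out of the config file.
render_transit_seal() {
  printf 'seal "transit" {\n'
  printf '  address    = "%s"\n' "$1"
  printf '  key_name   = "%s"\n' "$2"
  printf '  mount_path = "%s"\n' "${3:-transit/}"
  printf '}\n'
}
```

A call such as `render_transit_seal https://seal-vault.example:8200 autounseal` uses a hypothetical address and key name; the real values would be decided in Phase 6.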


Quick Reference

Daily Operations

# Login with AppRole
vault write auth/approle/login role_id=<role-id> secret_id=<secret-id>
# Issue certificate
vault write pki_int/issue/domus-client-users common_name="hostname.inside.domusdigitalis.dev"
# Read a secret
vault kv get kv/domus/infrastructure/ise

Monitoring

# Check audit logs
sudo grep -i "error" /var/log/vault/audit.log | jq '.error'
# Check HA status
vault status
vault operator raft list-peers

Emergency

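# Manual unseal (threshold is 2 of 3; any two keys are enough)
vault operator unseal <key-1>
vault operator unseal <key-2>
# Spare share, only needed if one of the first two is unavailable
vault operator unseal <key-3>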

Related Documentation

  • PKI Strategy runbook (infra-ops)

  • DOMUS PKI Key Ceremony (infra-ops)

  • dsec Integration (secrets-infrastructure)

Revision History

Version  Date        Changes
1.4      2026-02-24  Phase 5 overhaul: Added file→raft migration (5.2), VM deployment (5.3), TLS certs (5.4), cluster join (5.5), verification (5.6), failover testing (5.8). Root cause: Original Phase 5 assumed Raft storage, but vault-01 uses file storage.
1.3      2026-02-18  Added Complete Infrastructure Overview with radial diagram
1.2      2026-02-18  Added Kubernetes Identity Integration section with diagram
1.1      2026-02-18  Visual redesign with phase cards
1.0      2026-02-17  Initial roadmap with 6 phases