Vault SSH CA Architecture

Executive Summary

This document presents an enterprise SSH authentication architecture using HashiCorp Vault as a Certificate Authority (CA). This approach eliminates SSH key sprawl, provides centralized access control, and enables cryptographic audit trails.

Key Benefits:

  • Zero static keys - No more managing authorized_keys files

  • Short-lived credentials - 8-hour certificates expire automatically

  • Centralized control - Cut off new certificates from Vault; certificates already issued expire within 8 hours

  • Audit trail - Every certificate issuance is logged

  • Principal-based access - Users get access based on identity, not key distribution

Figure 1. SSH CA Architecture Overview (problem vs. solution)

The Problem with Traditional SSH Keys

Current State (Typical Enterprise)

Figure 2. Traditional SSH Key Management - The Chaos

Problems

| Issue | Impact |
|---|---|
| Key sprawl | Thousands of keys across hundreds of servers |
| No expiration | Keys valid forever until manually removed |
| No central revocation | Terminated employee? Hope you updated every server |
| No audit trail | Who added that key? When? Why? |
| Inconsistent access | Different keys on different servers |
| Manual distribution | Each new server requires key deployment |
| Rotation impossible | Nobody rotates keys when it requires touching every server |

The Solution: Vault SSH CA

Architecture Overview

Figure 3. Vault SSH CA Flow

How It Works

  1. One-time setup: Each server trusts the Vault CA public key

  2. User requests cert: vault write ssh/sign/role public_key=@~/.ssh/id_ed25519.pub

  3. Vault signs cert: Returns certificate with 8-hour validity

  4. User SSHs: Certificate authenticates against trusted CA

  5. Cert expires: No cleanup needed - it’s just invalid

Key Components

| Component | Location | Purpose |
|---|---|---|
| Vault SSH CA | vault-01:8200 | Signs user public keys, issues certificates |
| CA Public Key | /etc/ssh/vault-ca.pub (on each server) | Servers trust certs signed by this key |
| User Certificate | ~/.ssh/id_ed25519-cert.pub | Short-lived credential (8h TTL) |
| Principals | Embedded in cert | Which usernames the cert allows |

Security Model

Certificate Anatomy

$ ssh-keygen -Lf ~/.ssh/id_ed25519_vault-cert.pub

Type: ssh-ed25519-cert-v01@openssh.com user certificate
Public key: ED25519-CERT SHA256:xxx
Signing CA: ED25519 SHA256:yyy (using ed25519)
Key ID: "vault-root-4baadeb7d7367dd0485e..."
Serial: 12345678901234567890
Valid: from 2026-02-22T12:30:41 to 2026-02-22T20:31:11
Principals:
        ansible
        evanusmodestus
        root
Critical Options: (none)
Extensions:
        permit-pty

What the Certificate Contains

| Field | Purpose |
|---|---|
| Key ID | Identifier for this cert (appears in auth logs) |
| Serial | Unique number assigned by Vault (random, not sequential), used for revocation tracking |
| Valid | Start and end time (8-hour window) |
| Principals | Allowed usernames (e.g., ansible, root, username) |
| Extensions | Capabilities (permit-pty, permit-port-forwarding, etc.) |
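
The principal list is the field that decides which accounts a certificate can log in as, so it is worth being able to pull it out of the ssh-keygen -L output in scripts. The following is a minimal sketch, assuming the output layout shown under Certificate Anatomy; cert_principals is a hypothetical helper name, not part of OpenSSH or Vault.

```shell
# Sketch: list the principals found in `ssh-keygen -L` output read from stdin.
# cert_principals is a hypothetical helper name.
cert_principals() {
  awk '
    /^[ \t]*Principals:/       { in_list = 1; next }   # names start on the next line
    /^[ \t]*Critical Options:/ { in_list = 0 }         # the next section ends the list
    in_list && NF              { print $1 }            # one principal per line
  '
}
```

Typical use would be ssh-keygen -Lf ~/.ssh/id_ed25519_vault-cert.pub | cert_principals, which prints one principal per line.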

Why 8-Hour TTL?

| Duration | Trade-off |
|---|---|
| Minutes | Too short - constant re-authentication |
| 8 hours | Covers a work day, expires overnight |
| 24 hours | Reasonable but allows after-hours access |
| Days/Weeks | Defeats the purpose - just use static keys |
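
With a short TTL, scripts often need to know whether the current certificate has already lapsed before they attempt a connection. The sketch below assumes the Valid: line format that ssh-keygen -L prints (local time, ISO 8601) and relies on the fact that such timestamps sort lexicographically; cert_expired_at is a hypothetical helper name.

```shell
# Sketch: succeed (exit 0) if the certificate is expired at the given time.
# stdin: `ssh-keygen -L` output; $1: "now" as YYYY-MM-DDTHH:MM:SS (local time).
# A missing Valid: line is treated as expired so callers re-sign.
cert_expired_at() {
  awk -v now="$1" '
    /Valid:/ {
      # Line shape: "Valid: from <start> to <end>"; the end is the last field
      found = 1
      expired = (($NF "") <= (now ""))   # force a string comparison
    }
    END { exit ((found && !expired) ? 1 : 0) }
  '
}
```

One possible use is ssh-keygen -Lf "$CERT" | cert_expired_at "$(date +%Y-%m-%dT%H:%M:%S)" && vault-ssh-sign, which re-signs only when the certificate is no longer valid.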

Implementation Guide

Phase 1: Enable Vault SSH Engine

# Enable SSH secrets engine
vault secrets enable ssh

# Generate CA key pair
vault write ssh/config/ca generate_signing_key=true

# Get CA public key (distribute to all servers)
vault read -field=public_key ssh/config/ca > vault-ca.pub

Phase 2: Create Signing Role

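# allowed_users="*" lets a caller request any principal, including root;
# list explicit names to narrow it. Principals are chosen at signing time
# with valid_principals, which is a sign parameter, not a role option.
vault write ssh/roles/enterprise-client \
  key_type=ca \
  allow_user_certificates=true \
  allowed_users="*" \
  default_user="ansible" \
  ttl=8h \
  max_ttl=24h \
  default_extensions='{"permit-pty": "", "permit-port-forwarding": ""}'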

Phase 3: Configure Servers

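For each server:

# Copy CA public key (run from the admin workstation)
scp vault-ca.pub server:/etc/ssh/

# Configure sshd (run on the server, as root)
echo "TrustedUserCAKeys /etc/ssh/vault-ca.pub" >> /etc/ssh/sshd_config

# Validate the config, then restart sshd
sshd -t && systemctl restart sshd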
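
Appending with >> adds a duplicate directive every time the step is re-run, which matters once the setup is automated. A minimal idempotent sketch follows; ensure_trusted_ca is a hypothetical helper name, and the exact-line match assumes the directive is written exactly as in this guide.

```shell
# Sketch: add the TrustedUserCAKeys directive only if it is not already set.
# $1 = sshd config file, $2 = CA public key path.
ensure_trusted_ca() {
  config=$1
  ca_path=$2
  # -F: fixed string, -x: whole line must match, -q: quiet
  if grep -Fxq "TrustedUserCAKeys $ca_path" "$config"; then
    return 0                     # directive already present; nothing to do
  fi
  printf 'TrustedUserCAKeys %s\n' "$ca_path" >> "$config"
}
```

On a server this might be called as ensure_trusted_ca /etc/ssh/sshd_config /etc/ssh/vault-ca.pub, followed by sshd -t to validate the file before restarting the daemon.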

Phase 4: User Workflow

# Sign public key (get certificate)
vault write -field=signed_key ssh/sign/enterprise-client \
  public_key=@~/.ssh/id_ed25519.pub \
  valid_principals="ansible,username,root" \
  > ~/.ssh/id_ed25519-cert.pub

# Add to SSH agent
ssh-add ~/.ssh/id_ed25519

# SSH to any trusted server
ssh ansible@server1.example.com

Automation

vault-ssh-sign Script

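#!/bin/bash
# vault-ssh-sign - Renew SSH certificate from Vault

set -euo pipefail

PUBKEY="${SSH_PUBKEY:-$HOME/.ssh/id_ed25519_vault.pub}"
CERT="${PUBKEY%.pub}-cert.pub"
PRINCIPALS="${SSH_PRINCIPALS:-ansible,$(whoami),root}"
ROLE="${SSH_ROLE:-enterprise-client}"   # signing role created in Phase 2

# Sign the key; write to a temp file first so a failed request
# does not truncate the current certificate
TMP="$(mktemp "${CERT}.XXXXXX")"
trap 'rm -f "$TMP"' EXIT
vault write -field=signed_key "ssh/sign/${ROLE}" \
  public_key=@"$PUBKEY" \
  valid_principals="$PRINCIPALS" \
  > "$TMP"
mv "$TMP" "$CERT"

# Reload SSH agent so it picks up the new certificate
ssh-add -d "${PUBKEY%.pub}" 2>/dev/null || true
ssh-add "${PUBKEY%.pub}"

# Show validity
ssh-keygen -Lf "$CERT" | grep -E "Valid:|Extensions:"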

Daily Workflow

# Morning: Load Vault credentials and sign cert
dsource d000 dev/vault    # Load VAULT_ADDR, VAULT_TOKEN
vault-ssh-sign            # Get 8h cert

# Work day: SSH anywhere
ssh server1
ssh server2
ssh server3

# End of day: Cert expires automatically
# No cleanup needed

Enterprise Integration

LDAP/AD Integration

Vault can authenticate users against Active Directory:

# Enable LDAP auth
vault auth enable ldap

# Configure AD connection
vault write auth/ldap/config \
  url="ldaps://dc01.example.com" \
  userdn="OU=Users,DC=example,DC=com" \
  groupdn="OU=Groups,DC=example,DC=com" \
  binddn="CN=vault-svc,OU=Service Accounts,DC=example,DC=com" \
  bindpass="$AD_BIND_PASSWORD" \
  certificate=@/etc/ssl/certs/ad-ca.pem

# Map AD group to Vault policy
vault write auth/ldap/groups/linux-admins policies=ssh-signer
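
The group mapping above refers to a policy named ssh-signer, which this document does not define. The sketch below shows what such a policy could look like, assuming the ssh mount and the enterprise-client role from the Implementation Guide; render_ssh_signer_policy is a hypothetical helper, and the policy body is an assumption rather than the deployed one.

```shell
# Sketch: print a Vault policy that only allows signing against one SSH role.
# $1 = mount path, $2 = role name. Signing is a write operation, so the
# policy grants "create" and "update" on the sign endpoint and nothing else.
render_ssh_signer_policy() {
  mount=$1
  role=$2
  cat <<EOF
path "${mount}/sign/${role}" {
  capabilities = ["create", "update"]
}
EOF
}
```

The output can be piped straight into Vault, for example render_ssh_signer_policy ssh enterprise-client | vault policy write ssh-signer -.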

Principal Mapping

Map AD groups to SSH principals:

| AD Group | SSH Principal | Access Level |
|---|---|---|
| linux-admins | root, ansible | Full administrative |
| linux-users | username | Standard user |
| svc-deploy | ansible, svc_deploy | Deployment only |
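
The mapping table can be expressed as a small lookup, which is useful when a wrapper script has to pick the valid_principals value for a user. This is a sketch only: principals_for_group is a hypothetical helper, not a Vault feature, and in Vault itself the same restriction would more likely be enforced with one signing role per group, each with its own allowed_users list.

```shell
# Sketch: translate an AD group into the comma-separated principal list from
# the mapping table. $1 = AD group, $2 = the user's login name.
principals_for_group() {
  case $1 in
    linux-admins) printf '%s\n' "root,ansible" ;;
    linux-users)  printf '%s\n' "$2" ;;           # standard users get their own name only
    svc-deploy)   printf '%s\n' "ansible,svc_deploy" ;;
    *)            return 1 ;;                     # unknown group: grant no principals
  esac
}
```

For example, principals_for_group linux-users alice prints alice, while an unknown group returns a non-zero status and prints nothing.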

Kubernetes Integration

For service accounts in Kubernetes:

# Enable Kubernetes auth
vault auth enable kubernetes

# Configure (from within k3s)
vault write auth/kubernetes/config \
  kubernetes_host="https://kubernetes.default.svc" \
  kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt

# Create role for CI/CD
vault write auth/kubernetes/role/ci-deploy \
  bound_service_account_names=ci-runner \
  bound_service_account_namespaces=ci \
  policies=ssh-deploy \
  ttl=1h

Monitoring & Audit

Vault Audit Log

Enable audit logging:

vault audit enable file file_path=/var/log/vault/audit.log

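Every certificate issuance is logged. Vault HMAC-hashes most string values in audit entries by default; they are shown in clear here for readability: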

{
  "time": "2026-02-22T12:30:41.123Z",
  "type": "response",
  "auth": {
    "token_type": "service",
    "policies": ["ssh-signer"]
  },
  "request": {
    "path": "ssh/sign/enterprise-client",
    "data": {
      "public_key": "ssh-ed25519 AAAA...",
      "valid_principals": "ansible,evanusmodestus,root"
    }
  },
  "response": {
    "data": {
      "serial_number": "12345678901234567890"
    }
  }
}

Server Auth Logs

SSH auth logs show certificate details:

sshd[1234]: Accepted publickey for ansible from 10.0.0.50
  port 54321 ssh2: ED25519-CERT SHA256:xxx
  ID vault-root-4baadeb7 (serial 12345678901234567890)
  CA ED25519 SHA256:yyy

Figure 4. SSH CA Handshake Flow

Advanced Log Parsing

Extract certificate authentication details with awk:

# Parse SSH cert auths with full details
# Requires gawk (three-argument match); the unit is "ssh" on Debian/Ubuntu
sudo journalctl -u sshd --since "1 hour ago" | gawk '
/Accepted publickey/ && /CERT/ {
  match($0, /from ([0-9a-fA-F.:]+)/, ip)
  match($0, /for ([a-z_][a-z0-9_-]*)/, user)
  match($0, /serial ([0-9]+)/, serial)
  match($0, /CA [A-Z0-9]+ SHA256:([a-zA-Z0-9+\/]+)/, ca)
  printf "%s | User: %-15s | From: %-15s | Serial: %s | CA: %s\n", \
    $1" "$2" "$3, user[1], ip[1], serial[1], substr(ca[1],1,12)"..."
}'

Track unique certificates over time:

# Count unique serials (each = one vault-ssh-sign)
sudo journalctl -u sshd --since "24 hours ago" | \
  gawk '/serial [0-9]+/{match($0,/serial ([0-9]+)/,s); print s[1]}' | \
  sort -u | wc -l
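
Where gawk is not installed, the same extraction can be done with POSIX awk using RSTART and RLENGTH. The sketch below counts logins per certificate serial, assuming each sshd log entry arrives as a single line in the format shown under Server Auth Logs; count_logins_by_serial is a hypothetical helper name.

```shell
# Sketch: count certificate logins per serial from sshd log lines on stdin.
# Uses only POSIX awk features, so it also runs under mawk or BusyBox awk.
count_logins_by_serial() {
  awk '
    /Accepted publickey/ && match($0, /serial [0-9]+/) {
      serial = substr($0, RSTART + 7, RLENGTH - 7)   # drop the "serial " prefix
      count[serial]++
    }
    END { for (s in count) print count[s], s }       # "<logins> <serial>"
  ' | sort -k2,2
}
```

It could be fed from the journal, for example sudo journalctl -u sshd --since "24 hours ago" | count_logins_by_serial. A serial with an unusually high login count or many source hosts is a reasonable thing to alert on.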

Metrics

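Vault exposes telemetry in Prometheus format. It has no dedicated SSH issuance counter (vault_secret_kv_count covers only KV mounts), so per-mount request metrics serve as a proxy. Exported names vary with Vault version and mount path:

# Signing requests routed to the ssh/ mount (proxy for issuance rate)
vault_route_update_ssh__count

# Request handling time across all mounts
vault_core_handle_request

Per-request success and failure is better derived from the audit log.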

Disaster Recovery

CA Key Backup

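The CA private key is critical, and Vault never returns it: reading ssh/config/ca yields only the public key. Back up the private key through Vault's storage backend instead:

# Save the CA public key for reference
vault read -field=public_key ssh/config/ca > vault-ca.pub

# Snapshot Vault storage, which includes the CA private key
# (integrated Raft storage; other backends have their own backup procedure)
vault operator raft snapshot save vault-backup.snap

# Encrypt the snapshot at rest and store it offline (M-Disc, safe deposit box)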

CA Key Rotation

Rotate CA key without downtime:

  1. Generate new CA key on Vault (for example on a second SSH mount, since each mount holds a single CA key)

  2. Add new CA public key to servers (alongside old)

  3. Users get certs signed by new CA

  4. Remove old CA key from servers after all certs expire
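
Step 2 works because the file named by TrustedUserCAKeys may list several CA public keys, one per line. A minimal sketch of adding the new key next to the old one follows; add_trusted_ca_key is a hypothetical helper name, and it assumes the new key file holds a single public key line.

```shell
# Sketch: append a new CA public key to the trusted-CA file next to the old one.
# $1 = trusted CA file on the server, $2 = file holding the new CA public key.
# Keys already present are skipped, so the step can be re-run safely.
add_trusted_ca_key() {
  trusted=$1
  new_key=$(cat "$2")
  touch "$trusted"
  if ! grep -Fxq "$new_key" "$trusted"; then
    printf '%s\n' "$new_key" >> "$trusted"
  fi
}
```

During rotation this might be run as add_trusted_ca_key /etc/ssh/vault-ca.pub new-vault-ca.pub on each server. Once the last certificate signed by the old CA has expired, the old line can be deleted from the file.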

Emergency Revocation

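If a certificate is compromised:

# Option 1: Short TTL means wait at most 8 hours
# Option 2: Revoke the serial with an OpenSSH key revocation list (KRL).
# A plain-text RevokedKeys file can only list public keys, not serials.
# Add -u to extend an existing KRL instead of replacing it.
echo "serial: 12345678901234567890" > krl-spec.txt
ssh-keygen -k -f /etc/ssh/revoked-keys -s /etc/ssh/vault-ca.pub krl-spec.txt

# In sshd_config:
RevokedKeys /etc/ssh/revoked-keys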
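
OpenSSH revokes certificates by serial through a binary key revocation list (KRL), which ssh-keygen -k builds from a specification file containing lines of the form serial: N. The sketch below generates that specification from a list of serials, such as those taken from the Vault audit log or sshd logs; krl_spec_from_serials is a hypothetical helper name.

```shell
# Sketch: turn certificate serials into a KRL specification for `ssh-keygen -k`.
# Each argument is one serial; anything that is not a plain number is rejected.
krl_spec_from_serials() {
  for serial in "$@"; do
    case $serial in
      ''|*[!0-9]*) echo "not a serial: $serial" >&2; return 1 ;;
    esac
    printf 'serial: %s\n' "$serial"
  done
}
```

The output would be written to a file, for example krl_spec_from_serials 12345678901234567890 > krl-spec.txt, and then compiled with ssh-keygen -k -f /etc/ssh/revoked-keys -s /etc/ssh/vault-ca.pub krl-spec.txt. The resulting KRL has to reach every server that trusts the CA, so it is worth distributing it the same way as the CA public key.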

Comparison: Traditional vs Vault SSH CA

| Aspect | Traditional Keys | Vault SSH CA |
|---|---|---|
| Key Distribution | Manual copy to each server | One-time CA trust setup |
| Revocation | Edit authorized_keys everywhere | Wait for TTL or add to revocation list |
| Audit | grep authorized_keys | Centralized Vault audit log |
| Onboarding | Add key to N servers | Grant Vault policy |
| Offboarding | Remove key from N servers | Revoke Vault access |
| Key Rotation | Impossible at scale | Automatic (8h TTL) |
| Compliance | Manual documentation | Cryptographic audit trail |

Implementation Checklist

Prerequisites

  • HashiCorp Vault cluster (HA recommended)

  • Vault unsealed and accessible

  • TLS configured for Vault API

  • Audit logging enabled

Server Configuration

  • CA public key distributed

  • TrustedUserCAKeys configured

  • sshd restarted

  • Existing authorized_keys removed (optional)

User Configuration

  • vault-ssh-sign script deployed

  • SSH config updated for cert usage

  • Workflow documented

Monitoring

  • Vault audit logs forwarded to SIEM

  • Certificate issuance alerts configured

  • Expiring cert notifications

Quick Reference

Daily Commands

# Sign certificate
vault-ssh-sign

# Check cert validity
ssh-keygen -Lf ~/.ssh/id_ed25519_vault-cert.pub | grep Valid

# Reload cert in agent
ssh-add -d ~/.ssh/id_ed25519_vault && ssh-add ~/.ssh/id_ed25519_vault

Troubleshooting

| Error | Fix |
|---|---|
| "Certificate invalid: not yet valid" | NTP sync issue - check time on client and server |
| "no matching key found" | Cert expired - re-run vault-ssh-sign |
| "Permission denied" | Check principals in cert match target user |

Document Information

Author: Evan Modestus
Version: 1.0
Last Updated: 2026-02-22
Purpose: CHLA Cloud Engineering Proposal