Certmgr Troubleshooting Runbook

Overview

certmgr-01.inside.domusdigitalis.dev (10.50.1.60) is the central certificate management server. It runs Certbot, which obtains Let’s Encrypt certificates using DNS-01 challenges through the Cloudflare API.

certmgr-01 is a single point of failure for all Let’s Encrypt certificates: while it is down, nothing renews or deploys. Consider a HashiCorp Vault intermediate CA for internal services.

Diagram: certmgr-01 issue categories
  • Renewal Failures: Cloudflare path wrong, DNS NXDOMAIN, API token expired
  • Deploy Hook Failures: ISE API unreachable, iPSK SSH failed, secrets not decrypting
  • Timer Issues: certbot-renew.timer disabled, service failed to start

Architecture Reference

  • Certbot Renewal Flow - Full renewal process with error states

  • PKI Hierarchy - Certificate authority chain

  • Infrastructure Overview - System topology

Current Certificates

# List all managed certificates
sudo certbot certificates

Table 1. Certificates as of 2026-01-24

| Certificate Name | Domains | Days Left | Status |
|---|---|---|---|
| 9800-wlc-01.inside.domusdigitalis.dev | 9800-wlc-01.inside.domusdigitalis.dev | ~90 | OK |
| guest.domusdigitalis.dev | guest.domusdigitalis.dev | ~90 | BROKEN - Path issue |
| ipmi-01.inside.domusdigitalis.dev | ipmi-01.inside.domusdigitalis.dev | ~90 | OK |
| kvm-01.inside.domusdigitalis.dev | kvm-01.inside.domusdigitalis.dev | ~90 | BROKEN - DNS NXDOMAIN |

Known Issues and Fixes

CRITICAL: guest.domusdigitalis.dev - Wrong Cloudflare Path

Error Message
Attempting to renew cert (guest.domusdigitalis.dev) from /etc/letsencrypt/renewal/guest.domusdigitalis.dev.conf produced an unexpected error:
dns_cloudflare_credentials: /root/.secrets/cloudflare.ini: No such file or directory
Root Cause

The renewal config for this certificate points dns_cloudflare_credentials at /root/.secrets/cloudflare.ini, but the credentials file actually lives at /home/ansible/.secrets/cloudflare.ini.

Fix Steps
# 1. Check current config
sudo grep cloudflare /etc/letsencrypt/renewal/guest.domusdigitalis.dev.conf

# 2. Edit the renewal config
sudo vim /etc/letsencrypt/renewal/guest.domusdigitalis.dev.conf

# 3. Change:
#    dns_cloudflare_credentials = /root/.secrets/cloudflare.ini
# To:
#    dns_cloudflare_credentials = /home/ansible/.secrets/cloudflare.ini

# 4. Test renewal
sudo certbot renew --cert-name guest.domusdigitalis.dev --dry-run

# 5. If successful, force renewal
sudo certbot renew --cert-name guest.domusdigitalis.dev --force-renewal
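The vim edit in steps 2 and 3 can also be done non-interactively. A minimal sketch, assuming GNU sed and a single `dns_cloudflare_credentials` line in the file; `fix_cf_path` is a hypothetical helper name, and it keeps a `.bak` copy so the change can be reverted:

```shell
#!/usr/bin/env bash
# Hypothetical helper: point dns_cloudflare_credentials in a certbot renewal
# config at the ansible user's credentials file. Assumes GNU sed (-i).
fix_cf_path() {
  local conf="$1"
  local creds="${2:-/home/ansible/.secrets/cloudflare.ini}"

  # Keep a copy (same mode/owner) so the edit can be reverted.
  cp -p "$conf" "${conf}.bak" || return 1

  # Replace the value on the dns_cloudflare_credentials line, whatever it was.
  sed -i "s|^dns_cloudflare_credentials[[:space:]]*=.*|dns_cloudflare_credentials = ${creds}|" "$conf"

  # Print the resulting line for review.
  grep '^dns_cloudflare_credentials' "$conf"
}
```

Paste it into a root shell (`sudo -i`), call `fix_cf_path /etc/letsencrypt/renewal/guest.domusdigitalis.dev.conf`, then repeat the dry run from step 4.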
Prevention

Always pass the ansible user’s credentials file explicitly when issuing a certificate, so the renewal config records the correct path:

sudo certbot certonly \
  --dns-cloudflare \
  --dns-cloudflare-credentials /home/ansible/.secrets/cloudflare.ini \
  -d guest.domusdigitalis.dev
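Because a wrong credentials path only surfaces at renewal time, it can help to audit every renewal config at once. A minimal sketch; `audit_cf_paths` is a hypothetical helper, and it assumes each Cloudflare certificate's renewal config stores the path on a single `dns_cloudflare_credentials = ...` line, as in the fix above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report renewal configs whose Cloudflare credentials
# path does not exist. Run as root so /root and /home paths are both checkable.
audit_cf_paths() {
  local dir="${1:-/etc/letsencrypt/renewal}" conf path bad=0
  for conf in "$dir"/*.conf; do
    [[ -e "$conf" ]] || continue   # no matches: the glob stays literal
    # Value after "dns_cloudflare_credentials =", surrounding spaces dropped.
    path=$(sed -n 's/^dns_cloudflare_credentials[[:space:]]*=[[:space:]]*//p' "$conf")
    [[ -n "$path" ]] || continue   # not a Cloudflare DNS-01 certificate
    if [[ ! -f "$path" ]]; then
      echo "MISSING $(basename "$conf" .conf): $path"
      bad=1
    fi
  done
  return "$bad"
}
```

Run it from a root shell (`sudo -i`). A non-zero exit with `MISSING` lines means that certificate will fail the same way guest.domusdigitalis.dev did.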

CRITICAL: kvm-01.inside.domusdigitalis.dev - DNS NXDOMAIN

Error Message
Encountered NXDOMAIN when looking up TXT record: _acme-challenge.kvm-01.inside.domusdigitalis.dev
Root Cause

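Let’s Encrypt could not find the TXT record _acme-challenge.kvm-01.inside.domusdigitalis.dev when it validated the challenge. The Cloudflare plugin creates this record only for the duration of the challenge, so there are two likely causes: the plugin never created it (for example, the API token cannot edit the domusdigitalis.dev zone), or the validator queried before the record had propagated (propagation wait too short).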

Fix Option 1: Increase Propagation Time
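# Edit renewal config
sudo vim /etc/letsencrypt/renewal/kvm-01.inside.domusdigitalis.dev.conf

# Under [renewalparams], add or increase (the plugin default is 10 seconds):
dns_cloudflare_propagation_seconds = 60

# Then test:
sudo certbot renew --cert-name kvm-01.inside.domusdigitalis.dev --dry-run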
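The propagation setting can also be applied without opening an editor. A sketch assuming GNU sed and the usual certbot layout in which plugin options live under a `[renewalparams]` header; `set_propagation` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: set dns_cloudflare_propagation_seconds in a certbot
# renewal config, adding it under [renewalparams] if absent. Assumes GNU sed.
set_propagation() {
  local conf="$1" secs="${2:-60}"
  if grep -q '^dns_cloudflare_propagation_seconds' "$conf"; then
    # Key exists: overwrite its value.
    sed -i "s|^dns_cloudflare_propagation_seconds[[:space:]]*=.*|dns_cloudflare_propagation_seconds = ${secs}|" "$conf"
  else
    # Key missing: append it directly after the [renewalparams] header line.
    sed -i "/^\[renewalparams\]\$/a dns_cloudflare_propagation_seconds = ${secs}" "$conf"
  fi
}
```

In a root shell: `set_propagation /etc/letsencrypt/renewal/kvm-01.inside.domusdigitalis.dev.conf 60`, then rerun the dry run. Running it again only updates the value, so it is safe to repeat with a larger number.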
Fix Option 2: Verify Cloudflare Zone Permissions
# The API token needs Zone:DNS:Edit on domusdigitalis.dev.
# List the zones the token can access; domusdigitalis.dev must appear.
CF_TOKEN=$(sudo grep api_token /home/ansible/.secrets/cloudflare.ini | cut -d= -f2 | tr -d ' ')
curl -s -X GET "https://api.cloudflare.com/client/v4/zones" \
  -H "Authorization: Bearer ${CF_TOKEN}" \
  -H "Content-Type: application/json"
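An expired or revoked token fails the same way as a token without zone access, so it is worth checking the token itself first. A sketch, assuming the credentials file uses certbot-dns-cloudflare's `dns_cloudflare_api_token` key; the helper names are hypothetical, and `/user/tokens/verify` is Cloudflare's token verification endpoint:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the Cloudflare API token in cloudflare.ini.
# Assumes the certbot-dns-cloudflare format: dns_cloudflare_api_token = <token>

# Print the token value from an ini file (whitespace around "=" ignored).
cf_token() {
  sed -n 's/^dns_cloudflare_api_token[[:space:]]*=[[:space:]]*//p' "$1" | tr -d '[:space:]'
}

# Ask Cloudflare whether the token is valid (needs network access and curl).
cf_verify_token() {
  local token
  token=$(cf_token "$1") || return 1
  [[ -n "$token" ]] || { echo "no dns_cloudflare_api_token in $1" >&2; return 1; }
  curl -s "https://api.cloudflare.com/client/v4/user/tokens/verify" \
    -H "Authorization: Bearer ${token}"
}
```

Usage: `cf_verify_token /home/ansible/.secrets/cloudflare.ini` (with sudo if the file is not readable by the current user). A healthy token reports `"status":"active"` in the result; if it does, move on to the zone listing above.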
Fix Option 3: Use Wildcard Instead
# Consider using wildcard for *.inside.domusdigitalis.dev
# This reduces individual cert management

sudo certbot certonly \
  --dns-cloudflare \
  --dns-cloudflare-credentials /home/ansible/.secrets/cloudflare.ini \
  -d "*.inside.domusdigitalis.dev" \
  -d "inside.domusdigitalis.dev"
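Before switching, it helps to check which existing names a wildcard would actually cover: `*.inside.domusdigitalis.dev` matches exactly one label to the left of `inside.domusdigitalis.dev`, which is why the apex is requested separately. A small sketch; `covered_by_wildcard` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if HOST is covered by the wildcard "*.BASE".
# A wildcard matches exactly one extra label, so the apex (BASE itself) and
# deeper names (a.b.BASE) are not covered.
covered_by_wildcard() {
  local host="${1,,}" base="${2,,}" label   # DNS names are case-insensitive
  [[ "$host" == *."$base" ]] || return 1     # must end in ".BASE"
  label="${host%."$base"}"                   # the part before ".BASE"
  [[ -n "$label" && "$label" != *.* ]]       # exactly one non-empty label
}
```

Against Table 1, 9800-wlc-01, ipmi-01 and kvm-01 would be covered, while guest.domusdigitalis.dev would not and keeps its own certificate. One trade-off to weigh: every covered host would share a single private key.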

Deploy Hook Failures

Location
/etc/letsencrypt/renewal-hooks/deploy/deploy-certs.sh

Table 2. Common Issues

| Issue | Symptom | Fix |
|---|---|---|
| ISE API unreachable | curl: (7) Failed to connect to ise-02.inside.domusdigitalis.dev | Check ISE is up, verify IP in secrets |
| age decryption failed | age: error: no identity matched | Verify age key in ~/.config/age/ |
| iPSK SSH failed | ssh: connect to host ipsk-manager.inside.domusdigitalis.dev port 22: Connection refused | Check iPSK VM is running, SSH key authorized |
| Missing environment variable | ISE_API_TOKEN: unbound variable | Secrets not loaded, check age file path |
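The first three rows can be checked before a renewal rather than after one fails. A sketch with hypothetical helper names; it assumes bash's `/dev/tcp` support and coreutils `timeout`. One caveat worth checking: certbot, and therefore the hook, normally runs as root, so a `~/.config/age/` path inside the hook resolves under `/root`, not under the ansible user's home.

```shell
#!/usr/bin/env bash
# Hypothetical preflight checks for the deploy hook's dependencies.

# Succeed if a TCP connection to HOST:PORT opens within 3 seconds.
# Uses bash's /dev/tcp pseudo-device and coreutils timeout.
check_tcp() {
  timeout 3 bash -c ": < /dev/tcp/$1/$2" 2>/dev/null
}

# Succeed if DIR contains at least one age identity, i.e. a line starting
# with AGE-SECRET-KEY-. Defaults to the current user's ~/.config/age.
check_age_identity() {
  local dir="${1:-$HOME/.config/age}"
  grep -rqs '^AGE-SECRET-KEY-' "$dir"
}
```

Run these from a root shell (`sudo -i`) so they see what the hook sees: `check_tcp ise-02.inside.domusdigitalis.dev 443`, `check_tcp ipsk-manager.inside.domusdigitalis.dev 22`, and `check_age_identity`. Port 443 for ISE is an assumption based on the https URL used in the manual ISE deployment.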

Debug Deploy Hook
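# Run the deploy hook by hand with tracing. certbot normally provides
# RENEWED_LINEAGE and RENEWED_DOMAINS, so set them explicitly:
sudo env RENEWED_LINEAGE=/etc/letsencrypt/live/guest.domusdigitalis.dev \
  RENEWED_DOMAINS=guest.domusdigitalis.dev \
  bash -x /etc/letsencrypt/renewal-hooks/deploy/deploy-certs.sh

# To see which certificates triggered a real run, log these from inside
# the hook (they are empty in an interactive shell):
echo "$RENEWED_LINEAGE"
echo "$RENEWED_DOMAINS"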
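Because certbot sets `RENEWED_LINEAGE` and `RENEWED_DOMAINS` (a space-separated list) only when it runs the hook itself, a guard at the top of the hook turns an obscure unbound-variable failure into a clear message. A sketch with hypothetical function names, not the contents of the existing deploy-certs.sh:

```shell
#!/usr/bin/env bash
# Hypothetical guard for the top of a deploy hook. certbot exports
# RENEWED_LINEAGE and RENEWED_DOMAINS only when it runs the hook itself.
require_hook_env() {
  local v missing=0
  for v in RENEWED_LINEAGE RENEWED_DOMAINS; do
    if [[ -z "${!v:-}" ]]; then          # indirect expansion: value of $v
      echo "deploy-certs: $v is not set (not invoked by certbot?)" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Succeed if DOMAIN appears in the space-separated RENEWED_DOMAINS list.
# read -a splits on spaces without glob-expanding names like *.example.
renewed_has_domain() {
  local d
  local -a domains
  read -ra domains <<< "${RENEWED_DOMAINS:-}"
  for d in "${domains[@]}"; do
    [[ "$d" == "$1" ]] && return 0
  done
  return 1
}
```

Near the top of the hook: `require_hook_env || exit 1`, then branch with `if renewed_has_domain guest.domusdigitalis.dev; then ...; fi` so only the certificates that actually changed are pushed.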

Systemd Timer Status

# Check timer status
systemctl status certbot-renew.timer

# Check recent runs
sudo journalctl -u certbot-renew.service -n 50

# Force a renewal check
sudo systemctl start certbot-renew.service
Expected Timer Output
certbot-renew.timer - Run certbot twice daily
   Loaded: loaded (/lib/systemd/system/certbot-renew.timer; enabled)
   Active: active (waiting)
  Trigger: [next trigger time]
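For a quick scripted yes/no on the timer, a sketch follows. `check_certbot_timer` is a hypothetical helper, and the unit name is a parameter because packaging differs (Debian's certbot package, for example, ships `certbot.timer` instead):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the certbot timer is enabled and active.
# A timer in "active (waiting)" state reports "active" from is-active.
check_certbot_timer() {
  local unit="${1:-certbot-renew.timer}" enabled active
  enabled=$(systemctl is-enabled "$unit" 2>/dev/null) || true
  active=$(systemctl is-active "$unit" 2>/dev/null) || true
  if [[ "$enabled" == enabled && "$active" == active ]]; then
    echo "OK: $unit is enabled and active"
  else
    echo "PROBLEM: $unit enabled=${enabled:-unknown} active=${active:-unknown}" >&2
    echo "Try: sudo systemctl enable --now $unit" >&2
    return 1
  fi
}
```

Usage: `check_certbot_timer` (or `check_certbot_timer certbot.timer` on a Debian-packaged host). It does not need root, since `is-enabled` and `is-active` are read-only queries.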

Full Renewal Test

# Dry-run all certificates
sudo certbot renew --dry-run

# Expected: All simulated renewals succeeded
# If any fail, dry-run each certificate individually with --cert-name
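Checking each certificate individually can be looped so that one failure does not hide the others. A sketch with hypothetical helper names; it derives certificate names from the `*.conf` files in `/etc/letsencrypt/renewal`, which certbot names after each certificate:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: dry-run each managed certificate on its own and
# summarize the results. Run as root.

# Print certificate names, one per line (renewal/NAME.conf -> NAME).
cert_names() {
  local dir="${1:-/etc/letsencrypt/renewal}" conf
  for conf in "$dir"/*.conf; do
    if [[ -e "$conf" ]]; then
      basename "$conf" .conf
    fi
  done
}

# Dry-run every certificate; returns non-zero if any of them failed.
dry_run_each() {
  local name failed=0
  while IFS= read -r name; do
    # stdin from /dev/null so certbot cannot consume the name list.
    if certbot renew --cert-name "$name" --dry-run </dev/null >/dev/null 2>&1; then
      echo "OK   $name"
    else
      echo "FAIL $name"
      failed=1
    fi
  done < <(cert_names "${1:-/etc/letsencrypt/renewal}")
  return "$failed"
}
```

In a root shell, run `dry_run_each`. For the details behind a `FAIL` line, rerun that certificate by hand with `sudo certbot renew --cert-name [cert-name] --dry-run`.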

Emergency Procedures

Certificate About to Expire

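# Force renewal of the affected certificate immediately. Scope it with
# --cert-name: a bare --force-renewal reissues every certificate and
# counts against Let’s Encrypt rate limits.
sudo certbot renew --cert-name [cert-name] --force-renewal

# If renewal still fails, issue a new certificate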
sudo certbot certonly \
  --dns-cloudflare \
  --dns-cloudflare-credentials /home/ansible/.secrets/cloudflare.ini \
  -d [domain]
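To judge how urgent a forced renewal is, the remaining lifetime can be read straight from the certificate. A sketch assuming openssl and GNU `date -d`; `days_left` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: whole days until a certificate's notAfter date.
# Assumes openssl and GNU date (date -d parses openssl's date format).
days_left() {
  local pem="$1" end end_epoch
  end=$(openssl x509 -enddate -noout -in "$pem") || return 1
  end="${end#notAfter=}"                 # e.g. "Apr 24 12:00:00 2026 GMT"
  end_epoch=$(date -d "$end" +%s) || return 1
  # Integer division: partial days are dropped.
  echo $(( (end_epoch - $(date +%s)) / 86400 ))
}
```

In a root shell: `days_left /etc/letsencrypt/live/[cert-name]/cert.pem`. A result of 0 or less means the certificate has expired or expires within a day. `openssl x509 -checkend SECONDS` is a built-in yes/no alternative.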

Deploy Hook Not Running

# Manually deploy after renewal
sudo env RENEWED_LINEAGE="/etc/letsencrypt/live/[cert-name]" \
  RENEWED_DOMAINS="[domain]" \
  /etc/letsencrypt/renewal-hooks/deploy/deploy-certs.sh

ISE Guest Portal Cert Expired

# 1. Renew cert
sudo certbot renew --cert-name guest.domusdigitalis.dev --force-renewal

# 2. If the deploy hook fails, deploy to ISE manually (netapi or a direct
#    API call). Replace the <ise-pan-ip> and <ise-api-token> placeholders
#    with values from the loaded secrets. privkey.pem is readable only by
#    root, hence sudo inside the command substitutions.
source <(dsec source d000 dev/network)
curl -sk -X POST "https://<ise-pan-ip>/api/v1/certs/system-certificate/import" \
  -H "Authorization: Basic <ise-api-token>" \
  -H "Content-Type: application/json" \
  -d @- << EOF
{
  "admin": false,
  "eap": false,
  "portal": true,
  "portalGroupTag": "Default Portal Certificate Group",
  "data": "$(sudo base64 -w0 /etc/letsencrypt/live/guest.domusdigitalis.dev/fullchain.pem)",
  "privateKeyData": "$(sudo base64 -w0 /etc/letsencrypt/live/guest.domusdigitalis.dev/privkey.pem)"
}
EOF
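Before pushing files by hand, it is cheap to confirm that privkey.pem belongs to the leaf certificate at the top of fullchain.pem: both should yield the same public key. A sketch assuming openssl; `key_matches_cert` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if KEY is the private key for CERT.
# For a chain file, openssl x509 reads only the first (leaf) certificate.
# Returns 1 on mismatch, 2 if either file cannot be parsed.
key_matches_cert() {
  local cert="$1" key="$2" cert_pub key_pub
  cert_pub=$(openssl x509 -noout -pubkey -in "$cert") || return 2
  key_pub=$(openssl pkey -pubout -in "$key") || return 2
  # Both commands print the public key as a PEM SubjectPublicKeyInfo block.
  [[ "$cert_pub" == "$key_pub" ]]
}
```

In a root shell: `key_matches_cert /etc/letsencrypt/live/guest.domusdigitalis.dev/fullchain.pem /etc/letsencrypt/live/guest.domusdigitalis.dev/privkey.pem && echo match`.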

Long-Term Solutions

Priority 1: Fix Current Failures

  • Fix guest.domusdigitalis.dev cloudflare path

  • Fix kvm-01.inside.domusdigitalis.dev DNS propagation/NXDOMAIN

  • Verify all certs renew with --dry-run

Priority 2: Reduce Let’s Encrypt Dependency

  • Deploy HashiCorp Vault as intermediate CA

  • Issue internal certs from Vault (ISE admin, EAP-TLS)

  • Keep Let’s Encrypt only for public-facing (guest portal)

Priority 3: Redundancy

  • Backup /etc/letsencrypt to nas-01.inside.domusdigitalis.dev

  • Document manual cert issuance procedures

  • Consider secondary certbot server
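For the backup item, a sketch of a timestamped archive. It assumes GNU tar; `backup_letsencrypt` and its `/var/backups` default are hypothetical choices, not an existing job. `/etc/letsencrypt/live` holds symlinks into `archive/`, so the backup must keep them as symlinks for a restore to work:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a timestamped tarball of /etc/letsencrypt and
# print its path. Run as root.
backup_letsencrypt() {
  local src="${1:-/etc/letsencrypt}" dest="${2:-/var/backups}" out
  out="$dest/letsencrypt-$(date +%Y%m%d-%H%M%S).tar.gz"
  # umask 077 in a subshell: the tarball holds private keys, so create it 0600.
  # -C to the parent dir so the archive holds a relative "letsencrypt/" tree;
  # -p keeps file modes. tar stores symlinks as symlinks unless -h is given.
  ( umask 077 && tar -C "$(dirname "$src")" -czpf "$out" "$(basename "$src")" ) || return 1
  echo "$out"
}
```

Copy the result off-host, for example `scp "$(backup_letsencrypt)" nas-01.inside.domusdigitalis.dev:[backup-path]/`, where `[backup-path]` is a placeholder. The archive contains private keys, so the destination needs the same protection as certmgr-01.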