Certmgr Troubleshooting Runbook
Overview
certmgr-01.inside.domusdigitalis.dev (10.50.1.60) is the central certificate management server running Certbot with Let’s Encrypt DNS-01 challenges via Cloudflare API.
This server is a single point of failure for all Let’s Encrypt certificates. Consider a HashiCorp Vault intermediate CA for internal services.
direction: right
certmgr-01 Issues: {
shape: rectangle
style.fill: "#2d2d2d"
style.stroke: "#f5a623"
Renewal Failures: {
shape: rectangle
style.fill: "#3a1a1a"
style.stroke: "#ff6b6b"
Cloudflare Path Wrong
DNS NXDOMAIN
API Token Expired
}
Deploy Hook Failures: {
shape: rectangle
style.fill: "#3a2a1a"
style.stroke: "#ffb347"
ISE API Unreachable
iPSK SSH Failed
Secrets Not Decrypting
}
Timer Issues: {
shape: rectangle
style.fill: "#1a3a5c"
style.stroke: "#4a9eff"
certbot-renew.timer Disabled
Service Failed to Start
}
}
Architecture Reference
- Certbot Renewal Flow - Full renewal process with error states
- PKI Hierarchy - Certificate authority chain
- Infrastructure Overview - System topology
Current Certificates
# List all managed certificates
sudo certbot certificates
| Certificate Name | Domains | Days Left | Status |
|---|---|---|---|
| 9800-wlc-01.inside.domusdigitalis.dev | 9800-wlc-01.inside.domusdigitalis.dev | ~90 | OK |
| guest.domusdigitalis.dev | guest.domusdigitalis.dev | ~90 | BROKEN - Path issue |
| ipmi-01.inside.domusdigitalis.dev | ipmi-01.inside.domusdigitalis.dev | ~90 | OK |
| kvm-01.inside.domusdigitalis.dev | kvm-01.inside.domusdigitalis.dev | ~90 | BROKEN - DNS NXDOMAIN |
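The Days Left column can be extracted from the `certbot certificates` output rather than read by eye. The following is a minimal sketch, assuming the output contains `Certificate Name:` lines followed by `Expiry Date: ... (VALID: N days)` lines. The function name `certs_expiring_within` is a hypothetical helper, not part of Certbot.

```shell
#!/usr/bin/env bash
# List certificates with fewer than <limit> days left (default 30).
# Reads `certbot certificates` output on stdin; prints "name<TAB>days".
certs_expiring_within() {
  awk -v limit="${1:-30}" '
    /Certificate Name:/ { name = $3 }
    /Expiry Date:/ {
      if (match($0, /VALID: [0-9]+ day/)) {
        # Digits sit between "VALID: " (7 chars) and " day" (4 chars).
        days = substr($0, RSTART + 7, RLENGTH - 11) + 0
        if (days < limit) printf "%s\t%d\n", name, days
      } else {
        # No day count to parse (expired, invalid, or under a day).
        printf "%s\tcheck\n", name
      }
    }
  '
}
```

Possible usage: `sudo certbot certificates 2>/dev/null | certs_expiring_within 30`. Any certificate printed with `check` instead of a number should be inspected manually.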
Known Issues and Fixes
CRITICAL: guest.domusdigitalis.dev - Wrong Cloudflare Path
Attempting to renew cert (guest.domusdigitalis.dev) from /etc/letsencrypt/renewal/guest.domusdigitalis.dev.conf produced an unexpected error:
dns_cloudflare_credentials: /root/.secrets/cloudflare.ini: No such file or directory
The certificate was issued with the credentials path /root/.secrets/cloudflare.ini, but the credentials are stored at /home/ansible/.secrets/cloudflare.ini.
# 1. Check current config
sudo grep cloudflare /etc/letsencrypt/renewal/guest.domusdigitalis.dev.conf
# 2. Edit the renewal config
sudo vim /etc/letsencrypt/renewal/guest.domusdigitalis.dev.conf
# 3. Change:
# dns_cloudflare_credentials = /root/.secrets/cloudflare.ini
# To:
# dns_cloudflare_credentials = /home/ansible/.secrets/cloudflare.ini
# 4. Test renewal
sudo certbot renew --cert-name guest.domusdigitalis.dev --dry-run
# 5. If successful, force renewal
sudo certbot renew --cert-name guest.domusdigitalis.dev --force-renewal
When issuing a certificate, always pass the absolute path to the ansible user’s secrets:
certbot certonly \
  --dns-cloudflare \
  --dns-cloudflare-credentials /home/ansible/.secrets/cloudflare.ini \
  -d guest.domusdigitalis.dev
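A wrong credentials path only surfaces at renewal time, so it may help to check every renewal config up front. The following is a minimal sketch, assuming the configs live in /etc/letsencrypt/renewal and store the path under the `dns_cloudflare_credentials` key, as the error message above suggests. `check_credentials_paths` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Report renewal configs whose dns_cloudflare_credentials file is missing.
# Usage: check_credentials_paths [renewal-dir]
# Run as root so files under /root or /home/ansible are visible.
check_credentials_paths() {
  local dir="${1:-/etc/letsencrypt/renewal}"
  local conf path status=0
  for conf in "$dir"/*.conf; do
    [ -e "$conf" ] || continue   # glob matched nothing
    # Take the value after "=" and trim surrounding whitespace.
    path=$(sed -n '/^dns_cloudflare_credentials[[:space:]]*=/{s/^[^=]*=[[:space:]]*//;s/[[:space:]]*$//;p;}' "$conf" | head -n 1)
    [ -n "$path" ] || continue   # config does not use the Cloudflare plugin
    if [ ! -f "$path" ]; then
      printf 'MISSING %s -> %s\n' "$(basename "$conf" .conf)" "$path"
      status=1
    fi
  done
  return "$status"
}
```

The function returns non-zero when at least one config points at a missing file, so it could be used as a pre-flight check before `certbot renew --dry-run`.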
CRITICAL: kvm-01.inside.domusdigitalis.dev - DNS NXDOMAIN
Encountered NXDOMAIN when looking up TXT record: _acme-challenge.kvm-01.inside.domusdigitalis.dev
Either the DNS TXT record _acme-challenge.kvm-01.inside.domusdigitalis.dev was never created in the Cloudflare zone, or the propagation wait is too short for the record to be visible when Let’s Encrypt validates it.
# Edit renewal config
sudo vim /etc/letsencrypt/renewal/kvm-01.inside.domusdigitalis.dev.conf
# Add or increase, under [renewalparams]:
dns_cloudflare_propagation_seconds = 60
# Check API token permissions
# Token needs: Zone:DNS:Edit for domusdigitalis.dev
# List zones accessible by token
curl -s "https://api.cloudflare.com/client/v4/zones" \
  -H "Authorization: Bearer $(grep api_token /home/ansible/.secrets/cloudflare.ini | cut -d= -f2 | tr -d ' ')" \
  -H "Content-Type: application/json"
# Consider using wildcard for *.inside.domusdigitalis.dev
# This reduces individual cert management
certbot certonly \
  --dns-cloudflare \
  --dns-cloudflare-credentials /home/ansible/.secrets/cloudflare.ini \
  -d "*.inside.domusdigitalis.dev" \
  -d "inside.domusdigitalis.dev"
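The propagation setting can also be applied without opening an editor. The following is a minimal sketch, assuming GNU sed (as on a typical Linux host) and a renewal config that contains a `[renewalparams]` section. `set_propagation_seconds` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Set dns_cloudflare_propagation_seconds in a renewal config, idempotently.
# Usage: set_propagation_seconds <conf-file> <seconds>
set_propagation_seconds() {
  local conf="$1" seconds="$2"
  # Accept digits only, so the value is safe to splice into sed.
  case "$seconds" in ''|*[!0-9]*) echo "seconds must be a number" >&2; return 2 ;; esac
  cp -p "$conf" "$conf.bak"   # keep a copy of the original
  if grep -q '^dns_cloudflare_propagation_seconds[[:space:]]*=' "$conf"; then
    # Key already present: rewrite its value in place.
    sed -i "s/^dns_cloudflare_propagation_seconds[[:space:]]*=.*/dns_cloudflare_propagation_seconds = $seconds/" "$conf"
  else
    # Key absent: add it directly after the [renewalparams] header.
    sed -i "/^\[renewalparams\]/a dns_cloudflare_propagation_seconds = $seconds" "$conf"
  fi
  # Succeed only if the expected line is now present.
  grep -q "^dns_cloudflare_propagation_seconds = $seconds\$" "$conf"
}
```

Possible usage: `sudo bash -c 'source ./helper.sh; set_propagation_seconds /etc/letsencrypt/renewal/kvm-01.inside.domusdigitalis.dev.conf 60'`, where `helper.sh` is whatever file holds the function. Follow it with a `--dry-run` renewal to confirm the longer wait resolves the NXDOMAIN error.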
Deploy Hook Failures
Deploy hook script: /etc/letsencrypt/renewal-hooks/deploy/deploy-certs.sh
| Issue | Symptom | Fix |
|---|---|---|
| ISE API unreachable | | Check ISE is up, verify IP in secrets |
| age decryption failed | | Verify age key in |
| iPSK SSH failed | | Check iPSK VM is running, SSH key authorized |
| Missing environment variable | | Secrets not loaded, check age file path |
# Run deploy hook manually with debug
sudo bash -x /etc/letsencrypt/renewal-hooks/deploy/deploy-certs.sh
# Check what certificates triggered the hook
# (certbot sets these only while it runs the hook; they are empty in a normal shell)
echo "$RENEWED_LINEAGE"
echo "$RENEWED_DOMAINS"
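Since `RENEWED_LINEAGE` and `RENEWED_DOMAINS` exist only when certbot invokes the hook, a manual run can fail in confusing ways. The following is a minimal sketch of a guard that could sit at the top of a hook script. `require_hook_env` is a hypothetical function, and nothing here reflects the actual contents of deploy-certs.sh.

```shell
#!/usr/bin/env bash
# Stop early when certbot's hook variables are absent.
require_hook_env() {
  local missing=0
  if [ -z "${RENEWED_LINEAGE:-}" ]; then
    echo "deploy hook: RENEWED_LINEAGE is not set" >&2
    missing=1
  fi
  if [ -z "${RENEWED_DOMAINS:-}" ]; then
    echo "deploy hook: RENEWED_DOMAINS is not set" >&2
    missing=1
  fi
  return "$missing"
}
```

In a hook, a line such as `require_hook_env || exit 1` would make a run without the variables fail with a clear message instead of deploying from an empty path.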
Systemd Timer Status
# Check timer status
systemctl status certbot-renew.timer
# Check recent runs
journalctl -u certbot-renew.service -n 50
# Force a renewal check
sudo systemctl start certbot-renew.service
certbot-renew.timer - Run certbot twice daily
Loaded: loaded (/lib/systemd/system/certbot-renew.timer; enabled)
Active: active (waiting)
Trigger: [next trigger time]
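The same check can be scripted for monitoring. The following is a minimal sketch using `systemctl is-enabled` and `systemctl is-active`; the function name `timer_health` and its output wording are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Print a one-line verdict for the renewal timer; non-zero exit on a problem.
# Usage: timer_health [unit]
timer_health() {
  local unit="${1:-certbot-renew.timer}" enabled active
  # Both subcommands exit non-zero for "disabled"/"inactive"; keep the text.
  enabled=$(systemctl is-enabled "$unit" 2>/dev/null || true)
  active=$(systemctl is-active "$unit" 2>/dev/null || true)
  if [ "$enabled" = "enabled" ] && [ "$active" = "active" ]; then
    echo "OK: $unit is enabled and active"
  else
    echo "PROBLEM: $unit is-enabled=${enabled:-unknown} is-active=${active:-unknown}"
    echo "Fix: sudo systemctl enable --now $unit"
    return 1
  fi
}
```

If the function reports a problem, `sudo systemctl enable --now certbot-renew.timer` both enables the timer and starts it.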
Full Renewal Test
# Dry-run all certificates
sudo certbot renew --dry-run
# Expected: All simulated renewals succeeded
# If failures, check each cert individually
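Checking each certificate individually can be looped. The following is a minimal sketch, assuming each certificate name matches the basename of its file in /etc/letsencrypt/renewal and that the function runs as root. `dry_run_each` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Dry-run each certificate separately and report which ones fail.
# Usage: dry_run_each [renewal-dir]
dry_run_each() {
  local dir="${1:-/etc/letsencrypt/renewal}" conf name failed=""
  for conf in "$dir"/*.conf; do
    [ -e "$conf" ] || continue   # glob matched nothing
    name=$(basename "$conf" .conf)
    if certbot renew --cert-name "$name" --dry-run >/dev/null 2>&1; then
      echo "PASS $name"
    else
      echo "FAIL $name"
      failed="$failed $name"
    fi
  done
  # Exit status is zero only when every certificate passed.
  [ -z "$failed" ]
}
```

For any name reported as FAIL, rerun `sudo certbot renew --cert-name <name> --dry-run` without the output redirection to see the error.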
Emergency Procedures
Certificate About to Expire
# Force renewal immediately (omit --cert-name to force every certificate)
sudo certbot renew --cert-name [cert-name] --force-renewal
# If renewal fails, issue new cert
sudo certbot certonly \
  --dns-cloudflare \
  --dns-cloudflare-credentials /home/ansible/.secrets/cloudflare.ini \
  -d [domain]
Deploy Hook Not Running
# Manually deploy after renewal
export RENEWED_LINEAGE="/etc/letsencrypt/live/[cert-name]"
export RENEWED_DOMAINS="[domain]"
sudo -E /etc/letsencrypt/renewal-hooks/deploy/deploy-certs.sh
ISE Guest Portal Cert Expired
# 1. Renew cert
sudo certbot renew --cert-name guest.domusdigitalis.dev --force-renewal
# 2. Manual deployment to ISE if hook fails
# Use netapi or direct API call
source <(dsec source d000 dev/network)
curl -sk -X POST "https://$<ise-pan-ip>/api/v1/certs/system-certificate/import" \
-H "Authorization: Basic $<ise-api-token>" \
-H "Content-Type: application/json" \
-d @- << EOF
{
"admin": false,
"eap": false,
"portal": true,
"portalGroupTag": "Default Portal Certificate Group",
"data": "$(base64 -w0 /etc/letsencrypt/live/guest.domusdigitalis.dev/fullchain.pem)",
"privateKeyData": "$(base64 -w0 /etc/letsencrypt/live/guest.domusdigitalis.dev/privkey.pem)"
}
EOF
Long-Term Solutions
Priority 1: Fix Current Failures
- Fix guest.domusdigitalis.dev Cloudflare path
- Fix kvm-01.inside.domusdigitalis.dev DNS propagation/NXDOMAIN
- Verify all certs renew with --dry-run