WRKLOG-2026-03-11

Summary

CLI tooling day + Vocera triage. Expanded netapi with git forge integrations (GitHub, GitLab, Gitea) and enhanced Monad pipeline management. The new forge commands all support -f json for jq piping; ISE MnT and DataConnect commands do not (see Session 6).

Work priority: ~10 Vocera phones failing 802.1X due to missing EAP-TLS supplicant configuration.

Completed Today (2026-03-10)

Task                          Details                                                                                                Status
Monad CLI Enhancement         Added 7 new commands: input-create, quick-pipeline, graph, health, clone, watch, inputs --type filter  COMPLETE
GitHub CLI                    26 commands: repos, prs, issues, workflows, gists, starred, orgs, etc.                                 COMPLETE
GitLab CLI                    23 commands: projects, mrs, pipelines, groups, etc.                                                    COMPLETE
Gitea CLI                     16 commands: repos, prs, issues, mirror, create, delete, etc.                                          COMPLETE
Command Composition Patterns  Created command-composition.adoc - bash patterns mapping to Python/Go concepts                         COMPLETE

netapi CLI Summary

Module         Commands                                                                          Env Vars
netapi monad   33 commands (pipelines, inputs, outputs, transforms, health, graph, watch, etc.)  MONAD_API_KEY, MONAD_ORG_ID
netapi github  26 commands (repos, prs, issues, workflows, gists, starred, etc.)                 GITHUB_TOKEN or GH_TOKEN
netapi gitlab  23 commands (projects, mrs, pipelines, groups, etc.)                              GITLAB_TOKEN, GITLAB_URL
netapi gitea   16 commands (repos, prs, issues, mirror, create, delete, etc.)                    GITEA_TOKEN, GITEA_URL

Previous Day (2026-03-09)

Task                 Details                                                      Status
BIND HA              bind-02 deployed on kvm-02, AXFR zone sync working           COMPLETE
Vault HA Cluster     vault-02/03 deployed, TLS certs issued, joined Raft cluster  COMPLETE
evanusmodestus sudo  Password reset via virsh console (ansible user)              COMPLETE

Vault HA Cluster Status

Node        Address                                    State       Voter
vault-01    vault-01.inside.domusdigitalis.dev:8201    leader      true
vault-02    vault-02.inside.domusdigitalis.dev:8201    follower    true
vault-03    vault-03.inside.domusdigitalis.dev:8201    follower    true

Today’s Priorities (2026-03-11)

Priority  Task                           Status           Notes
P0        Vocera EAP-TLS Supplicant Fix  [ ] IN PROGRESS  ~10 phones failing 802.1X, missing supplicant config
P0        Monad Pipeline Evaluation      [ ] PENDING      Test pipeline creation, input sources, transforms
P1        k3s NAT verification           [ ] PENDING      NAT rule 170 applied, test pod internet access
P1        Wazuh indexer recovery         [ ] PENDING      Restart pod after NAT confirmed working

Vocera EAP-TLS Commands (File-Based MAC Lookup)

Create MAC file:

# One MAC per line, uppercase with colons
cat > /tmp/vocera-macs.txt << 'EOF'
00:09:EF:AA:BB:CC
00:09:EF:DD:EE:FF
18:B4:30:11:22:33
EOF
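
MACs copied from device labels or controller exports often arrive in mixed formats. A minimal sketch of a normalizer that produces the "uppercase with colons" form the file expects; normalize_mac is a hypothetical helper, not a netapi command:

```shell
# normalize_mac: hypothetical helper, not part of netapi.
# Accepts colon-, dash-, or dot-separated MACs in any case and prints the
# uppercase colon-separated form. Returns 1 if the input is not 12 hex digits.
normalize_mac() {
    local hex
    # Drop separators, then uppercase the hex digits
    hex=$(printf '%s' "$1" | tr -d -- '-:.' | tr 'a-f' 'A-F')
    case "$hex" in
        ''|*[!0-9A-F]*) return 1 ;;
    esac
    [ "${#hex}" -eq 12 ] || return 1
    # Insert a colon after every two digits, then trim the trailing one
    printf '%s\n' "$hex" | sed 's/../&:/g; s/:$//'
}
```

For example, normalize_mac 0009.efaa.bbcc prints 00:09:EF:AA:BB:CC, and malformed input returns non-zero so it can be skipped before it reaches the MAC file.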

Query ISE for each MAC:

# Get endpoint details for each MAC
while read -r mac; do
    echo "=== $mac ==="
    netapi ise endpoint get "$mac" -f json | jq '{mac: .mac, profile: .profileId, group: .groupId, staticGroupAssignment: .staticGroupAssignment}'
done < /tmp/vocera-macs.txt

Check authentication history (DataConnect):

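# Build SQL IN clause from file
# NOTE: per the Session 6 finding, -f json may not apply to DataConnect output; drop the jq stage if so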
MACS=$(awk '{printf "\x27%s\x27,", $1}' /tmp/vocera-macs.txt | sed 's/,$//')

netapi ise dc query "SELECT
    acs_timestamp,
    calling_station_id AS mac,
    passed,
    failure_reason,
    selected_azn_profiles
FROM mnt.radius_auth_48_live
WHERE calling_station_id IN ($MACS)
ORDER BY acs_timestamp DESC
LIMIT 100" -f json | jq '.'
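
The awk one-liner above emits an empty '' entry for any blank line in the file. A hedged variant that skips blank lines and #-comments and needs no trailing-comma cleanup; build_in_clause is a hypothetical name, and the MACs should still be validated first since they are interpolated into SQL:

```shell
# build_in_clause: hypothetical helper, not part of netapi.
# Prints 'MAC1','MAC2',... from a file with one MAC per line,
# skipping blank lines and lines whose first field starts with '#'.
build_in_clause() {
    # q holds a single quote so the awk program itself needs no escaping;
    # sep is empty for the first entry and "," afterwards
    awk -v q="'" 'NF && $1 !~ /^#/ { printf "%s%s%s%s", sep, q, $1, q; sep = "," }' "$1"
}
```

Usage would be MACS=$(build_in_clause /tmp/vocera-macs.txt), after which the query stays unchanged.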

Find failed auths only:

while read -r mac; do
    echo "=== $mac failures ==="
    netapi ise dc query "SELECT acs_timestamp, failure_reason
        FROM mnt.radius_auth_48_live
        WHERE calling_station_id = '$mac' AND passed = 0
        ORDER BY acs_timestamp DESC LIMIT 5" -f json | jq -r '.[] | "\(.acs_timestamp): \(.failure_reason)"'
done < /tmp/vocera-macs.txt

Bulk CoA reauth after fix:

# Reauth all devices after supplicant is configured
while read -r mac; do
    echo "Reauth: $mac"
    netapi ise mnt coa reauth --mac "$mac"
    sleep 1  # Rate limit
done < /tmp/vocera-macs.txt

Carried Over / Previous Priorities

Priority  Task                              Status       Notes
P0        Vault HA failover test (Phase 6)  [x] DONE     Verified: vault-02 became leader, PKI worked, vault-01 rejoined
P0        vault-ssh-sign HA update          [x] DONE     Script now health-checks all 3 nodes, auto-failover
P0        Fix vault-backup.service          [x] DONE     SELinux policy module installed - rsync_t → ssh_exec_t
P2        PacketFence VM exploration        [ ] PENDING  Deploy packetfence-01 on kvm-02, evaluate open-source NAC

HA Deployment Queue Status

Priority  System    Status    Next Action
P1        BIND      COMPLETE  bind-01 + bind-02 (AXFR)
P2        Vault     COMPLETE  vault-01/02/03 (Raft)
P3        Keycloak  NEXT      Rebuild from scratch (corrupted)
P4        FreeIPA   PLANNED   ipa-01 + ipa-02 (IPA Replication)
P5        AD DC     PLANNED   home-dc01 + home-dc02 (AD Replication)
P6        iPSK      PLANNED   ipsk-mgr-01 + ipsk-mgr-02 (MySQL Replication)
P7        ISE       DEFERRED  ise-01 reconfigure after ise-02 stable

Current Single Points of Failure

System             Impact if Down
ISE (ise-02)       All 802.1X stops - wired + wireless auth fails
Keycloak           SAML/OIDC SSO broken (ISE admin, Grafana, etc.)
FreeIPA (ipa-01)   Linux authentication, sudo rules, HBAC
AD DC (home-dc01)  Windows auth, Kerberos, GPO
iPSK Manager       Self-service PSK portal unavailable

Evaluation VMs

PacketFence Evaluation

Purpose: Educational deployment. ISE remains production NAC. Understand FreeRADIUS internals.

VM Specs:

Resource    Value
Name        packetfence-01
Hypervisor  kvm-02
vCPU        4
RAM         8 GB
Disk        100 GB
IP          TBD (10.50.1.x)
OS          Rocky Linux 9 or Debian 12

PacketFence Components:

  • FreeRADIUS (802.1X, MAB)

  • MariaDB (backend)

  • Captive portal

  • Device profiling

  • VLAN assignment

  • Guest management

Evaluation Goals:

  1. Deploy standalone instance

  2. Test 802.1X with a single endpoint

  3. Compare admin experience vs ISE

  4. Document findings for future reference

Session Log

Session 1: Vault HA Failover Test

Objective: Verify Raft leader election works correctly.

Pre-flight check:

export VAULT_ADDR="https://vault-01.inside.domusdigitalis.dev:8200"
vault operator raft list-peers

Failover test:

# Stop the leader
ssh vault-01 "sudo systemctl stop vault"

# Wait for election
sleep 10

# Check new leader (from vault-02)
export VAULT_ADDR="https://vault-02.inside.domusdigitalis.dev:8200"
vault operator raft list-peers

Verify operations work:

vault list pki_int/certs | head -3

Restart vault-01:

ssh vault-01 "sudo systemctl start vault"

# Unseal (3 keys required)
ssh -t vault-01 "VAULT_ADDR=https://127.0.0.1:8200 VAULT_SKIP_VERIFY=1 vault operator unseal"
ssh -t vault-01 "VAULT_ADDR=https://127.0.0.1:8200 VAULT_SKIP_VERIFY=1 vault operator unseal"
ssh -t vault-01 "VAULT_ADDR=https://127.0.0.1:8200 VAULT_SKIP_VERIFY=1 vault operator unseal"

# Verify rejoin
vault operator raft list-peers

Result: [x] DONE - vault-02 became leader, PKI cert issuance verified, vault-01 rejoined as follower.
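
The "check new leader" step can be scripted rather than read by eye. A small sketch, assuming the list-peers output keeps the column layout shown in the Vault HA Cluster Status table (Node, Address, State, Voter); raft_leader is a hypothetical helper name:

```shell
# raft_leader: hypothetical helper.
# Reads the list-peers table on stdin and prints the Node column of
# every row whose State column (third field) is "leader".
raft_leader() {
    awk '$3 == "leader" { print $1 }'
}
```

Usage: vault operator raft list-peers | raft_leader. After stopping vault-01, a result other than vault-01 indicates the election completed.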

Critical Fix Discovered:

# /var/log/vault MUST exist on all nodes before failover works
ssh vault-02 "sudo mkdir -p /var/log/vault && sudo chown vault:vault /var/log/vault"
ssh vault-03 "sudo mkdir -p /var/log/vault && sudo chown vault:vault /var/log/vault"

The audit device configuration replicates via Raft, but the filesystem does not. Without the directory on every node, failover fails with "mkdir /var/log/vault: permission denied".


Session 2: vault-backup.service SELinux Fix

Objective: Fix failed systemd unit on vault-01.

Root Cause: SELinux rsync_t domain cannot execute ssh_exec_t.

Error:

rsync: [sender] Failed to exec ssh: Permission denied (13)
rsync error: error in IPC code (code 14)

Key insight: Manual sudo rsync worked because it runs in unconfined_t. The systemd service runs rsync in the confined rsync_t domain.

Fix (permissive domain approach):

# Capture ALL denials at once
sudo semanage permissive -a rsync_t
sudo systemctl start vault-backup.service

# Generate comprehensive policy
sudo ausearch -m avc --start today | grep rsync | audit2allow -M vault-backup

# Install and re-enable enforcing
sudo semodule -i vault-backup.pp
sudo semanage permissive -d rsync_t

# Test
sudo systemctl start vault-backup.service && systemctl status vault-backup.service
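
Before installing a generated module it helps to see which denials it will allow. A rough sketch that condenses AVC records into one line per unique source type, target type, class, and permission set; summarize_avc is a hypothetical helper and assumes the usual one-record-per-line AVC layout with scontext, tcontext, and tclass fields:

```shell
# summarize_avc: hypothetical helper.
# Reads AVC denial records on stdin and prints one unique line per
# "source_type -> target_type : class { permissions }" combination.
summarize_avc() {
    # Capture groups: 1 = permissions, 2 = source type (third field of
    # scontext), 3 = target type (third field of tcontext), 4 = object class
    sed -n 's/.*denied *{ \([^}]*\) }.*scontext=[^:]*:[^:]*:\([^:]*\):.*tcontext=[^:]*:[^:]*:\([^:]*\):.*tclass=\([^ ]*\).*/\2 -> \3 : \4 { \1 }/p' | sort -u
}
```

Usage: sudo ausearch -m avc --start today | summarize_avc. Lines that do not look like AVC denials are ignored.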

Result: [x] DONE - Service succeeded, timer scheduled for 02:29 UTC.

Documentation created:


Session 3: vault-ssh-sign HA Update

Objective: Make vault-ssh-sign script HA-aware so it doesn’t fail when vault-01 is down.

Problem: Non-deterministic leader election means any node could be leader. Clients shouldn’t care who’s leader.

Solution: Script now health-checks all 3 nodes and uses first healthy one.

Key patterns:

# Health check endpoint
# 200 = active leader, 429 = standby (both can serve)
curl -sk --max-time 3 -o /dev/null -w "%{http_code}" \
    "https://vault-01.inside.domusdigitalis.dev:8200/v1/sys/health"

Nodes array (preference order):

VAULT_NODES=(
    "https://vault-01.inside.domusdigitalis.dev:8200"
    "https://vault-02.inside.domusdigitalis.dev:8200"
    "https://vault-03.inside.domusdigitalis.dev:8200"
)
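
The selection logic described above could look roughly like this. It is a sketch, not the committed script: check_node and pick_vault_node are hypothetical names, and only 200 and 429 are treated as healthy, matching the comment on the health check.

```shell
# check_node: hypothetical helper. Prints the HTTP status code returned by
# a node's /v1/sys/health endpoint (curl prints 000 if the request fails).
check_node() {
    curl -sk --max-time 3 -o /dev/null -w "%{http_code}" "$1/v1/sys/health" 2>/dev/null
}

# pick_vault_node: hypothetical helper. Takes node URLs in preference order
# and prints the first one that reports 200 (active) or 429 (standby).
# Returns 1 if no node qualifies.
pick_vault_node() {
    local node code
    for node in "$@"; do
        code=$(check_node "$node") || code="000"
        case "$code" in
            200|429)
                printf '%s\n' "$node"
                return 0
                ;;
        esac
    done
    return 1
}
```

Usage: VAULT_ADDR=$(pick_vault_node "${VAULT_NODES[@]}") || exit 1. Keeping the health check in its own function means the selection loop can be exercised without a live cluster.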

Committed: dotfiles-optimus - feat(vault): HA-aware vault-ssh-sign with automatic failover

Test commands:

# Run the updated script
vault-ssh-sign

# Verify SSH cert works
vault-ssh-test

Result: [x] DONE - Script updated and pushed to GitHub.


Session 4: netapi Git Forge Integration

Objective: Add GitHub, GitLab, Gitea CLI commands to netapi for jq-friendly API access.

Files created:

File                             Purpose
netapi/vendors/github/client.py  GitHub REST API client (Bearer auth)
netapi/vendors/gitlab/client.py  GitLab REST API client (PRIVATE-TOKEN auth)
netapi/cli/github.py             26 CLI commands
netapi/cli/gitlab.py             23 CLI commands
netapi/cli/gitea.py              16 CLI commands (existing vendor, new CLI)

jq examples:

# List all repo names
netapi github repos -f json | jq -r '.[].full_name'

# Get open MR titles from GitLab
netapi gitlab mrs mygroup/myproject -f json | jq -r '.[].title'

# Search Gitea and get clone URLs
netapi gitea search netapi -f json | jq -r '.[].ssh_url'

# GitHub PR files with stats
netapi github pr-files owner/repo 42 -f json | jq '.[] | "\(.filename): +\(.additions) -\(.deletions)"'

Commits:

  • feat(monad): Add pipeline visualization, health check, and quick-create commands

  • feat(forge): Add GitHub, GitLab, Gitea CLI commands

Result: [x] DONE - 65 new commands across 3 git forges, all with -f json support.


Session 5: Command Composition Patterns

Objective: Document bash patterns that map to Python/Go concepts for learning.

File created: domus-captures/docs/modules/ROOT/examples/codex/bash/command-composition.adoc


Session 6: netapi ISE TAC Case Prep Expansion

Objective: Massive expansion of ISE TAC diagnostic patterns for work use.

File updated: domus-captures/docs/modules/ROOT/examples/commands/netapi/ise-tac-case-prep.adoc

Changes: 881 → 2073 lines (+1192 lines)

New sections added:

Section                  Contents
Certificate Diagnostics  Cert failures by issuer, CN, time, chain validation errors
Time-Based Analysis      Peak hour analysis, hourly trends, business hours vs after-hours
VLAN Analysis            VLAN distribution, misassignment, change patterns
Security Analysis        Brute force detection, MAC spoofing, rogue devices, unauthorized access
EAP Method Analysis      EAP-TLS vs PEAP vs TEAP breakdown, method failures, protocol transitions
NAD Analysis             Per-switch/WLC auth stats, failure hotspots, port-level breakdown
User Analysis            User history, multi-device users, roaming patterns
Session Analysis         Session duration, concurrent sessions, session lifecycle
Policy Analysis          Policy set effectiveness, hit counts, rule ordering optimization
CoA Analysis             CoA success/failure rates, reauth patterns, disconnect analysis
Profiler Deep Dive       Profiling accuracy, endpoint DB size, stale endpoint cleanup
Extended Bundles         20-step TAC bundle, 10-step security audit

Commit: 927c5f3 - pushed to origin

Key discovery: -f json only works for ERS commands (get-*) and api-call, NOT for MnT or DataConnect commands. MnT/DC output is human-readable only.


Session 7: modestus-razer EAP-TLS WiFi Troubleshooting

Objective: Fix intermittent EAP-TLS authentication failures on workstation.

Symptom:

ISE Error: 5411 - "Supplicant stopped responding to ISE"
Failure Reason: 12935 - "Supplicant stopped responding to ISE during EAP-TLS certificate exchange"
Step 12935 latency: 120001 ms (2 minute timeout)

wpa_supplicant logs showed:

EAP-TLS: Certificate chain validated ✓ (ROOT → ISSUING → ise-02)
Selected EAP-TLS
CTRL-EVENT-DISCONNECTED reason=3 locally_generated=1

NetworkManager error:

Secrets were required, but not provided

Diagnosis steps:

  1. Cert files exist ✓

    ls -la /etc/ssl/certs/modestus-razer-eaptls.pem /etc/ssl/private/modestus-razer-eaptls.key
    # Both present, root-owned, 0644/0600

  2. Key NOT encrypted ✓

    sudo head -3 /etc/ssl/private/modestus-razer-eaptls.key
    # -----BEGIN RSA PRIVATE KEY-----  (not ENCRYPTED)

  3. Cert/key modulus MATCH ✓

    openssl x509 -noout -modulus -in /etc/ssl/certs/modestus-razer-eaptls.pem | openssl md5
    # 9d83a2e6b0f21ac6faa5529b99285a5c

    sudo openssl rsa -noout -modulus -in /etc/ssl/private/modestus-razer-eaptls.key | openssl md5
    # 9d83a2e6b0f21ac6faa5529b99285a5c

  4. Certificate chain analysis:

    openssl x509 -noout -subject -issuer -in /etc/ssl/certs/modestus-razer-eaptls.pem
    # subject=O=Domus-Infrastructure, OU=Domus-Admins, CN=modestus-razer.inside.domusdigitalis.dev
    # issuer=CN=DOMUS-ISSUING-CA

    # How many certs in file?
    openssl crl2pkcs7 -nocrl -certfile /etc/ssl/certs/modestus-razer-eaptls.pem | openssl pkcs7 -print_certs -noout | grep -c "subject="
    # 1  ← ONLY the leaf cert!

ROOT CAUSE IDENTIFIED:

Client certificate file contains only the leaf cert (count=1). ISE expects the client to send the full chain during EAP-TLS handshake:

  • client cert → DOMUS-ISSUING-CA → DOMUS-ROOT-CA

If ISE doesn’t have the intermediate cached or the client doesn’t send it, ISE waits for the full chain, times out after 120 seconds, and logs error 12935.

Fix (pending):

# Create chain file
cat /etc/ssl/certs/modestus-razer-eaptls.pem \
    /etc/ssl/certs/DOMUS-ISSUING-CA.pem \
    > /tmp/modestus-razer-chain.pem

# Verify chain
openssl crl2pkcs7 -nocrl -certfile /tmp/modestus-razer-chain.pem | \
    openssl pkcs7 -print_certs -noout | grep -c "subject="
# Should be 2 (leaf + intermediate)

# Update NM connection
sudo cp /tmp/modestus-razer-chain.pem /etc/ssl/certs/modestus-razer-eaptls-chain.pem
sudo nmcli con modify "Domus-WiFi-EAP-TLS" \
    802-1x.client-cert /etc/ssl/certs/modestus-razer-eaptls-chain.pem

# Reconnect
nmcli con down "Domus-WiFi-EAP-TLS" && nmcli con up "Domus-WiFi-EAP-TLS"
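
A quicker sanity check before touching the NM profile is to count PEM blocks directly. This sketch swaps the openssl pipeline for a plain grep on the BEGIN marker, so it confirms only how many certificates the file holds, not that they form a valid chain; both function names are hypothetical:

```shell
# count_pem_certs: hypothetical helper.
# Counts certificate blocks in a PEM file by counting BEGIN markers.
# It does not parse or verify the certificates.
count_pem_certs() {
    grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# has_intermediate: hypothetical helper.
# Succeeds when the file holds at least two certificates
# (leaf plus at least one CA certificate).
has_intermediate() {
    [ "$(count_pem_certs "$1")" -ge 2 ]
}
```

Usage: has_intermediate /tmp/modestus-razer-chain.pem || echo "leaf only". The openssl-based count above remains the stricter check because it actually parses the file.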

Result: [ ] PENDING - fix not yet applied


Key Learnings from 2026-03-11

EAP-TLS Chain Lesson

Issue                                           Solution
ISE error 5411 "Supplicant stopped responding"  Client cert file missing intermediate CA - ISE waits 120s for full chain
"Secrets were required, but not provided"       Red herring - NM error when wpa_supplicant can’t complete handshake
Leaf-only cert + 2-tier PKI                     Always include intermediate in client cert file for EAP-TLS

Key Learnings from 2026-03-10

Vault HA Lessons

Issue                                                      Solution
PKI role doesn’t allow short hostnames                     Add hostnames explicitly to allowed_domains list
"chown: invalid user: vault:vault"                         Vault not installed - cloud-init may not run on copied images
Glob expansion over SSH fails                              Use explicit file paths, not * wildcards
"failed to get raft challenge"                             Add leader_ca_cert_file to retry_join blocks
"file descriptor 0 is not a terminal"                      Use ssh -t for interactive prompts (unseal)
Failover fails: "mkdir /var/log/vault: permission denied"  Create audit directory on ALL nodes BEFORE enabling audit logging
Non-deterministic leader election                          Don’t care who’s leader - make clients HA-aware with health checks
SELinux denial whack-a-mole                                Use semanage permissive -a <domain> to capture ALL denials at once
Manual vs systemd SELinux                                  Manual sudo runs unconfined_t, systemd runs confined domains

CLI Development Lessons

Pattern                     Learning
Typer + Rich                Annotated[type, typer.Option()] for clean CLI args, Rich tables for human output
JSON output flag            Always add -f json option - enables jq piping for automation
Client validation           Validate API keys early in get_client() - clean error messages
Git forge auth differences  GitHub: Bearer, GitLab: PRIVATE-TOKEN, Gitea: token
xargs with shell functions  Functions aren’t in PATH - use while read instead, or bash -c 'source … && func'
Command composition         $(…) = function returns, pipes = streams, while read = iteration
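
The xargs and composition lessons can be shown in a few lines. A minimal sketch with hypothetical names (shout, each_line): a while read loop runs in the shell itself, so it can call a function that xargs could never exec.

```shell
# shout: hypothetical example function; uppercases its argument.
shout() {
    printf '%s\n' "$1" | tr 'a-z' 'A-Z'
}

# each_line: hypothetical helper. Calls the given command once per
# non-empty stdin line, passing the line as the last argument. Because the
# loop runs in the shell, the command may be a shell function.
each_line() {
    local line
    while IFS= read -r line; do
        [ -n "$line" ] || continue
        "$@" "$line"
    done
}
```

Usage: printf 'vault-01\nvault-02\n' | each_line shout. The pipe is the stream, the loop is the iteration, and wrapping the call in $(…) would capture the function's output as a return value. A command that itself reads stdin would consume the loop's input, so redirect its stdin from /dev/null in that case.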

virsh Console Emergency Access

When SSH fails and sudo is broken:

# From kvm-01/kvm-02
sudo virsh console vault-01

# Login as ansible (has NOPASSWD sudo)
# Reset user password
sudo passwd evanusmodestus

# Exit console: Ctrl+]

Tomorrow (2026-03-12)

P0 - Must Complete

  • Fix modestus-razer EAP-TLS - Add intermediate CA to client cert chain

  • Vocera EAP-TLS - ~10 phones failing 802.1X (work)

Carried Over (Not Complete Today)

  • k3s NAT verification - NAT rule 170 applied, test pod internet access

  • Wazuh indexer recovery - Restart pod after NAT confirmed working

  • Monad Pipeline Evaluation - Test pipeline creation, input sources, transforms

  • PacketFence VM exploration - Deploy packetfence-01 on kvm-02

HA Queue

  • Keycloak rebuild (P3 in HA queue)

  • FreeIPA ipa-02 replica (P4)

Other

  • Documentation catchup

  • Test netapi git forge commands with real tokens

Runbook References

Runbook              URL
Vault HA Deployment  Vault HA Deployment (infra-ops runbook)
Vault Backup         Vault Backup to NAS (infra-ops runbook)
KVM Operations       KVM Operations (infra-ops runbook)