WRKLOG-2026-02-23

Summary

Deployed Wazuh SIEM 4.14.3 on the k3s cluster. Completely rewrote the broken runbook. Issued a Vault PKI certificate for proper TLS. Full production deployment with DNS, firewall rules, systemd persistence, and gopass credential storage.

Wazuh Deployment Architecture

(Architecture diagram not captured in this text log.)

Work Priorities (Behind)

Behind on work deliverables. The P0 items below are must-do for Monday:

Priority | Project | Action Required | Status
P0 | Linux Research (Xianming Ding) | Linux AD Authentication deployment for research workstations | Behind
P0 | iPSK Manager | DB replication troubleshooting, manager functionality | Behind
P0 | MSCHAPv2 Migration | Migrate legacy PEAP-MSCHAPv2 to EAP-TLS | Behind
P1 | ISE 3.4 Migration | Migration timeline from 3.2p9 | Pending
P1 | Switch Upgrades | Maintenance window coordination | Pending

Personal Priorities

Priority | Task | Status
P0 | Prometheus + Grafana on k3s | Done ✓
P0 | Vault PKI TLS for k3s services | Done ✓
P0 | kvm-02 hardware (64GB RAM) | In Progress
P1 | k3s HA cluster (3 masters) | Blocked on kvm-02
P1 | Vault HA (3-node Raft) | Blocked on kvm-02
P2 | Wazuh agents deployment | Pending
P2 | Syslog sources (pfSense, ISE, switches) | Pending

Completed Today

Task | Status | Notes
Wazuh SIEM 4.14.3 deployment | Done | All 4 pods running (dashboard, indexer, manager-master, manager-worker)
k3s-wazuh.adoc runbook rewrite | Done | Complete rewrite; original was broken (wrong version, wrong cert scripts, wrong storage)
VM resource upgrade | Done | k3s-master-01: 2→4 CPU cores, 4→8GB RAM
NFS provisioner | Done | Dynamic PVC creation for StatefulSets
Vault PKI certificate | Done | CN=wazuh.inside.domusdigitalis.dev, expires 2027-02-23
DNS record | Done | wazuh.inside.domusdigitalis.dev → 10.50.1.120
Systemd service | Done | wazuh-dashboard-pf.service for persistent port-forward
Firewall configuration | Done | 514/udp, 1514, 1515, 5601, 8443, 9200, 55000
gopass credential storage | Done | v3/domains/d000/k3s/wazuh
Prometheus + Grafana deployment | Done | kube-prometheus-stack Helm chart, all pods running
Grafana Vault PKI TLS | Done | grafana-tls secret + IngressRoute
Prometheus Vault PKI TLS | Done | prometheus-tls secret + IngressRoute
AlertManager Vault PKI TLS | Done | alertmanager-tls secret + IngressRoute
Command favorites (domus-captures) | Done | 14 pages: awk, jq, curl, dig, sed, grep, find, heredocs, xargs, kubectl, openssl, security, one-liners
k3s-prometheus-grafana.adoc Phase 7 rewrite | Done | Let’s Encrypt → Vault PKI individual certs, security decision table
MetalLB LoadBalancer | Done | L2 mode, IP pool 10.50.1.130-140, Traefik VIP 10.50.1.130
k3s-metallb.adoc runbook | Done | Helm install, IP pool config, DNS update, troubleshooting

Carried Over to Tomorrow

Task | Status | Notes
Deploy Prometheus + Grafana | Pending | Runbook ready, deprioritized for Wazuh
kvm-02 hardware upgrade | In Progress | 64GB RAM installation
k3s HA cluster | Pending | Requires kvm-02 VMs

Session Log

Session 1: Wazuh SIEM Deployment

07:30 - 08:30 UTC

Objective: Deploy Wazuh SIEM on k3s

Issues Encountered:

  1. Wazuh 5.0 doesn’t exist - Docker Hub only has 4.14.x

    • Fix: git clone -b v4.14.3 --depth=1 https://github.com/wazuh/wazuh-kubernetes.git

  2. packages.wazuh.com returns 403 - Can’t download cert tool

    • Fix: Use built-in scripts in repo:

      bash wazuh/certs/indexer_cluster/generate_certs.sh
      bash wazuh/certs/dashboard_http/generate_certs.sh
  3. Pods stuck Pending - Insufficient CPU

    • Wazuh needs ~1800m CPU total

    • Fix: Upgrade VM from 2 to 4 cores

      ssh kvm-01 "sudo virsh setvcpus k3s-master-01 4 --config --maximum"
      ssh kvm-01 "sudo virsh setvcpus k3s-master-01 4 --config"
      ssh kvm-01 "sudo virsh setmaxmem k3s-master-01 8G --config"
      ssh kvm-01 "sudo virsh setmem k3s-master-01 8G --config"
  4. StatefulSets need dynamic storage - Manual PVs don’t work

    • Fix: Install NFS subdir external provisioner

      helm install nfs-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
        --namespace kube-system \
        --set nfs.server=10.50.1.70 \
        --set nfs.path=/volume1/k3s/wazuh \
        --set storageClass.name=nfs-client
  5. Port-forward dies on SSH disconnect

    • Fix: Systemd service with KUBECONFIG env var

  6. Default credentials wrong in docs

    • Actual: admin / SecretPassword (from indexer-cred secret)

    • Stored in gopass: v3/domains/d000/k3s/wazuh

Session 2: Vault PKI Certificate

08:30 - 08:45 UTC

Objective: Replace self-signed cert with Vault-issued cert

Commands executed:

# Issue cert from Vault
vault write -format=json pki_int/issue/domus-client \
  common_name="wazuh.inside.domusdigitalis.dev" \
  ttl="8760h" > /tmp/wazuh-cert.json

# Extract components
jq -r '.data.certificate' /tmp/wazuh-cert.json > /tmp/wazuh.crt
jq -r '.data.private_key' /tmp/wazuh-cert.json > /tmp/wazuh.key
jq -r '.data.ca_chain[]' /tmp/wazuh-cert.json > /tmp/wazuh-ca.crt

# Copy to k3s node
scp /tmp/wazuh.crt /tmp/wazuh.key /tmp/wazuh-ca.crt k3s-master-01:/tmp/

# Create secret (on k3s-master-01)
kubectl -n wazuh create secret generic dashboard-certs-vault \
  --from-file=cert.pem=/tmp/wazuh.crt \
  --from-file=key.pem=/tmp/wazuh.key \
  --from-file=root-ca.pem=/tmp/wazuh-ca.crt \
  --dry-run=client -o yaml | kubectl apply -f -

# Patch deployment (volume index 1, not 0!)
kubectl -n wazuh patch deployment wazuh-dashboard --type=json -p='[
  {"op": "replace", "path": "/spec/template/spec/volumes/1/secret/secretName", "value": "dashboard-certs-vault"}
]'

# Restart
kubectl -n wazuh rollout restart deployment/wazuh-dashboard

Result: Browser shows secure connection with DOMUS-ISSUING-CA certificate.
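Before loading a Vault-issued cert into the cluster, the file itself can be sanity-checked with openssl. A hedged sketch: the throwaway self-signed cert below stands in for /tmp/wazuh.crt so the commands are self-contained; against the real file only the -in path changes.

```shell
# Demo cert standing in for /tmp/wazuh.crt (same CN as the Vault cert above)
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=wazuh.inside.domusdigitalis.dev" \
  -keyout /tmp/demo.key -out /tmp/demo.crt 2>/dev/null

# Fields worth checking after issuing from Vault: subject, issuer, expiry
openssl x509 -in /tmp/demo.crt -noout -subject -issuer -enddate
```

On the real certificate, the issuer line should show DOMUS-ISSUING-CA and the enddate should match the 8760h TTL.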

Key Learnings

Wazuh Kubernetes Deployment

  • Version: Always use released versions (4.14.3), not unreleased (5.0)

  • Certificates: Built-in scripts work; packages.wazuh.com may be blocked

  • Storage: StatefulSets require dynamic provisioner, not manual PVs

  • Resources: Minimum 4 CPU cores, 8GB RAM for single-node deployment

  • Kustomize: Use envs/local-env/ overlay for single-node

Kubernetes Certificate Replacement

  • Dashboard uses volume index 1 (dashboard-certs), not 0 (config)

  • Always verify volume structure before patching:

    kubectl get deployment <name> -o jsonpath='{.spec.template.spec.volumes}' | jq
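The index lookup can also be scripted instead of eyeballed. A sketch on a toy spec shaped like the deployment above (the real input would come from `kubectl get deployment wazuh-dashboard -o json`; the volume names here mirror the doc, everything else is illustrative):

```shell
# Toy deployment spec: volume 0 is config, volume 1 holds the cert secret
SPEC='{"spec":{"template":{"spec":{"volumes":[{"name":"config"},{"name":"dashboard-certs","secret":{"secretName":"dashboard-certs"}}]}}}}'

# Print the index of the volume named dashboard-certs (expects: 1)
echo "$SPEC" | python3 -c '
import json, sys
vols = json.load(sys.stdin)["spec"]["template"]["spec"]["volumes"]
print(next(i for i, v in enumerate(vols) if v["name"] == "dashboard-certs"))
'
```

The printed index is the one to use in the JSON patch path, which removes the guesswork that caused the index-1-not-0 surprise.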

Port-Forward Persistence

  • SSH background (&) doesn’t survive disconnect

  • Systemd service needs KUBECONFIG environment variable

  • Service file: /etc/systemd/system/wazuh-dashboard-pf.service
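The unit can be sketched roughly as below. This is a minimal illustration, not a copy of the deployed file: the kubectl path, kubeconfig location, service name, and port mapping are assumptions inferred from this log.

```shell
# Illustrative unit body written locally; deployed path is
# /etc/systemd/system/wazuh-dashboard-pf.service
cat > wazuh-dashboard-pf.service <<'EOF'
[Unit]
Description=Persistent kubectl port-forward for Wazuh dashboard
After=network-online.target

[Service]
# Without KUBECONFIG, kubectl run under systemd cannot find the cluster
Environment=KUBECONFIG=/etc/rancher/k3s/k3s.yaml
ExecStart=/usr/local/bin/kubectl -n wazuh port-forward --address 0.0.0.0 svc/dashboard 8443:443
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
```

Restart=always is what makes this survive SSH disconnects and pod restarts, which a backgrounded `kubectl port-forward &` does not.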

Infrastructure Status

k3s Wazuh Pods

NAME                               READY   STATUS    AGE
wazuh-dashboard-566bfbdc97-xxx     1/1     Running   ~30m
wazuh-indexer-0                    1/1     Running   ~30m
wazuh-manager-master-0             1/1     Running   ~30m
wazuh-manager-worker-0             1/1     Running   ~30m

Access

(Access details not captured in this log; credentials are stored in gopass at v3/domains/d000/k3s/wazuh.)

Communication Lesson: Articulating Technical Requirements

Reflection on how to clearly communicate visual/technical requirements to engineers, TAC, or AI assistants.

The Problem

Initial request: "these labels are hard to read. very faint"

This is vague. The engineer tried multiple fixes:

  1. Increased font size (not the issue)

  2. Changed to yellow text on blue (worse)

  3. Changed to light blue background (wrong direction)

  4. Changed to ice blue text (more opaque)

The Clear Request

After reflection, the correct request was:

  1. Fix font labels - Make connection labels readable with proper contrast (dark text colors, appropriate font sizes, bold)

  2. Flatten the connection angles - Change the layout so connections from left-side sources to the right-side cluster are more horizontal, reducing the vertical spread. Currently the diagram is too tall, forcing readers to scroll up and down. I want a wider, shorter diagram with gentler connection angles.

Key Principles for Technical Requests

Principle | Bad Example | Good Example
Identify the symptom AND cause | "It’s hard to read" | "The white text (#ffffff) has poor contrast against the light blue container background (#e3f2fd)"
Specify what should NOT change | (omitted) | "The fill color of the shape is fine - just need better text contrast"
Describe the desired outcome | "Make it better" | "I want a wider, shorter diagram with gentler connection angles"
Use specific technical terms | "The lines are too steep" | "Flatten the connection angles to be more horizontal"
Explain the user impact | "I don’t like it" | "Forces readers to scroll up and down to see all connections"

Template for Visual/Diagram Requests

CURRENT STATE: [What you observe]
PROBLEM: [Why it's an issue - user impact]
KEEP: [What should NOT change]
CHANGE: [Specific modifications needed]
DESIRED OUTCOME: [What success looks like]

Example Applied

CURRENT STATE: Connection labels render with #ffffff text
PROBLEM: Poor contrast against #e3f2fd container - hard to read
KEEP: The dark blue hexagon fill (#0d47a1)
CHANGE: Text color to dark navy (#1a237e) for contrast
DESIRED OUTCOME: Labels clearly visible without straining

This applies to TAC cases, peer reviews, and any technical collaboration.

Session 3: DNS PTR Troubleshooting

18:00 - 18:30 UTC

Objective: Fix PTR reverse lookup returning empty despite zone validation passing

Symptom:

sudo named-checkzone 1.50.10.in-addr.arpa /var/named/10.50.1.rev
zone 1.50.10.in-addr.arpa/IN: loaded serial 2026022301
OK

dig +short -x 10.50.1.120 @localhost
# (empty result - NXDOMAIN)

Root Cause: Leading whitespace in zone file. Zone files are whitespace-sensitive:

  • Record at column 1 → uses specified owner name

  • Record with leading whitespace → continues previous owner (WRONG!)

Diagnosis with awk (brackets reveal whitespace):

sudo awk '/120|121|122/ {print NR": ["$0"]"}' /var/named/10.50.1.rev

Fix:

sudo sed -i '/IN[[:space:]]*PTR/s/^[[:space:]]*//' /var/named/10.50.1.rev
sudo rndc reload 1.50.10.in-addr.arpa
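The failure mode reproduces easily on a toy file, which also shows why named-checkzone stays green: the indented record is still syntactically valid, it just attaches to the previous owner name. A self-contained sketch (hostnames illustrative):

```shell
# Two PTR records; the second has the leading-whitespace bug
printf '120 IN PTR wazuh.inside.domusdigitalis.dev.\n  121 IN PTR host121.inside.domusdigitalis.dev.\n' > /tmp/demo.rev

# Brackets make the stray indentation visible
awk '{print NR": ["$0"]"}' /tmp/demo.rev

# Same fix as above: strip leading whitespace from PTR lines
sed -i '/IN[[:space:]]*PTR/s/^[[:space:]]*//' /tmp/demo.rev
awk '{print NR": ["$0"]"}' /tmp/demo.rev   # both records now start at column 1
```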

Documented in: dns-operations.adoc - Troubleshooting section + Chronicle

Session 4: Wazuh OpenSearch Password Change

18:30 - 19:45 UTC

Objective: Change Wazuh admin password from default to gopass-generated secure password

Challenge: OpenSearch admin user is reserved: true - cannot be changed via API

Discovery Commands (now documented):

# View all credential secrets decoded
for secret in dashboard-cred indexer-cred wazuh-api-cred wazuh-authd-pass; do
  echo "=== $secret ==="
  kubectl -n wazuh get secret $secret -o json | jq -r '.data | to_entries[] | "\(.key): \(.value | @base64d)"'
done

# Check what secrets a pod uses
kubectl -n wazuh get pod -l app=wazuh-dashboard -o yaml | awk '/secretKeyRef/,/name:/'

# Find ConfigMap name
kubectl get pod wazuh-indexer-0 -o json | jq -r '.spec.volumes[] | select(.configMap) | .configMap.name'

Full Procedure (6 steps):

  1. Generate bcrypt hash on workstation:

    export WAZUH_PW=$(gopass show -o v3/domains/d000/k3s/wazuh)
    python3 -c "import bcrypt, os; print(bcrypt.hashpw(os.environ['WAZUH_PW'].encode(), bcrypt.gensalt(rounds=12)).decode())"
  2. Export ConfigMap:

    kubectl -n wazuh get configmap indexer-conf-45bbc2fk49 -o yaml > /tmp/internal-users-cm.yaml
  3. Update hash with sed:

    HASH='$2b$12$<YOUR_HASH>'
    sed -i "s|<OLD_HASH>|$HASH|" /tmp/internal-users-cm.yaml
    grep -A3 "admin:" /tmp/internal-users-cm.yaml
  4. Apply and restart:

    kubectl apply -f /tmp/internal-users-cm.yaml
    kubectl -n wazuh rollout restart statefulset/wazuh-indexer
    kubectl -n wazuh rollout status statefulset/wazuh-indexer
  5. Reload OpenSearch security (JAVA_HOME required!):

    kubectl -n wazuh exec wazuh-indexer-0 -- env OPENSEARCH_JAVA_HOME=/usr/share/wazuh-indexer/jdk \
      /usr/share/wazuh-indexer/plugins/opensearch-security/tools/securityadmin.sh \
      -cd /usr/share/wazuh-indexer/config/opensearch-security/ \
      -icl -nhnv \
      -cacert /usr/share/wazuh-indexer/config/certs/root-ca.pem \
      -cert /usr/share/wazuh-indexer/config/certs/admin.pem \
      -key /usr/share/wazuh-indexer/config/certs/admin-key.pem
  6. Verify:

    curl -k -u admin:<NEW_PASSWORD> https://localhost:9200/_cluster/health

Key Learnings:

  • indexer-cred k8s secret ≠ OpenSearch internal users (separate auth)

  • dashboard-cred is service account (kibanaserver) - don’t change!

  • ConfigMaps are read-only in pods - must edit source and reapply

  • securityadmin.sh needs OPENSEARCH_JAVA_HOME set

  • Cert paths: /usr/share/wazuh-indexer/config/certs/ (not /certs/)

Documented in: k3s-wazuh.adoc - 4 new commits with full procedure

Session 5: TLS Certificate Validation Patterns

Throughout day

awk patterns for certificate validation (now documented):

# Quick TLS check
curl -vI https://wazuh.inside.domusdigitalis.dev:8443 2>&1 | grep -E "subject:|issuer:|expire|SSL|CN"

# Extract key details with awk
curl -vI --silent https://wazuh.inside.domusdigitalis.dev:8443 2>&1 | awk '/subject:|issuer:|expire date|SSL connection/'

# Calculate days until expiry
EXPIRE=$(curl -vI --silent https://wazuh.inside.domusdigitalis.dev:8443 2>&1 | awk '/expire date:/ {print $4, $5, $6, $7}')
echo "Expires: $EXPIRE ($(( ($(date -d "$EXPIRE" +%s) - $(date +%s)) / 86400 )) days)"

# Validate issuer is Vault PKI
curl -vI --silent https://wazuh.inside.domusdigitalis.dev:8443 2>&1 | awk '/issuer:/ {print ($0 ~ /DOMUS-ISSUING-CA/) ? "✓ Vault PKI" : "✗ Unknown CA"}'
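The day-count arithmetic in the expiry check can be sanity-checked offline with fixed dates (GNU date; -u avoids DST skew). A minimal sketch:

```shell
# Exactly one non-leap year apart, so the result should be 365
START=$(date -ud "2026-02-23" +%s)
END=$(date -ud "2027-02-23" +%s)
echo "$(( (END - START) / 86400 )) days"   # 365 days
```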

Verified on ISE:

curl -vI https://ise-01.inside.domusdigitalis.dev:8443 2>&1 | awk '/subject:|issuer:|expire/'
# subject: CN=ise-01.inside.domusdigitalis.dev
# expire date: Feb 11 08:49:10 2027 GMT
# issuer: CN=DOMUS-ISSUING-CA

Session 6: Vault PKI TLS for k3s Monitoring

Afternoon UTC

Objective: Issue individual Vault PKI certificates for Grafana, Prometheus, AlertManager

Security Decision: Individual certs > Wildcard certs

  • Blast radius limited to single service if key compromised

  • Follows least-privilege principle

  • Better audit trail

  • Wildcard exposes ALL subdomains on single key compromise

Workflow (from workstation):

# Issue cert
vault write -format=json pki_int/issue/domus-client \
  common_name="<service>.inside.domusdigitalis.dev" \
  ttl="8760h" > /tmp/<service>-cert.json

# Extract (including CA chain for full trust)
jq -r '.data.certificate' /tmp/<service>-cert.json > /tmp/<service>.crt
jq -r '.data.private_key' /tmp/<service>-cert.json > /tmp/<service>.key
jq -r '.data.ca_chain[]' /tmp/<service>-cert.json >> /tmp/<service>.crt

# Copy to k3s node
scp /tmp/<service>.crt /tmp/<service>.key k3s-master-01:/tmp/

Workflow (on k3s-master-01):

# Create TLS secret
kubectl -n monitoring create secret tls <service>-tls \
  --cert=/tmp/<service>.crt \
  --key=/tmp/<service>.key
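Before creating the secret it is worth confirming that cert and key actually pair up; a mismatch only surfaces later as a TLS handshake failure. A hedged sketch: the throwaway keypair below stands in for /tmp/<service>.crt and .key.

```shell
# Throwaway keypair to demonstrate; with real files, swap the paths
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo" \
  -keyout /tmp/svc.key -out /tmp/svc.crt 2>/dev/null

# Cert and key match when their public keys are identical
CERT_PUB=$(openssl x509 -in /tmp/svc.crt -noout -pubkey)
KEY_PUB=$(openssl pkey -in /tmp/svc.key -pubout)
[ "$CERT_PUB" = "$KEY_PUB" ] && echo "MATCH" || echo "MISMATCH"
```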

Verification commands added to runbook:

# Custom-columns IngressRoute listing (shows hostname config)
kubectl get ingressroute -n monitoring -o custom-columns=NAME:.metadata.name,HOST:.spec.routes[0].match

# DNS resolution with host+awk (clean hostname → IP output)
for h in grafana prometheus alertmanager; do
  host ${h}.inside.domusdigitalis.dev | awk '{print $1, $NF}'
done

Result: All three services have Vault PKI TLS certificates with Traefik IngressRoutes.

Documented in: k3s-prometheus-grafana.adoc Phase 7 (complete rewrite from Let’s Encrypt to Vault PKI)

Session 7: NodePort Troubleshooting Deep Dive

Evening UTC

Objective: Fix NodePort 32503 not reachable from workstation

Symptom: nc -zv 10.50.1.120 32503 hangs, but ports 22, 3000, 9090 work.

Troubleshooting Path:

  1. Firewalld - Added NodePort range: 30000-32767/tcp

  2. pfSense rules - Verified OPT1_NETWORK to MGMT_NET rule exists

  3. tcpdump on pfSense - Packets reaching vtnet1 (MGMT interface)

  4. tcpdump on k3s node - Packets arriving at eth0, but no SYN-ACK

  5. Cilium monitor - Saw VLAN traffic drops (red herring - just CDP noise)

  6. Cilium BPF lb list - Service entries exist, both routable

  7. tc filter show - EMPTY! No BPF attached to eth0

Root Cause: Cilium in VXLAN tunnel mode doesn’t attach BPF tc filters to eth0. NodePort handled via iptables/SNAT, not direct BPF.

Failed Fix: bpf-lb-mode: hybrid - crashes Cilium ("cannot be used with vxlan tunneling")

Working Fix: bpf-lb-mode: snat (default) - NodePort works via iptables SNAT

Key Commands Learned:

# BPF lb map analysis
kubectl -n kube-system exec ds/cilium -- cilium bpf lb list | awk '
/NodePort/ {
  if (/non-routable/) print "NON-ROUTABLE:", $1
  else print "ROUTABLE:", $1
}'

# tc filter check (empty = no BPF on interface)
tc filter show dev eth0 ingress

# Cilium drop monitor
kubectl -n kube-system exec ds/cilium -- cilium monitor --type drop

# Firewall port list (sorted)
sudo firewall-cmd --list-ports | tr ' ' '\n' | sort -t/ -k1 -n
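The -t/ -k1 -n options matter because a plain sort would order ports lexically (1514 before 514). A quick check on sample data:

```shell
# -t/ splits on the slash, -k1 -n sorts the port number numerically
echo "5601/tcp 514/udp 1514/tcp 55000/tcp" | tr ' ' '\n' | sort -t/ -k1 -n
# 514/udp
# 1514/tcp
# 5601/tcp
# 55000/tcp
```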

Result: NodePort 32503 works after adding firewall rule and keeping snat mode.

Documented in: reference/commands/k8s-network-favorites.adoc (new), security-favorites.adoc (firewalld section)

Session 8: MetalLB LoadBalancer Deployment

22:43 UTC

Objective: Deploy MetalLB for proper LoadBalancer support on bare metal k3s

Problem: Standard ports 80/443 not working - Cilium VXLAN mode + bare metal = no cloud LoadBalancer.

Solution: MetalLB L2 mode (ARP advertisement)

Deployment:

# Install MetalLB
helm repo add metallb https://metallb.github.io/metallb
helm repo update
helm install metallb metallb/metallb -n metallb-system --create-namespace

# Configure IP pool (verified free first)
cat <<'EOF' | kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: mgmt-pool
  namespace: metallb-system
spec:
  addresses:
  - 10.50.1.130-10.50.1.140
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: mgmt-l2
  namespace: metallb-system
spec:
  ipAddressPools:
  - mgmt-pool
EOF

Result: Traefik immediately got 10.50.1.130 from MetalLB pool.

DNS Updated:

netapi pfsense dns update --id 7 -h grafana -d inside.domusdigitalis.dev -i 10.50.1.130 --descr "Grafana (MetalLB)"
netapi pfsense dns update --id 37 -h prometheus -d inside.domusdigitalis.dev -i 10.50.1.130 --descr "Prometheus (MetalLB)"
netapi pfsense dns update --id 4 -h alertmanager -d inside.domusdigitalis.dev -i 10.50.1.130 --descr "AlertManager (MetalLB)"

Verification:

curl -ks https://grafana.inside.domusdigitalis.dev | head -1  # HTTP/2 302
curl -ks https://prometheus.inside.domusdigitalis.dev/-/healthy  # Healthy
curl -ks https://alertmanager.inside.domusdigitalis.dev/-/healthy  # OK

Documented in: k3s-metallb.adoc (new runbook)

Session 9: Terraform IaC Expansion

Late evening UTC

Objective: Expand Terraform IaC capabilities and test Cloudflare provider

Completed:

  1. Expanded environments:

    • environments/prod/cloudflare/ - DNS records, Access policies

    • environments/prod/vault/ - PKI roles, SSH CA, policies, AppRole

    • environments/prod/k3s/ - Namespaces, Helm releases

    • environments/prod/keycloak/ - OIDC clients

    • environments/prod/github/ - Repository settings

  2. Fixed Cloudflare auth:

    • Zone ID: cb40c4839b06cd46dd0a0a435684550c (not account ID)

    • Added CLOUDFLARE_API_TOKEN to dsec (d000/dev/app)

    • terraform plan returns "No changes" (matches existing DNS)

  3. Documentation:

    • 3 D2 diagrams created (architecture, repo structure, workflow)

    • 631-line runbook at terraform-iac.adoc

    • README converted from markdown to AsciiDoc (NO MARKDOWN ALLOWED)

    • Added module placeholders (k3s-node, vault-node)

  4. Aliases added:

    • domus-terraform and dtf in dotfiles-optimus
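A minimal sketch of the Cloudflare provider wiring described in item 2. Hedged: resource and attribute names vary across Cloudflare provider major versions (this follows the v4 schema as an assumption), and the record shown is purely illustrative; the zone ID is the one from this session.

```shell
# Write an illustrative provider config; the API token is read from the
# CLOUDFLARE_API_TOKEN environment variable (stored in dsec), never in code
cat > provider.tf <<'EOF'
terraform {
  required_providers {
    cloudflare = {
      source = "cloudflare/cloudflare"
    }
  }
}

provider "cloudflare" {}

# Records are scoped by zone ID, not account ID
resource "cloudflare_record" "demo" {
  zone_id = "cb40c4839b06cd46dd0a0a435684550c"
  name    = "demo"
  type    = "A"
  value   = "203.0.113.10"
}
EOF
```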

Key Learning: Infrastructure as Code vs manual CLI is the jump from operator to engineer.

Session 10: ISE AD Join Clock Skew + Vault SSH CA

23:30 - 00:15 UTC

Objective: Fix ISE AD join failure and deploy Vault SSH CA to modestus-p50

Issue 1: ISE AD Join Clock Skew

ISE failed to join AD with:

Error Code: 40087
Error Name: LW_ERROR_CLOCK_SKEW

Root Cause: Windows DC (home-dc01) had no NTP source configured. w32tm /resync /force returned "no time data was available".

Fix on DC (PowerShell):

w32tm /config /manualpeerlist:"10.50.1.1" /syncfromflags:manual /reliable:yes /update
Restart-Service w32time
w32tm /resync /force

Documented in: windows-dc-core.adoc - Added new "Phase 0: NTP/Time Synchronization" section with troubleshooting.

Issue 2: Vault SSH CA Principal Mismatch

modestus-p50 Vault SSH cert rejected:

Certificate invalid: name is not a listed principal

User on modestus-p50 is gabriel, but cert only had evanusmodestus principal.

Fix:

# Add gabriel to vault-ssh-sign script
sed -i 's/PRINCIPALS="adminerosado,admin,ansible,evanusmodestus,root"/PRINCIPALS="adminerosado,admin,ansible,evanusmodestus,gabriel,root"/' ~/.local/bin/vault-ssh-sign

# Re-sign
vault-ssh-sign

# Test
ssh -o CertificateFile=~/.ssh/id_ed25519_vault-cert.pub gabriel@modestus-p50 whoami
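Principal mismatches like this can be caught before attempting the connection by listing what the cert actually contains. A self-contained sketch: the throwaway CA and user key below stand in for the Vault SSH CA and the real signed cert.

```shell
# Throwaway CA + user key, then sign with the gabriel principal included
rm -f /tmp/demo_ca* /tmp/demo_user*
ssh-keygen -q -t ed25519 -N '' -f /tmp/demo_ca
ssh-keygen -q -t ed25519 -N '' -f /tmp/demo_user
ssh-keygen -q -s /tmp/demo_ca -I demo -n evanusmodestus,gabriel /tmp/demo_user.pub

# -L prints cert details; Principals must include the remote username
ssh-keygen -L -f /tmp/demo_user-cert.pub | grep -A3 "Principals:"
```

Running the same -L check against ~/.ssh/id_ed25519_vault-cert.pub would have shown gabriel missing before the login failed.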

Documented in: vault-ssh-ca.adoc - Added gabriel to Principal Requirements table, added Configured Hosts Status section.

Issue 3: ISE DNS Forward Lookups Failing

ISE nslookup returned ANSWER: 0 for forward lookups despite PTR (reverse) working.

Investigation:

  • pfSense domain override to BIND: ✓ configured

  • BIND A records: ✓ exist

  • Direct query to BIND (dig @10.50.1.90): ✓ works

  • pfSense resolver (dig @127.0.0.1 on pfSense): ✓ works

Root Cause: ISE’s nslookup queries the ANY record type, which some resolvers don’t handle well. Actual DNS resolution for applications works fine.

Tomorrow’s Priorities

  • Configure Grafana: secure password + TLS cert

  • Configure Prometheus: TLS cert

  • Configure AlertManager: TLS cert

  • Test HTTPS access from workstation (port 443 via MetalLB)

  • Wazuh agents on infrastructure hosts (vault-01, kvm-01)

  • Syslog sources (pfSense, ISE, switches)