WRKLOG-2026-03-09

Summary

Study and recovery day. Focus on understanding VyOS architecture before expanding firewall HA.

Carried Over from 2026-03-08

Priority | Task | Status | Notes
P0 | k3s pod NAT verification | [ ] PENDING | NAT rule 170 applied, test pod internet access
P0 | Wazuh indexer recovery | [ ] PENDING | Restart pod after NAT confirmed working
P1 | Wazuh dashboard accessibility | [ ] PENDING | Depends on indexer recovery

kvm-01 and kvm-02 are at parity - both Rocky 9.7 with identical libvirt VLAN hooks.

VyOS Study Plan

Runbook References

Runbook | URL
Master Orchestration | VyOS Migration Master (infra-ops runbook)
Deployment (20 phases) | VyOS Deployment (infra-ops runbook)
Daily Operations | VyOS Operations Quick Ref (infra-ops runbook)
BIND DNS Records | BIND Infrastructure Records (infra-ops runbook)
pfSense Decommission | pfSense Decommission (infra-ops runbook)

Current VyOS Architecture

Host | IP | Role | VRRP Priority
vyos-01 | 10.50.1.3 | Master | 200
vyos-02 | 10.50.1.2 | Backup | 100
VIP | 10.50.1.1 | Gateway | -

VRRP Deep Dive

What is VRRP?

Virtual Router Redundancy Protocol - allows multiple routers to present as a single virtual gateway.

Key concepts:

Term | Meaning
VIP (Virtual IP) | Shared IP that floats between routers - what clients use as gateway
Priority | Higher number = more likely to be Master (range 1-255; 255 is reserved for the router that owns the virtual address)
Preemption | When a higher-priority router recovers, it automatically takes the Master role back
Advertisement interval | How often the Master announces it is alive (default 1s)
VRID | Virtual Router ID - must match between routers in the same group
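
The priority and advertisement interval together decide how long a Backup waits before it takes over. As a rough sketch, the function below applies the Master_Down_Interval formula from RFC 5798 to the values in this log (priority 100 for vyos-02, 1s interval). The name master_down_interval is made up for illustration; it is not a VyOS command.

```bash
#!/usr/bin/env bash
# Sketch: how long a Backup stays silent before declaring the Master dead.
# Formula from RFC 5798 (VRRPv3):
#   skew_time            = ((256 - priority) * advert_interval) / 256
#   master_down_interval = 3 * advert_interval + skew_time
master_down_interval() {
  local priority=$1 advert_interval=${2:-1}
  awk -v p="$priority" -v a="$advert_interval" \
    'BEGIN { printf "%.3f\n", 3 * a + ((256 - p) * a) / 256 }'
}

# vyos-02 is the Backup with priority 100 and the default 1s interval.
master_down_interval 100 1   # prints 3.609
```

On a hard failure that is roughly 3.6 seconds before vyos-02 takes over. A graceful shutdown is usually faster, because the Master announces priority 0 on the way out and the Backup then waits only for the skew time.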

VyOS VRRP Commands

# Check VRRP status
show vrrp

# View VRRP configuration
show configuration commands | grep vrrp

# Watch VRRP state changes in real-time
monitor log | grep -i vrrp

VRRP Testing Scenarios

Test | How | Expected Result
Failover | Shut down vyos-01 | vyos-02 becomes Master, VIP moves
Failback | Start vyos-01 | vyos-01 reclaims Master (preemption)
Split-brain prevention | Test with sync-group | Both interfaces fail over together
Traffic continuity | Continuous ping during failover | 1-3 packets lost max
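
To turn the traffic continuity test into a number, the lost-packet count can be read from the ping summary line. This is a minimal sketch that assumes the iputils summary format ("N packets transmitted, M received, ..."); the helper name lost_packets is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print transmitted minus received from an iputils ping summary, e.g.
#   "20 packets transmitted, 18 received, 10% packet loss, time 19027ms"
lost_packets() {
  awk '/packets transmitted/ { print $1 - $4 }'
}

# Hypothetical usage while vyos-01 is being shut down (VIP from the table above):
#   ping -i 0.2 -c 100 10.50.1.1 | lost_packets
```

A shorter ping interval gives finer timing resolution, so the lost count multiplied by the interval approximates the outage duration.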

Firewall Zone Architecture

Your VyOS uses a zone-based firewall (similar to Cisco's zone-based firewall):

             WAN
              │
         ┌────┴────┐
         │  VyOS   │
         │ (LOCAL) │
         └────┬────┘
              │
    ┌─────────┼─────────┬─────────┐
    │         │         │         │
   MGMT     DATA      IOT      GUEST
 (VLAN100) (VLAN10) (VLAN40)  (VLAN30)

Zone policy pattern: <FROM>_<TO> (e.g., MGMT_WAN, DATA_LOCAL)
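
To see how many rulesets the <FROM>_<TO> naming scheme can produce, here is a small sketch that lists every ordered pair for the zones in the diagram. It only generates names; whether each pair needs its own ruleset is a design decision, and the function name zone_pairs is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: list every ordered <FROM>_<TO> pair for a set of zones,
# skipping pairs where source and destination are the same zone.
zone_pairs() {
  local from to
  for from in "$@"; do
    for to in "$@"; do
      [ "$from" = "$to" ] || printf '%s_%s\n' "$from" "$to"
    done
  done
}

# Zones taken from the diagram above; six zones give 30 ordered pairs.
zone_pairs WAN LOCAL MGMT DATA IOT GUEST
```

Comparing this list against the names returned by "show firewall" is one way to spot a zone pair that has no policy yet.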

Firewall Inspection Commands

# List all firewall rules
show firewall

# Specific zone policy
show firewall ipv4 name MGMT_WAN

# View rule hit counts
show firewall ipv4 name MGMT_WAN statistics

# Show active connections
show conntrack table ipv4

# Clear connection tracking (careful!)
# delete conntrack table

NAT Architecture

Current NAT rules (from Session 13):

Rule | Source | Purpose
100 | NET_INFRA (10.50.1.0/24) | Infrastructure to internet
110 | NET_DATA | Corporate data
120 | NET_VOICE | VoIP
130 | NET_GUEST | Guest WiFi
140 | NET_IOT | IoT devices
150 | NET_SECURITY | Security zone
160 | NET_SERVICES | Service VMs
170 | NET_K3S_PODS (10.42.0.0/16) | k3s pod network (NEW)
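
When checking why a pod or host is not being translated, it helps to confirm which source range an address actually falls into. The sketch below does the CIDR arithmetic in plain bash, using only the two ranges this log spells out (rules 100 and 170). The helper names ip_to_int and ip_in_cidr, and the sample addresses, are made up for illustration.

```bash
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if address $1 falls inside CIDR block $2 (e.g. 10.42.0.0/16).
ip_in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# Check sample sources against the listed ranges in rule-number order.
for src in 10.50.1.40 10.42.3.17 192.0.2.9; do
  if   ip_in_cidr "$src" 10.50.1.0/24; then echo "$src -> rule 100 (NET_INFRA)"
  elif ip_in_cidr "$src" 10.42.0.0/16; then echo "$src -> rule 170 (NET_K3S_PODS)"
  else echo "$src -> no match among the listed ranges"
  fi
done
```

The other network groups are left out because their prefixes are not recorded here.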

Alerting Ideas

Syslog to Wazuh

VyOS can forward logs to Wazuh SIEM:

configure
set system syslog host 10.50.1.135 facility all level info
set system syslog host 10.50.1.135 protocol udp
set system syslog host 10.50.1.135 port 514
commit
save

VRRP State Change Alerts

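Create a script to log failovers (make it executable):

#!/bin/vbash
# /config/scripts/vrrp-notify.sh
# Arguments come from the transition-script lines below: group name, then new state.
NAME=$1
STATE=$2

logger "VRRP: $NAME changed to $STATE"

# Could add: curl to webhook, email, etc.

Then in VRRP config (one transition-script per state):

set high-availability vrrp group MGMT transition-script master "/config/scripts/vrrp-notify.sh MGMT master"
set high-availability vrrp group MGMT transition-script backup "/config/scripts/vrrp-notify.sh MGMT backup"
set high-availability vrrp group MGMT transition-script fault "/config/scripts/vrrp-notify.sh MGMT fault"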

Session Log

Session 1: k3s NAT Verification

Time: Morning

Objective: Confirm pod internet access after NAT rule 170.

Commands:

# Test from k3s node
kubectl run test-curl --rm -it --image=curlimages/curl --restart=Never -- curl -sI https://hub.docker.com | head -3

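# Check VyOS NAT counters
# (op-mode commands need the wrapper when run over non-interactive SSH)
ssh vyos-01 "/opt/vyatta/bin/vyatta-op-cmd-wrapper show nat source statistics"

# Verify rule 170 is matching
ssh vyos-01 "/opt/vyatta/bin/vyatta-op-cmd-wrapper show nat source rules" | awk '/170/'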

Result: [ ] PENDING

Session 2: Wazuh Recovery

Objective: Get Wazuh stack operational.

Commands:

# Delete stuck pod (will recreate)
kubectl delete pod wazuh-indexer-0 -n wazuh

# Watch pod status
kubectl get pods -n wazuh -w

# Check indexer logs
kubectl logs -n wazuh wazuh-indexer-0 --tail=50

# Test dashboard
curl -kIs https://wazuh-dashboard.inside.domusdigitalis.dev | head -5

Result: [ ] PENDING
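
The indexer pod can take a while to come back after deletion, so the dashboard check is easier to run as a retry loop than by hand. A minimal sketch, assuming a plain fixed delay between attempts is good enough; the helper name retry and the attempt counts are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: run a command until it succeeds or the attempt budget is used up.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    # Sleep between attempts, but not after the last one.
    (( i < attempts )) && sleep "$delay"
  done
  return 1
}

# Hypothetical usage with the dashboard check from the commands above:
#   retry 30 10 curl -kfsI -o /dev/null https://wazuh-dashboard.inside.domusdigitalis.dev
```

For the pod itself, kubectl wait --for=condition=Ready pod/wazuh-indexer-0 -n wazuh --timeout=300s covers the same need without a loop.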

Reflection

Hours Worked (2026-03-08)

Estimated: 10-12 hours of intense troubleshooting across:

  • iPSK Manager + ISE ODBC

  • WLC HA SSO

  • EAP-TLS WiFi

  • VM migrations

  • DNS forward/reverse zones

  • k3s pod networking

  • VyOS NAT rules

What I Learned

Domain | Insight
Convergence | Everything connects: storage affects compute, compute affects network, network affects identity
Troubleshooting | Follow the packet - DNS → NAT → firewall → service → pod → container
Unix Philosophy | Small tools compose: "show config | awk '/pattern/'" is powerful
HA Architecture | Both nodes need identical config - VyOS HA means configuring twice
Pressure | Real incidents teach faster than labs - family waiting builds urgency

Books & Resources to Explore

  • VyOS Documentation: docs.vyos.io/

  • VRRP RFC 5798: VRRPv3 specification for IPv4 and IPv6 (VRRPv2 is RFC 3768)

  • Unix Power Tools: Classic O’Reilly book on CLI mastery

  • Cisco Networking Academy: Zone-based firewall concepts transfer to VyOS

Tomorrow (2026-03-10)

  • Complete Wazuh recovery

  • VRRP failover test (shutdown vyos-01)

  • Document failover timing

Future Planning

EAP-TEAP for Windows (domus)

Why TEAP over PEAP-MSCHAPv2:

  • Supports EAP chaining (machine + user auth in single session)

  • Certificate-based authentication (no password hashes)

  • Better security posture for Windows endpoints

  • Aligns with zero-trust architecture

Implementation path:

  1. ISE: Enable TEAP in Allowed Protocols

  2. ISE: Configure EAP chaining policy

  3. Windows GPO: Configure TEAP supplicant settings

  4. Vault PKI: Issue machine certificates for Windows

  5. Test with modestus-aw (dual-boot) before fleet rollout

Runbook needed: windows-eap-teap-deployment.adoc

MDM for Mobile Client Certificates

Problem: iOS/Android devices need certificates for 802.1X but can’t enroll directly from Vault.

MDM solution:

  • Issue certificates via MDM (Intune, Jamf, or open-source like Fleet)

  • MDM pushes WiFi profiles with embedded certs

  • Supplicant configuration handled by MDM profile

  • Certificate renewal automated through MDM

Options to evaluate:

MDM | Platform | Notes
Microsoft Intune | iOS, Android, Windows | Enterprise, integrates with Entra ID
Jamf Pro | iOS, macOS | Apple-focused, SCEP/ACME support
Fleet | All platforms | Open-source, osquery-based
MicroMDM | iOS, macOS | Open-source, minimal

Integration with Vault PKI:

  • SCEP proxy from MDM to Vault

  • Or: MDM requests certs from Vault API, pushes to devices

  • Certificate lifecycle managed centrally

Runbook needed: mdm-mobile-802.1x-deployment.adoc

Open-Source Alternatives Reference

For colleagues who want to build similar infrastructure without vendor licensing.

Network Access Control (replaces Cisco ISE)

Solution | Capabilities | Notes
FreeRADIUS | RADIUS server, 802.1X (EAP-TLS, PEAP, TTLS), MAC auth, accounting | The most widely deployed open-source RADIUS server. Requires manual policy config.
PacketFence | Full NAC: 802.1X, MAB, captive portal, VLAN assignment, device profiling, guest management | Web GUI, integrates FreeRADIUS + MariaDB. Closest to ISE feature parity.
OpenNAC | 802.1X, device profiling, VLAN steering, compliance checking | Spanish company, enterprise features open-source.
Aruba ClearPass (Policy Manager) | Full NAC suite | Not open-source but often cheaper than ISE. Mentioned for completeness.

Recommendation: PacketFence for full NAC, FreeRADIUS if you only need RADIUS.

Wireless (replaces Cisco WLC 9800)

Solution | Capabilities | Notes
hostapd | Software AP, 802.1X authenticator, WPA3, RADIUS integration | Runs on any Linux box with compatible WiFi card. Single AP only.
OpenWrt | Full router/AP OS, VLAN, firewall, captive portal, 802.1X | Flash onto consumer APs (Ubiquiti, TP-Link, etc.). Per-AP management.
OpenWISP | Centralized WiFi management, zero-touch provisioning, monitoring | Controller for OpenWrt APs. Closest to WLC concept.
Tanaza | Cloud-managed APs | Not fully open-source but has free tier.

Recommendation: OpenWrt APs + OpenWISP controller for multi-AP deployments.

Identity & Directory (replaces Windows Server AD)

Solution | Capabilities | Notes
FreeIPA | Kerberos, LDAP, DNS, certificate authority, sudo rules, HBAC | Red Hat sponsored. Best for Linux-centric environments. You already run this!
Samba AD DC | Full Active Directory domain controller, GPO, DNS, LDAP | Windows clients can join. Use when Windows integration is required.
LLDAP | Lightweight LDAP server with web UI | Simple user/group management. Good for small deployments.
Authentik | Identity provider: SAML, OIDC, LDAP, SCIM | Modern IdP, replaces ADFS/Okta for SSO. Pairs with FreeIPA/Samba for directory.
Keycloak | Identity and access management, SSO, SAML, OIDC | You already run this! Red Hat sponsored.

Recommendation: FreeIPA for Linux, Samba AD DC if Windows clients need domain join.

PKI & Secrets (replaces AD CS / commercial CA)

Solution | Capabilities | Notes
HashiCorp Vault (OSS) | PKI CA, secrets management, SSH CA, dynamic credentials | You already run this! Very capable without an enterprise license. Note: licensed under BUSL since 2023; OpenBao is the open-source fork.
step-ca (Smallstep) | ACME server, X.509 CA, SSH CA, device attestation | Modern, lightweight. Native ACME support (like Let's Encrypt).
cfssl (Cloudflare) | PKI toolkit, CA operations, certificate signing | CLI-focused, good for automation pipelines.
EJBCA | Enterprise-grade CA, CMP, SCEP, ACME, EST | Full PKI suite. Java-based, complex but powerful.
Dogtag | Certificate system, CRL, OCSP | Part of FreeIPA. Red Hat sponsored.

Recommendation: Vault for secrets + PKI combined, step-ca if you want lightweight ACME.

Firewall/Router (replaces Cisco/Palo Alto)

Solution | Capabilities | Notes
VyOS | CLI router, firewall, VPN, VRRP, BGP, OSPF | You already run this! Debian-based, Junos-like CLI (set/commit/save).
OPNsense | Web GUI firewall, IDS/IPS (Suricata), VPN, HA | FreeBSD-based, modern fork of pfSense.
pfSense | Web GUI firewall, VPN, HA (CARP) | You migrated away from this, but it is still a solid option.
nftables/iptables | Raw Linux firewall | Maximum flexibility, no GUI. Steep learning curve.

Recommendation: VyOS for CLI purists, OPNsense for GUI preference.

SIEM & Monitoring (replaces Splunk/QRadar)

Solution | Capabilities | Notes
Wazuh | SIEM, log analysis, intrusion detection, compliance | You're deploying this! OpenSearch backend.
Security Onion | Full security monitoring: Suricata, Zeek, Elasticsearch, TheHive | All-in-one security distro. Network-focused.
Graylog | Log management, alerting, dashboards | Easier than ELK stack. MongoDB + Elasticsearch.
ELK Stack | Elasticsearch, Logstash, Kibana | Industry standard. Steeper learning curve.
Grafana Loki | Log aggregation optimized for Grafana | Lightweight, label-based. Good with Prometheus.

Recommendation: Wazuh for security focus, Graylog for general log management.

MDM (replaces Intune/Jamf)

Solution | Capabilities | Notes
Fleet | Device management, osquery, policies, software deployment | Cross-platform. Uses osquery for telemetry.
MicroMDM | Apple MDM, DEP, VPP | Lightweight, iOS/macOS only.
NanoMDM | Apple MDM server | Even lighter than MicroMDM.
Headwind MDM | Android MDM, kiosk mode, app management | Open-source, Android-focused.

Recommendation: Fleet for cross-platform, MicroMDM for Apple-only.

Switching (lab/software alternatives)

Solution | Capabilities | Notes
Open vSwitch (OVS) | Software switch, VLAN, OpenFlow, tunneling | For VMs/containers. Not a physical switch replacement.
SONiC | Network OS for whitebox switches | Microsoft open-source. Runs on compatible hardware.
Cumulus Linux | Network OS for whitebox switches | NVIDIA owned, free tier available.
GNS3 / EVE-NG | Network simulation | Lab environments, not production.

Note: For physical switching, Ubiquiti or MikroTik offer affordable non-Cisco options.

Complete Open-Source Stack Example

For a colleague starting from scratch:

┌─────────────────────────────────────────────────────────────┐
│                    NETWORK ACCESS                           │
│  PacketFence (NAC) + FreeRADIUS (802.1X)                    │
└─────────────────────────────────────────────────────────────┘
                               │
┌─────────────────────────────────────────────────────────────┐
│                      IDENTITY                               │
│  FreeIPA (Linux) or Samba AD DC (Windows) + Keycloak (SSO)  │
└─────────────────────────────────────────────────────────────┘
                               │
┌─────────────────────────────────────────────────────────────┐
│                    CERTIFICATES                             │
│  Vault OSS (PKI + SSH CA) or step-ca (ACME)                 │
└─────────────────────────────────────────────────────────────┘
                               │
┌─────────────────────────────────────────────────────────────┐
│                   NETWORK INFRA                             │
│  VyOS (router) + OpenWrt/OpenWISP (wireless)                │
└─────────────────────────────────────────────────────────────┘
                               │
┌─────────────────────────────────────────────────────────────┐
│                   OBSERVABILITY                             │
│  Wazuh (SIEM) + Prometheus/Grafana (metrics)                │
└─────────────────────────────────────────────────────────────┘
                               │
┌─────────────────────────────────────────────────────────────┐
│                     COMPUTE                                 │
│  KVM/libvirt (hypervisor) + k3s (containers)                │
└─────────────────────────────────────────────────────────────┘

Total licensing cost: $0

What You Lose Without Enterprise Vendors

Feature | Vendor Advantage | Open-Source Gap
Support | 24/7 TAC, SLAs | Community forums, self-reliance
Profiling | ISE device profiler (thousands of signatures) | PacketFence profiling is more limited
pxGrid | Real-time context sharing | No equivalent
GUI polish | ISE/WLC web interfaces | Open-source GUIs vary in quality
Integration | Cisco DNA/Meraki ecosystem | Manual integration required
Compliance | Prebuilt compliance reports | Build your own

Bottom line: Open-source can do 80-90% of what enterprise does. The last 10-20% is polish, support, and deep integrations.