WRKLOG-2026-03-09
Summary
Study and recovery day. Focus on understanding VyOS architecture before expanding firewall HA.
Carried Over from 2026-03-08
| Priority | Task | Status | Notes |
|---|---|---|---|
| P0 | k3s pod NAT verification | [ ] PENDING | NAT rule 170 applied, test pod internet access |
| P0 | Wazuh indexer recovery | [ ] PENDING | Restart pod after NAT confirmed working |
| P1 | Wazuh dashboard accessibility | [ ] PENDING | Depends on indexer recovery |

Note: kvm-01 and kvm-02 are at parity. Both run Rocky 9.7 with identical libvirt VLAN hooks.
VyOS Study Plan
Runbook References
| Runbook | URL |
|---|---|
| Master Orchestration | VyOS Migration Master (infra-ops runbook) |
| Deployment (20 phases) | VyOS Deployment (infra-ops runbook) |
| Daily Operations | VyOS Operations Quick Ref (infra-ops runbook) |
| BIND DNS Records | BIND Infrastructure Records (infra-ops runbook) |
| pfSense Decommission | pfSense Decommission (infra-ops runbook) |
Current VyOS Architecture
| Host | IP | Role | VRRP Priority |
|---|---|---|---|
| vyos-01 | 10.50.1.3 | Master | 200 |
| vyos-02 | 10.50.1.2 | Backup | 100 |
| VIP | 10.50.1.1 | Gateway | - |
VRRP Deep Dive
What is VRRP?
Virtual Router Redundancy Protocol - allows multiple routers to present as a single virtual gateway.
Key concepts:
| Term | Meaning |
|---|---|
| VIP (Virtual IP) | Shared IP that floats between routers - what clients use as gateway |
| Priority | Higher number = more likely to be Master (range 1-255) |
| Preemption | Master role automatically transfers to higher-priority router when it recovers |
| Advertisement interval | How often Master announces it’s alive (default 1s) |
| VRID | Virtual Router ID - must match between routers in same group |
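As a mental model for Priority and Preemption, the election reduces to "the highest priority among the routers that are currently up wins". The sketch below is a deliberate simplification: it ignores tie-breaking and timers, and `elect_master` is a hypothetical helper for study purposes, not a VyOS command.

```shell
# Read "name priority" lines (one per router that is currently up)
# and print the router that would hold the Master role.
elect_master() {
  awk 'NR == 1 || $2 + 0 > max { max = $2 + 0; name = $1 } END { print name }'
}
```

Feeding it both routers (`printf 'vyos-01 200\nvyos-02 100\n' | elect_master`) models normal operation; dropping the vyos-01 line models failover, and adding it back models preemption.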
VyOS VRRP Commands
# Check VRRP status
show vrrp
# View VRRP configuration
show configuration commands | grep vrrp
# Watch VRRP state changes in real-time
monitor log | grep -i vrrp
VRRP Testing Scenarios
| Test | How | Expected Result |
|---|---|---|
| Failover | Shutdown vyos-01 | vyos-02 becomes Master, VIP moves |
| Failback | Start vyos-01 | vyos-01 reclaims Master (preemption) |
| Split-brain prevention | Test with sync-group | Both interfaces fail together |
| Traffic continuity | Continuous ping during failover | 1-3 packets lost max |
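To put a number on the traffic-continuity test, one option is to count gaps in the `icmp_seq` values of a ping run that spans the failover. This is a minimal sketch, assuming Linux iputils-style ping output; the function name `count_lost_pings` is hypothetical.

```shell
# Count echo replies missing from `ping` output read on stdin.
# Only "bytes from" lines count as replies; gaps in icmp_seq are losses.
count_lost_pings() {
  awk '
    /bytes from/ && match($0, /icmp_seq=[0-9]+/) {
      seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
      if (seen && seq > prev + 1) lost += seq - prev - 1
      prev = seq
      seen = 1
    }
    END { print lost + 0 }
  '
}
```

Possible usage: run `ping -c 120 10.50.1.1 | count_lost_pings` from a client, shut down vyos-01 partway through, and compare the result with the 1-3 packet target. Losses after the final reply are not counted, so the ping should run well past the failover.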
Firewall Zone Architecture
Your VyOS uses zone-based firewall (like Cisco ZBF):
             WAN
              │
         ┌────┴────┐
         │  VyOS   │
         │ (LOCAL) │
         └────┬────┘
              │
    ┌─────────┼─────────┬─────────┐
    │         │         │         │
   MGMT      DATA      IOT      GUEST
(VLAN100) (VLAN10)  (VLAN40)  (VLAN30)
Zone policy pattern: <FROM>_<TO> (e.g., MGMT_WAN, DATA_LOCAL)
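The naming pattern implies one ruleset per ordered zone pair, so the list grows quickly as zones are added. The sketch below enumerates the names for a set of zones; `zone_policy_names` is a hypothetical helper, and whether every pair actually needs a ruleset is a design decision the notes do not settle.

```shell
# Print <FROM>_<TO> for every ordered pair of distinct zones given as arguments.
zone_policy_names() {
  for _from in "$@"; do
    for _to in "$@"; do
      [ "$_from" = "$_to" ] || printf '%s_%s\n' "$_from" "$_to"
    done
  done
}
```

For example, `zone_policy_names WAN LOCAL MGMT DATA IOT GUEST | wc -l` shows how many rulesets a full mesh of the six zones in the diagram would require.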
Firewall Inspection Commands
# List all firewall rules
show firewall
# Specific zone policy
show firewall ipv4 name MGMT_WAN
# View rule hit counts
show firewall ipv4 name MGMT_WAN statistics
# Show active connections
show conntrack table
# Clear connection tracking (careful!)
# delete conntrack table
NAT Architecture
Current NAT rules (from Session 13):
| Rule | Source | Purpose |
|---|---|---|
| 100 | NET_INFRA (10.50.1.0/24) | Infrastructure to internet |
| 110 | NET_DATA | Corporate data |
| 120 | NET_VOICE | VoIP |
| 130 | NET_GUEST | Guest WiFi |
| 140 | NET_IOT | IoT devices |
| 150 | NET_SECURITY | Security zone |
| 160 | NET_SERVICES | Service VMs |
| 170 | NET_K3S_PODS (10.42.0.0/16) | k3s pod network (NEW) |
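When working out which NAT rule a given source should hit, it helps to confirm that the address really falls inside the rule's source network. The sketch below does the check in plain shell arithmetic using the two CIDRs listed in the table; the helper names `ip_to_int` and `in_cidr` are hypothetical, and the sample addresses are made up for illustration.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  _old_ifs=$IFS; IFS=.; set -- $1; IFS=$_old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit 0) if address $1 lies inside CIDR $2, e.g. 10.42.0.0/16.
in_cidr() {
  _net=${2%/*}
  _bits=${2#*/}
  _mask=$(( _bits == 0 ? 0 : (0xFFFFFFFF << (32 - _bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & _mask )) -eq $(( $(ip_to_int "$_net") & _mask )) ]
}
```

Example: `in_cidr 10.42.3.7 10.42.0.0/16 && echo "expect rule 170"` for a pod address, versus `in_cidr 10.50.1.20 10.50.1.0/24` for an infrastructure host under rule 100.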
Alerting Ideas
Syslog to Wazuh
VyOS can forward logs to Wazuh SIEM:
configure
set system syslog host 10.50.1.135 facility all level info
set system syslog host 10.50.1.135 protocol udp
set system syslog host 10.50.1.135 port 514
commit
save
VRRP State Change Alerts
Create script to notify on failover:
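#!/bin/vbash
# /config/scripts/vrrp-notify.sh (must be executable: chmod +x)
# The shebang has to be the first line of the file, so the path comment goes below it.
# Arguments default to "unknown" in case the caller passes fewer than expected.
TYPE="${1:-unknown}"
NAME="${2:-unknown}"
STATE="${3:-unknown}"
logger "VRRP: $NAME changed to $STATE"
# Could add: curl to webhook, email, etc.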
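Then reference it from the VRRP config. VyOS attaches scripts per state through transition-script; check the exact syntax, and the arguments actually passed to the script, against the docs for the running release:
set high-availability vrrp group MGMT transition-script master /config/scripts/vrrp-notify.sh
set high-availability vrrp group MGMT transition-script backup /config/scripts/vrrp-notify.sh
set high-availability vrrp group MGMT transition-script fault /config/scripts/vrrp-notify.sh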
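For the "curl to webhook" idea, the notification body can be built separately from the network call so it is easy to test by hand. This is only a sketch: `vrrp_payload` and the `WEBHOOK_URL` variable are hypothetical, the `{"text": ...}` shape is an assumption about the receiving service, and no JSON escaping is done, so names must not contain quotes.

```shell
# Build a small JSON body describing a VRRP state change.
# $1 = VRRP group, $2 = router hostname, $3 = new state
vrrp_payload() {
  printf '{"text":"VRRP group %s on %s is now %s"}\n' "$1" "$2" "$3"
}

# Example call from the notify script (commented out; WEBHOOK_URL is a placeholder):
# curl -sS -m 5 -H 'Content-Type: application/json' \
#   -d "$(vrrp_payload MGMT "$(hostname)" MASTER)" "$WEBHOOK_URL"
```

Keeping a short timeout on the curl call (`-m 5` here) avoids a hung webhook delaying the state transition handling.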
Session Log
Session 1: k3s NAT Verification
Time: Morning
Objective: Confirm pod internet access after NAT rule 170.
Commands:
# Test from k3s node
kubectl run test-curl --rm -it --image=curlimages/curl --restart=Never -- curl -sI https://hub.docker.com | head -3
# Check VyOS NAT counters
ssh vyos-01 "show nat source statistics"
# Verify rule 170 is matching
ssh vyos-01 "show nat source rules" | awk '/170/'
Result: [ ] PENDING
Session 2: Wazuh Recovery
Objective: Get Wazuh stack operational.
Commands:
# Delete stuck pod (will recreate)
kubectl delete pod wazuh-indexer-0 -n wazuh
# Watch pod status
kubectl get pods -n wazuh -w
# Check indexer logs
kubectl logs -n wazuh wazuh-indexer-0 --tail=50
# Test dashboard
curl -kIs https://wazuh-dashboard.inside.domusdigitalis.dev | head -5
Result: [ ] PENDING
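Instead of watching `kubectl get pods -w` by eye, the recovery step can block until the recreated pod reports Ready. A sketch using `kubectl wait`; the wrapper name `wait_for_pod` and the 300s default timeout are assumptions, and the `KUBECTL` override exists only so the function can be dry-run.

```shell
# Block until pod $1 in namespace $2 is Ready, or fail after timeout $3 (default 300s).
# Set KUBECTL=echo to print the command instead of running it.
wait_for_pod() {
  "${KUBECTL:-kubectl}" wait --for=condition=Ready "pod/$1" -n "$2" --timeout="${3:-300s}"
}
```

Possible usage after the delete: `wait_for_pod wazuh-indexer-0 wazuh && kubectl logs -n wazuh wazuh-indexer-0 --tail=50`. If the StatefulSet has not recreated the pod yet, `kubectl wait` may return a not-found error, so a short retry may be needed.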
Reflection
Hours Worked (2026-03-08)
Estimated: 10-12 hours of intense troubleshooting across:
- iPSK Manager + ISE ODBC
- WLC HA SSO
- EAP-TLS WiFi
- VM migrations
- DNS forward/reverse zones
- k3s pod networking
- VyOS NAT rules
What I Learned
| Domain | Insight |
|---|---|
| Convergence | Everything connects: storage affects compute, compute affects network, network affects identity |
| Troubleshooting | Follow the packet - DNS → NAT → firewall → service → pod → container |
| Unix Philosophy | Small tools compose: |
| HA Architecture | Both nodes need identical config - VyOS HA means configuring twice |
| Pressure | Real incidents teach faster than labs - family waiting builds urgency |
Books & Resources to Explore
- VyOS Documentation: docs.vyos.io/
- VRRP RFC 5798: VRRP version 3 specification for IPv4 and IPv6 (the original protocol is RFC 2338)
- Unix Power Tools: Classic O’Reilly book on CLI mastery
- Cisco Networking Academy: Zone-based firewall concepts transfer to VyOS
Tomorrow (2026-03-10)
- Complete Wazuh recovery
- VRRP failover test (shutdown vyos-01)
- Document failover timing
Future Planning
EAP-TEAP for Windows (domus)
Why TEAP over PEAP-MSCHAPv2:
- Supports EAP chaining (machine + user auth in single session)
- Certificate-based authentication (no password hashes)
- Better security posture for Windows endpoints
- Aligns with zero-trust architecture
Implementation path:
- ISE: Enable TEAP in Allowed Protocols
- ISE: Configure EAP chaining policy
- Windows GPO: Configure TEAP supplicant settings
- Vault PKI: Issue machine certificates for Windows
- Test with modestus-aw (dual-boot) before fleet rollout
Runbook needed: windows-eap-teap-deployment.adoc
MDM for Mobile Client Certificates
Problem: iOS/Android devices need certificates for 802.1X but can’t enroll directly from Vault.
MDM solution:
- Issue certificates via MDM (Intune, Jamf, or open-source like Fleet)
- MDM pushes WiFi profiles with embedded certs
- Supplicant configuration handled by MDM profile
- Certificate renewal automated through MDM
Options to evaluate:
| MDM | Platform | Notes |
|---|---|---|
| Microsoft Intune | iOS, Android, Windows | Enterprise, integrates with Entra ID |
| Jamf Pro | iOS, macOS | Apple-focused, SCEP/ACME support |
| Fleet | All platforms | Open-source, osquery-based |
| MicroMDM | iOS, macOS | Open-source, minimal |
Integration with Vault PKI:
- SCEP proxy from MDM to Vault
- Or: MDM requests certs from Vault API, pushes to devices
- Certificate lifecycle managed centrally
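For the "MDM requests certs from Vault API" option, the request is a POST to the PKI secrets engine's issue endpoint. A rough sketch of what that call could look like: the mount path `pki`, the role name `mobile-devices`, the device hostname, and the 720h default TTL are all placeholders, not values taken from the current Vault setup.

```shell
# Build the JSON body for Vault's PKI issue endpoint.
# $1 = certificate common name, $2 = requested TTL (default 720h, an assumed value)
vault_issue_body() {
  printf '{"common_name":"%s","ttl":"%s"}\n' "$1" "${2:-720h}"
}

# Example request (commented out; mount "pki" and role "mobile-devices" are placeholders):
# curl -sS -H "X-Vault-Token: $VAULT_TOKEN" -X POST \
#   -d "$(vault_issue_body phone-01.example.internal)" \
#   "$VAULT_ADDR/v1/pki/issue/mobile-devices"
```

The response carries the certificate and private key, which the MDM would then wrap into the WiFi profile it pushes to the device.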
Runbook needed: mdm-mobile-802.1x-deployment.adoc
Open-Source Alternatives Reference
For colleagues who want to build similar infrastructure without vendor licensing.
Network Access Control (replaces Cisco ISE)
| Solution | Capabilities | Notes |
|---|---|---|
| FreeRADIUS | RADIUS server, 802.1X (EAP-TLS, PEAP, TTLS), MAC auth, accounting | De facto standard open-source RADIUS server, embedded in other NAC products such as PacketFence. Requires manual policy config. |
| PacketFence | Full NAC: 802.1X, MAB, captive portal, VLAN assignment, device profiling, guest management | Web GUI, integrates FreeRADIUS + MariaDB. Closest to ISE feature parity. |
| OpenNAC | 802.1X, device profiling, VLAN steering, compliance checking | Spanish company, enterprise features open-source. |
| Aruba ClearPass (Policy Manager) | Full NAC suite | Not open-source but often cheaper than ISE. Mentioned for completeness. |
Recommendation: PacketFence for full NAC, FreeRADIUS if you only need RADIUS.
Wireless (replaces Cisco WLC 9800)
| Solution | Capabilities | Notes |
|---|---|---|
| hostapd | Software AP, 802.1X authenticator, WPA3, RADIUS integration | Runs on any Linux box with compatible WiFi card. Single AP only. |
| OpenWrt | Full router/AP OS, VLAN, firewall, captive portal, 802.1X | Flash onto consumer APs (Ubiquiti, TP-Link, etc.). Per-AP management. |
| OpenWISP | Centralized WiFi management, zero-touch provisioning, monitoring | Controller for OpenWrt APs. Closest to WLC concept. |
| Tanaza | Cloud-managed APs | Not fully open-source but has free tier. |
Recommendation: OpenWrt APs + OpenWISP controller for multi-AP deployments.
Identity & Directory (replaces Windows Server AD)
| Solution | Capabilities | Notes |
|---|---|---|
| FreeIPA | Kerberos, LDAP, DNS, certificate authority, sudo rules, HBAC | Red Hat sponsored. Best for Linux-centric environments. You already run this! |
| Samba AD DC | Full Active Directory domain controller, GPO, DNS, LDAP | Windows clients can join. Use when Windows integration is required. |
| LLDAP | Lightweight LDAP server with web UI | Simple user/group management. Good for small deployments. |
| Authentik | Identity provider: SAML, OIDC, LDAP, SCIM | Modern IdP, replaces ADFS/Okta for SSO. Pairs with FreeIPA/Samba for directory. |
| Keycloak | Identity and access management, SSO, SAML, OIDC | You already run this! Red Hat sponsored. |
Recommendation: FreeIPA for Linux, Samba AD DC if Windows clients need domain join.
PKI & Secrets (replaces AD CS / commercial CA)
| Solution | Capabilities | Notes |
|---|---|---|
| HashiCorp Vault (OSS) | PKI CA, secrets management, SSH CA, dynamic credentials | You already run this! OSS version is very capable. |
| step-ca (Smallstep) | ACME server, X.509 CA, SSH CA, device attestation | Modern, lightweight. Native ACME support (like Let’s Encrypt). |
| cfssl (Cloudflare) | PKI toolkit, CA operations, certificate signing | CLI-focused, good for automation pipelines. |
| EJBCA | Enterprise-grade CA, CMP, SCEP, ACME, EST | Full PKI suite. Java-based, complex but powerful. |
| Dogtag | Certificate system, CRL, OCSP | Part of FreeIPA. Red Hat sponsored. |
Recommendation: Vault for secrets + PKI combined, step-ca if you want lightweight ACME.
Firewall/Router (replaces Cisco/Palo Alto)
| Solution | Capabilities | Notes |
|---|---|---|
| VyOS | CLI router, firewall, VPN, VRRP, BGP, OSPF | You already run this! Debian-based, Cisco-like CLI. |
| OPNsense | Web GUI firewall, IDS/IPS (Suricata), VPN, HA | FreeBSD-based, modern fork of pfSense. |
| pfSense | Web GUI firewall, VPN, HA (CARP) | You migrated away from this, but still a solid option. |
| nftables/iptables | Raw Linux firewall | Maximum flexibility, no GUI. Steep learning curve. |
Recommendation: VyOS for CLI purists, OPNsense for GUI preference.
SIEM & Monitoring (replaces Splunk/QRadar)
| Solution | Capabilities | Notes |
|---|---|---|
| Wazuh | SIEM, log analysis, intrusion detection, compliance | You’re deploying this! OpenSearch backend. |
| Security Onion | Full security monitoring: Suricata, Zeek, Elasticsearch, TheHive | All-in-one security distro. Network-focused. |
| Graylog | Log management, alerting, dashboards | Easier than ELK stack. MongoDB + Elasticsearch. |
| ELK Stack | Elasticsearch, Logstash, Kibana | Industry standard. Steeper learning curve. |
| Grafana Loki | Log aggregation optimized for Grafana | Lightweight, label-based. Good with Prometheus. |
Recommendation: Wazuh for security focus, Graylog for general log management.
MDM (replaces Intune/Jamf)
| Solution | Capabilities | Notes |
|---|---|---|
| Fleet | Device management, osquery, policies, software deployment | Cross-platform. Uses osquery for telemetry. |
| MicroMDM | Apple MDM, DEP, VPP | Lightweight, iOS/macOS only. |
| NanoMDM | Apple MDM server | Even lighter than MicroMDM. |
| Headwind MDM | Android MDM, kiosk mode, app management | Open-source, Android-focused. |
Recommendation: Fleet for cross-platform, MicroMDM for Apple-only.
Switching (lab/software alternatives)
| Solution | Capabilities | Notes |
|---|---|---|
| Open vSwitch (OVS) | Software switch, VLAN, OpenFlow, tunneling | For VMs/containers. Not a physical switch replacement. |
| SONiC | Network OS for whitebox switches | Microsoft open-source. Runs on compatible hardware. |
| Cumulus Linux | Network OS for whitebox switches | NVIDIA owned, free tier available. |
| GNS3 / EVE-NG | Network simulation | Lab environments, not production. |
Note: For physical switching, Ubiquiti or MikroTik offer affordable non-Cisco options.
Complete Open-Source Stack Example
For a colleague starting from scratch:
┌────────────────────────────────────────────────────────────┐
│                       NETWORK ACCESS                       │
│          PacketFence (NAC) + FreeRADIUS (802.1X)           │
└────────────────────────────────────────────────────────────┘
                              │
┌────────────────────────────────────────────────────────────┐
│                          IDENTITY                          │
│ FreeIPA (Linux) or Samba AD DC (Windows) + Keycloak (SSO)  │
└────────────────────────────────────────────────────────────┘
                              │
┌────────────────────────────────────────────────────────────┐
│                        CERTIFICATES                        │
│         Vault OSS (PKI + SSH CA) or step-ca (ACME)         │
└────────────────────────────────────────────────────────────┘
                              │
┌────────────────────────────────────────────────────────────┐
│                       NETWORK INFRA                        │
│        VyOS (router) + OpenWrt/OpenWISP (wireless)         │
└────────────────────────────────────────────────────────────┘
                              │
┌────────────────────────────────────────────────────────────┐
│                       OBSERVABILITY                        │
│        Wazuh (SIEM) + Prometheus/Grafana (metrics)         │
└────────────────────────────────────────────────────────────┘
                              │
┌────────────────────────────────────────────────────────────┐
│                          COMPUTE                           │
│        KVM/libvirt (hypervisor) + k3s (containers)         │
└────────────────────────────────────────────────────────────┘
Total licensing cost: $0
What You Lose Without Enterprise Vendors
| Feature | Vendor Advantage | Open-Source Gap |
|---|---|---|
| Support | 24/7 TAC, SLAs | Community forums, self-reliance |
| Profiling | ISE device profiler (thousands of signatures) | PacketFence profiling is more limited |
| pxGrid | Real-time context sharing | No equivalent |
| GUI polish | ISE/WLC web interfaces | Open-source GUIs vary in quality |
| Integration | Cisco DNA/Meraki ecosystem | Manual integration required |
| Compliance | Prebuilt compliance reports | Build your own |
Bottom line: Open-source can do 80-90% of what enterprise does. The last 10-20% is polish, support, and deep integrations.