kvm-01 Rocky Linux Rebuild

1. Executive Summary

Item | Details
Project | kvm-01 Hypervisor Rebuild (Part of VyOS Migration)
Scope | Rocky Linux 9.x install, network bridge config, VM restoration
Current State | Legacy OS on kvm-01, 10 VMs shut down, configs backed up to NAS
Target State | Rocky Linux 9.x with br-mgmt + br-wan bridges (matching kvm-02)
Duration | ~2 hours (install + config + VM restore)
Risk Mitigation | All critical services running on kvm-02 secondaries during rebuild

2. Progress Tracker

Phase | Status | Description
Phase 1 | [x] | Extract kvm-02 configuration (6 files to NAS)
Phase 2 | [x] | Backup kvm-01 VM definitions (10 VMs exported)
Phase 3 | [x] | Install Rocky Linux on SuperDOM (sda)
Phase 4 | [ ] | Apply network configuration (br-mgmt, br-wan)
Phase 5 | [ ] | Configure VLAN filtering + libvirt hook
Phase 6 | [ ] | Import VMs and update network
Phase 7 | [ ] | Validation + Vault SSH CA

3. Infrastructure Context

Host | Role | Status During Rebuild
kvm-01 | Primary hypervisor (rebuilding) | OFFLINE
kvm-02 | Secondary hypervisor | ACTIVE (handling all services)
vyos-02 | Router/Firewall | MASTER (VRRP priority 100)
ise-02 | 802.1X authentication | ACTIVE (handling all auth)
vault-01 | PKI + SSH CA | Running on kvm-02

4. Hardware Inventory

Component | Details
Boot Device | 14.8GB SuperDOM (sda) - OS install target
VM Storage | 978GB NVMe (sdb) - /mnt/onboard-ssd/ - PRESERVED
LAN Interface | eno8 → br-mgmt (10GbE trunk to switch Te1/0/2)
WAN Interface | eno7 → br-wan (ISP connection)
Target IP | 10.50.1.110/24 (static on br-mgmt)

5. Prerequisites

Before starting, verify these services are running on kvm-02:

  • vyos-02 (active router - handles all traffic)

  • ise-02 (can handle all 802.1X)

  • bind-01 or bind-02 (DNS)

  • vault-01 (PKI/SSH CA)

5.1. NAS NFS Permissions (CRITICAL)

kvm-01’s new IP (10.50.1.110) must be added to NAS NFS permissions BEFORE attempting mounts.

Via DSM Web UI:

  1. Log in to nas-01:5001

  2. Control Panel → File Services → NFS

  3. For each share (vms, isos, backups):

    1. Click share → Edit → NFS Permissions

    2. Add: 10.50.1.110

    3. Privilege: Read/Write

    4. Squash: Map root to admin

    5. Security: sys

Via SSH (if accessible):

ssh nas-01 "synoshare --setuser vms RW @10.50.1.110"
ssh nas-01 "synoshare --setuser isos RW @10.50.1.110"
ssh nas-01 "synoshare --setuser backups RW @10.50.1.110"

Without this step, the NFS mounts in section 5.3.6 will fail with "access denied by server".
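Before attempting the mounts, the export list can be checked from kvm-01. The sketch below assumes the NAS answers `showmount -e` queries and that nfs-utils is installed on kvm-01; `export_allows_client` is a hypothetical helper that only matches a literal client IP or a `*` wildcard, so subnet entries such as 10.50.1.0/24 would need extra handling.

```shell
# Hypothetical helper: reads `showmount -e` output on stdin and succeeds
# if the given export lists the given client IP (or a "*" wildcard).
export_allows_client() {
    # $1 = export path (e.g. /volume1/vms), $2 = client IP
    awk -v share="$1" -v ip="$2" '
        $1 == share {
            n = split($2, clients, ",")
            for (i = 1; i <= n; i++)
                if (clients[i] == ip || clients[i] == "*") found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on kvm-01:
#   for share in vms isos backups; do
#       showmount -e nas-01 | export_allows_client "/volume1/$share" 10.50.1.110 \
#           && echo "$share: OK" || echo "$share: NOT exported to 10.50.1.110"
#   done
```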

5.2. Pre-Rebuild Verification

# Verify vyos-02 is handling traffic (from workstation)
ip route | awk '/default/{print "Gateway:", $3}'

Expected: 10.50.1.3 (vyos-02) or 10.50.1.1 (VyOS VIP)

# Verify ISE-02 is processing authentications
netapi ise mnt sessions --format table | head -5
# Verify ise-02 is reachable
ping -c 2 10.50.1.21 && echo "ise-02 OK"

6. Phase 1: Extract kvm-02 Configuration

Run these commands on kvm-02 to capture config for kvm-01.

6.1. 1.1 Export NetworkManager Bridge Config

# SSH to kvm-02
ssh kvm-02
# Export LAN bridge (br-mgmt)
nmcli connection show br-mgmt > /tmp/br-mgmt.conf
# Export LAN bridge slave
nmcli connection show br-mgmt-port > /tmp/br-mgmt-port.conf
# Export WAN bridge (br-wan)
nmcli connection show br-wan > /tmp/br-wan.conf
# Export WAN bridge slave
nmcli connection show br-wan-port > /tmp/br-wan-port.conf

6.2. 1.2 Export Libvirt VLAN Hook

# Copy the libvirt hook (handles VLAN filtering for VMs)
sudo cat /etc/libvirt/hooks/qemu > /tmp/libvirt-hook.sh

6.3. 1.3 Capture Bridge VLAN State

# Document current VLAN config
bridge vlan show > /tmp/bridge-vlan.txt

6.4. 1.4 Copy to NAS

# NAS is mounted via NFS - use cp, not scp
sudo mkdir -p /mnt/nas/backups/kvm-01-rebuild
sudo cp /tmp/{br-mgmt,br-mgmt-port,br-wan,br-wan-port}.conf /tmp/libvirt-hook.sh /tmp/bridge-vlan.txt /mnt/nas/backups/kvm-01-rebuild/
# Verify
sudo ls -la /mnt/nas/backups/kvm-01-rebuild/

Table 1. Captured files (2026-03-07)
File | Purpose
br-mgmt.conf | LAN bridge config (eno8 trunk)
br-mgmt-port.conf | LAN bridge slave config
br-wan.conf | WAN bridge config (eno7)
br-wan-port.conf | WAN bridge slave config
libvirt-hook.sh | VLAN hook for VM network assignment
bridge-vlan.txt | Current VLAN state on bridges
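A quick completeness check for the six captured files could look like the following. `check_captured_files` is a hypothetical helper, not part of the original procedure; it only tests that each file exists and is non-empty, not that its contents are valid.

```shell
# Hypothetical helper: report which of the six Phase 1 files are missing or empty.
# Returns the number of problem files (0 means all present).
check_captured_files() {
    # $1 = backup directory
    missing=0
    for f in br-mgmt.conf br-mgmt-port.conf br-wan.conf br-wan-port.conf \
             libvirt-hook.sh bridge-vlan.txt; do
        if [ ! -s "$1/$f" ]; then
            echo "MISSING or empty: $f"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Usage:
#   check_captured_files /mnt/nas/backups/kvm-01-rebuild && echo "All 6 files captured"
```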

7. Phase 2: Backup kvm-01 VM Definitions

Run these commands on kvm-01 before rebuild.

7.1. 2.1 Export All VM Definitions

# SSH to kvm-01
ssh kvm-01
# Become root (with per-command sudo, the > redirection in the loop would still run as the unprivileged user)
sudo -i
# Create backup directory on NAS
mkdir -p /mnt/nas/backups/kvm-01-rebuild/vm-definitions
# Export all VM XML definitions
for vm in $(virsh list --all --name); do
  echo "Exporting: $vm"
  virsh dumpxml "$vm" > "/mnt/nas/backups/kvm-01-rebuild/vm-definitions/${vm}.xml"
done
# List exported VMs
ls -la /mnt/nas/backups/kvm-01-rebuild/vm-definitions/

Table 2. Expected VMs (2026-03-07)
VM | Notes
9800-WLC-01 | Wireless LAN Controller
bind-01 | Primary DNS
vault-01 | Vault PKI + SSH CA
home-dc01 | Windows AD DC
ipa-01 | FreeIPA
ipsk-manager | iPSK Manager
ise-01 | ISE Primary
k3s-master-01 | Kubernetes control plane
keycloak-01 | Identity Provider
pfSense-FW01 | Firewall (LEGACY - replaced by VyOS)

7.2. 2.2 Document VM Disk Locations

# List all VM disks (to verify they're on NVMe, not root)
for vm in $(sudo virsh list --all --name); do
  echo "=== $vm ==="
  sudo virsh domblklist "$vm" | grep -v "^$\|Target"
done | tee /mnt/nas/backups/kvm-01-rebuild/vm-disks.txt
# Verify disk paths are NOT on the root partition
# (domblklist prints "Target Source", so the path is column 2 and lines never start with "/")
awk '$2 ~ /^\// {print $2}' /mnt/nas/backups/kvm-01-rebuild/vm-disks.txt | sort -u

Expected paths: /mnt/onboard-ssd/libvirt/images/ or /var/lib/libvirt/images/ on NVMe

NOT: / or /root/ - these would be lost during rebuild!
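The path check above can be automated by flagging any disk source that does not sit under the preserved storage mount. This is a sketch: `disks_outside` is a hypothetical helper, the prefix is passed in as an argument, and paths containing spaces are not handled.

```shell
# Hypothetical helper: reads `virsh domblklist` style output on stdin and prints
# every source path that does NOT start with the given prefix.
# Exit status is 1 if any such path was found, 0 otherwise.
disks_outside() {
    # $1 = expected storage prefix, e.g. /mnt/onboard-ssd/
    awk -v p="$1" '
        $2 ~ /^\// && index($2, p) != 1 { print $2; bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage:
#   disks_outside /mnt/onboard-ssd/ < /mnt/nas/backups/kvm-01-rebuild/vm-disks.txt \
#       && echo "All disks on preserved storage" \
#       || echo "WARNING: the paths listed above would be lost during rebuild"
```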

7.3. 2.3 Document Network Interfaces

# Current physical NICs (need to verify names after rebuild)
ip -br link show | grep -v "^lo\|^vnet\|^virbr" | tee /mnt/nas/backups/kvm-01-rebuild/physical-nics.txt

7.4. 2.4 Shutdown All VMs

# Graceful shutdown
for vm in $(sudo virsh list --name); do
  echo "Shutting down: $vm"
  sudo virsh shutdown "$vm"
done
# Wait and verify all stopped
sleep 30
sudo virsh list --all
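A fixed 30-second sleep may be too short for guests that shut down slowly. As an alternative sketch, the loop below polls until no VM is running or a timeout expires. `wait_for_all_shutdown` and the `VIRSH` override variable are illustrative names, not part of the original procedure; guests that ignore the ACPI shutdown request would still need manual attention.

```shell
# Hypothetical helper: wait until `virsh list --name` reports no running VMs.
# VIRSH defaults to "virsh"; set VIRSH="sudo virsh" when not running as root.
wait_for_all_shutdown() {
    # $1 = timeout in seconds (default 120)
    timeout=${1:-120}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Count non-empty lines; grep -c exits 1 on zero matches, hence || true
        running=$(${VIRSH:-virsh} list --name | grep -c . || true)
        [ "$running" -eq 0 ] && return 0
        echo "Still running: $running VM(s)"
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Usage:
#   VIRSH="sudo virsh" wait_for_all_shutdown 180 || echo "Timed out: check 'virsh list'"
```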

8. Phase 3: Install Rocky Linux

8.1. 3.1 Boot from Rocky ISO

  1. Download Rocky Linux 9.x minimal ISO

  2. Boot kvm-01 from ISO (via IPMI virtual media or USB)

  3. Select installation destination: 14.8GB SuperDOM (sda), NOT the VM storage disk!

CRITICAL: Select the correct disk!

  • SuperDOM: ~14.8GB, sda (for OS)

  • VM storage: ~978GB, sdb (contains VMs - DO NOT FORMAT)

If unsure, check disk sizes in installer.

8.2. 3.2 Installation Options

Option | Value
Software Selection | Minimal Install + Virtualization Host
Root Password | Set strong password
User | Create admin user (evanusmodestus)
Network | Configure DHCP initially (fix later)
Hostname | kvm-01.inside.domusdigitalis.dev

8.3. 3.3 Post-Install Base Config

# After reboot, SSH via DHCP address or console
# Install essential packages
sudo dnf install -y epel-release
sudo dnf install -y qemu-kvm libvirt virt-install bridge-utils vim tmux htop
# Enable and start libvirtd
sudo systemctl enable --now libvirtd

9. Phase 4: Apply Network Configuration

9.1. 4.1 Identify Physical NICs

# List physical interfaces (eno7=WAN, eno8=LAN trunk)
ip -br link show | grep -v "^lo\|^vnet\|^virbr"

Expected: eno7 and eno8 both UP. Architecture matches kvm-02:

Interface | Purpose | Network
eno7 | br-wan (WAN/outside) | 192.168.1.x to ISP
eno8 | br-mgmt (LAN trunk) | 10.50.x.x to switch Te1/0/2

9.2. 4.2 Create br-mgmt Bridge

# Create the bridge
nmcli connection add type bridge con-name br-mgmt ifname br-mgmt
# Configure bridge settings (gateway is vyos-02 at 10.50.1.3)
nmcli connection modify br-mgmt ipv4.addresses 10.50.1.110/24
nmcli connection modify br-mgmt ipv4.gateway 10.50.1.3
nmcli connection modify br-mgmt ipv4.dns "10.50.1.3,10.50.1.90"
nmcli connection modify br-mgmt ipv4.method manual
nmcli connection modify br-mgmt connection.autoconnect yes
nmcli connection modify br-mgmt bridge.stp no bridge.forward-delay 0

9.3. 4.3 Add eno8 as Bridge Slave

# Add eno8 to bridge
nmcli connection add type bridge-slave con-name br-mgmt-port ifname eno8 master br-mgmt

9.4. 4.4 Activate Bridge (Connection Will Drop)

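Before activating, delete the standalone eno8 connection (it holds the interface). If you are connected over SSH through eno8, the session stalls as soon as that connection is deleted, so run both commands from the IPMI console or together on one line:

# Delete the standalone profile and activate the bridge in one step (SSH will drop)
nmcli connection delete eno8; nmcli connection up br-mgmt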

9.5. 4.5 Verify Network

# Reconnect at the new static address (the DHCP address from the install is no longer used)
ssh root@10.50.1.110
# Verify bridge state - should show UP
ip -br addr show br-mgmt
# Verify eno8 is bridge slave
bridge link show | grep -E "eno8.*master"
# Verify connectivity to vyos-02 gateway
ping -c 2 10.50.1.3 && echo "Gateway OK"

9.6. 4.6 Create br-wan Bridge (WAN to ISP)

br-wan provides WAN connectivity for vyos-01. Unlike br-mgmt (which uses VLAN filtering), br-wan is a simple L2 bridge to the ISP modem.

# Create br-wan bridge (no IP - VyOS handles WAN addressing)
sudo nmcli connection add type bridge con-name br-wan ifname br-wan
# Disable IPv4/IPv6 on bridge itself (VyOS gets the IP via DHCP)
sudo nmcli connection modify br-wan ipv4.method disabled
sudo nmcli connection modify br-wan ipv6.method disabled
# Disable STP (direct connection to ISP modem)
sudo nmcli connection modify br-wan bridge.stp no bridge.forward-delay 0

9.7. 4.7 Add eno7 as br-wan Slave

# Add eno7 to br-wan
sudo nmcli connection add type bridge-slave con-name br-wan-port ifname eno7 master br-wan
# Delete standalone eno7 connection if it exists
sudo nmcli connection delete eno7 2>/dev/null || true
# Activate br-wan
sudo nmcli connection up br-wan

9.8. 4.8 Verify br-wan

# Verify bridge exists and eno7 is slave
ip link show br-wan
bridge link show | grep eno7

Expected output:

br-wan: <BROADCAST,MULTICAST,UP,LOWER_UP> ...
5: eno7: ... master br-wan state forwarding ...

10. Phase 5: Configure VLAN Filtering

Enabling vlan_filtering resets VLAN state and drops connectivity immediately. Run ALL commands in this section as a single block via IPMI/console, or combine into one command.

10.1. 5.1 Enable VLAN Filtering and Configure VLANs (Run All At Once)

# CRITICAL: Run this ENTIRE block at once - do not stop partway
ip link set br-mgmt type bridge vlan_filtering 1

# Remove default VLAN 1 (causes conflicts)
bridge vlan del vid 1 dev eno8
bridge vlan del vid 1 dev br-mgmt self

# Add tagged VLANs (NOT 100 - that's added separately with PVID)
for vid in 10 20 30 40 110 120; do
  bridge vlan add vid $vid dev eno8
  bridge vlan add vid $vid dev br-mgmt self
done

# Add VLAN 100 with PVID untagged on BOTH (native VLAN for management)
bridge vlan add vid 100 dev eno8 pvid untagged
bridge vlan add vid 100 dev br-mgmt self pvid untagged
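Because the block must run without interruption, one option is to generate the full command sequence first, review it, and then pipe it to a shell in a single step from the console. The sketch below does that; `gen_vlan_cmds` is a hypothetical helper that only prints commands and changes nothing by itself.

```shell
# Hypothetical generator: print the VLAN filtering command sequence for review.
gen_vlan_cmds() {
    # $1 = bridge, $2 = uplink port, $3 = native (PVID) VLAN, remaining args = tagged VLANs
    br=$1; port=$2; pvid=$3; shift 3
    echo "ip link set $br type bridge vlan_filtering 1"
    echo "bridge vlan del vid 1 dev $port"
    echo "bridge vlan del vid 1 dev $br self"
    for vid in "$@"; do
        echo "bridge vlan add vid $vid dev $port"
        echo "bridge vlan add vid $vid dev $br self"
    done
    echo "bridge vlan add vid $pvid dev $port pvid untagged"
    echo "bridge vlan add vid $pvid dev $br self pvid untagged"
}

# Review:  gen_vlan_cmds br-mgmt eno8 100 10 20 30 40 110 120
# Apply (as root, from IPMI/console):
#          gen_vlan_cmds br-mgmt eno8 100 10 20 30 40 110 120 | sh
```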

10.2. 5.2 Verify VLAN Configuration

# Test connectivity first
ping -c 1 10.50.1.3 && echo "Gateway OK"
# Verify matches kvm-02 config
bridge vlan show

Expected output (eno8 and br-mgmt sections):

port              vlan-id
eno8              10
                  20
                  30
                  40
                  100 PVID Egress Untagged
                  110
                  120
br-mgmt           10
                  20
                  30
                  40
                  100 PVID Egress Untagged
                  110
                  120

Key differences from default:

  • NO VLAN 1 on eno8 or br-mgmt

  • PVID 100 on BOTH eno8 AND br-mgmt (not just eno8)

  • "PVID" means untagged ingress traffic is assigned to VLAN 100; "Egress Untagged" means VLAN 100 frames leave the port without a tag
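To compare a single port against the expected listing without reading the whole table by eye, the output of `bridge vlan show` can be filtered per device. This is a sketch that assumes the column layout shown above; `vlans_for_dev` is a hypothetical helper.

```shell
# Hypothetical helper: reads `bridge vlan show` output on stdin and prints
# one line per VLAN for the given device, e.g. "100 PVID Egress Untagged".
vlans_for_dev() {
    # $1 = device name (eno8, br-mgmt, vnet0, ...)
    awk -v dev="$1" '
        NR == 1 && $1 == "port" { next }      # skip the header line
        /^[^ \t]/ { cur = $1; $1 = "" }       # unindented line starts a new device section
        cur == dev { sub(/^[ \t]+/, ""); if ($0 != "") print }
    '
}

# Usage:
#   bridge vlan show | vlans_for_dev eno8
#   bridge vlan show | vlans_for_dev br-mgmt | grep -q '^100 PVID' && echo "PVID 100 OK"
```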

10.3. 5.3 System Parity (hostname, user, NAS)

10.3.1. 5.3.1 Set Hostname

# Set hostname to match kvm-02 naming convention
hostnamectl set-hostname kvm-01.inside.domusdigitalis.dev
# Verify
hostnamectl --static

Expected: kvm-01.inside.domusdigitalis.dev

10.3.2. 5.3.2 Configure User Account (as root)

The evanusmodestus user needs parity with kvm-02 configuration.

# If user doesn't exist (check first with: id evanusmodestus)
useradd -m -G wheel evanusmodestus
passwd evanusmodestus
# Add to libvirt group (required for VM management)
usermod -aG libvirt evanusmodestus
# Verify groups match kvm-02
groups evanusmodestus

Expected: evanusmodestus : evanusmodestus wheel libvirt

10.3.3. 5.3.3 Configure SSH Authorized Keys (as root)

# Create .ssh directory
mkdir -p /home/evanusmodestus/.ssh
chmod 700 /home/evanusmodestus/.ssh
# Add authorized keys (Vault-signed + YubiKeys + ed25519)
cat > /home/evanusmodestus/.ssh/authorized_keys << 'EOF'
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIrgE9z8gkQVRVkkdbc1ejdth7vJkqpY35FrIUv8L6JB vault-signed-20260219
sk-ssh-ed25519@openssh.com AAAAGnNrLXNzaC1lZDI1NTE5QG9wZW5zc2guY29tAAAAIG/EGu00HuV3jnisul7DUBuk9jLtrE3yR4BZCwGb2YpCAAAABHNzaDo= d000-nano-35641207
sk-ssh-ed25519@openssh.com AAAAGnNrLXNzaC1lZDI1NTE5QG9wZW5zc2guY29tAAAAIFHfsGSAFAkqwYj6EGS9sA2MROjs28zM6LJds3gagsCkAAAACHNzaDpkMDAw evanusmodestus@d000-yubikey
sk-ssh-ed25519@openssh.com AAAAGnNrLXNzaC1lZDI1NTE5QG9wZW5zc2guY29tAAAAIEBZ+kus4aTHzQt1zNnEnGxJs+Lf56vrCdcyvqLhpp9hAAAACHNzaDpkMDAw ssh:d000
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIL3vaIABqHOwy88p/5GcX3ZNU044GAz/3T5dH8GIU7DS evanusmodestus@d000
EOF
# Set permissions
chmod 600 /home/evanusmodestus/.ssh/authorized_keys
chown -R evanusmodestus:evanusmodestus /home/evanusmodestus/.ssh

10.3.4. 5.3.4 Configure Vault SSH CA Trust (as root)

# Download Vault SSH CA public key
curl -sk https://vault-01.inside.domusdigitalis.dev:8200/v1/ssh/public_key > /etc/ssh/vault-ca.pub
# Verify CA key downloaded
cat /etc/ssh/vault-ca.pub
# Add to sshd_config
echo "TrustedUserCAKeys /etc/ssh/vault-ca.pub" >> /etc/ssh/sshd_config
# Restart sshd
systemctl restart sshd
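Appending with `>>` adds a duplicate line each time the step is repeated. A minimal sketch of an idempotent alternative follows; `ensure_line` is a hypothetical helper, and the usage comments add two checks (`ssh-keygen -l` on the downloaded key and `sshd -t` on the config) before restarting sshd.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not already there.
ensure_line() {
    # $1 = file, $2 = exact line
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage (as root):
#   ssh-keygen -l -f /etc/ssh/vault-ca.pub    # fails if the download is not a valid public key
#   ensure_line /etc/ssh/sshd_config "TrustedUserCAKeys /etc/ssh/vault-ca.pub"
#   sshd -t && systemctl restart sshd         # restart only if the config parses
```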

10.3.5. 5.3.5 Exit Root Shell

Stop running as root! Use the evanusmodestus user for all operations.

# Exit root shell
exit
# Reconnect as regular user (Vault cert or YubiKey)
ssh evanusmodestus@10.50.1.110

10.3.6. 5.3.6 Mount NAS Shares

kvm-02 has 3 NFS mounts. Create the same on kvm-01:

# Create mount points
sudo mkdir -p /mnt/nas/{vms,isos,backups}
# Mount NFS shares
sudo mount -t nfs nas-01:/volume1/vms /mnt/nas/vms
sudo mount -t nfs nas-01:/volume1/isos /mnt/nas/isos
sudo mount -t nfs nas-01:/volume1/backups /mnt/nas/backups
# Verify mounts
mount | grep nfs | awk '{print $1, $3}'

Expected:

nas-01:/volume1/vms /mnt/nas/vms
nas-01:/volume1/isos /mnt/nas/isos
nas-01:/volume1/backups /mnt/nas/backups

10.3.7. 5.3.7 Add to fstab

# Add persistent mounts (matching kvm-02 fstab)
cat << 'EOF' | sudo tee -a /etc/fstab
nas-01:/volume1/vms      /mnt/nas/vms     nfs  defaults,_netdev  0 0
nas-01:/volume1/isos     /mnt/nas/isos    nfs  defaults,_netdev  0 0
nas-01:/volume1/backups  /mnt/nas/backups nfs  defaults,_netdev  0 0
EOF
# Verify fstab
grep nfs /etc/fstab
# Reload systemd to pick up fstab changes
sudo systemctl daemon-reload

10.3.8. 5.3.8 Verify kvm-01 Backup Files on NAS

# Check backup directory exists
ls -la /mnt/nas/backups/kvm-01-rebuild/

Expected: libvirt-hook.sh, vm-definitions/, bridge configs

10.4. 5.4 Install Libvirt Hook

# Create hooks directory (doesn't exist on fresh install)
sudo mkdir -p /etc/libvirt/hooks
# Copy the hook from NAS
sudo cp /mnt/nas/backups/kvm-01-rebuild/libvirt-hook.sh /etc/libvirt/hooks/qemu
sudo chmod +x /etc/libvirt/hooks/qemu

10.4.1. 5.4.1 Production Libvirt VLAN Hook

Bug in the naive approach: enumerating all vnets on br-mgmt with ip link show master br-mgmt | awk '/vnet/' reconfigures every running VM’s interfaces, not just those of the VM being started. When several VMs start at once (host reboot with autostart), the hook instances race each other.

Fix: match the VM’s MAC addresses from its XML against the vnet MACs read from sysfs, ignoring the first octet, so each hook run touches only its own VM’s interfaces.

sudo tee /etc/libvirt/hooks/qemu << 'EOF'
#!/bin/bash
# /etc/libvirt/hooks/qemu
# Configures 802.1Q VLANs and PVID on br-mgmt vnet interfaces at VM start
# NOTE: Never use virsh here — deadlocks libvirtd

GUEST_NAME="$1"
OPERATION="$2"

BRIDGE="br-mgmt"
VLANS="10 20 30 40 100 110 120"
VNET_WAIT_SECS=15   # max seconds to wait for vnets to attach

PVID100_VMS="vyos-01 vyos-02
9800-WLC-01 9800-WLC-02
ise-01 ise-02
bind-01 bind-02
home-dc01 home-dc02
keycloak-01 keycloak-02
ipsk-manager ipsk-manager-01 ipsk-manager-02
vault-01 vault-02 vault-03
ipa-01 ipa-02
k3s-master-01 k3s-master-02 k3s-master-03
k3s-worker-01 k3s-worker-02 k3s-worker-03"

log()  { logger -t "libvirt-hook[${GUEST_NAME}]" "$*"; }
warn() { logger -t "libvirt-hook[${GUEST_NAME}]" "WARN: $*"; }
err()  { logger -t "libvirt-hook[${GUEST_NAME}]" "ERROR: $*"; }

needs_pvid100() {
    echo "$PVID100_VMS" | tr ' \n' '\n' | grep -Fqx -- "$1"
}

get_vm_vnets() {
    local guest="$1"
    local xml="/etc/libvirt/qemu/${guest}.xml"

    if [[ ! -f "$xml" ]]; then
        err "VM XML not found: $xml"
        return 1
    fi

    local macs
    macs=$(grep -oP "(?<=<mac address=[\"'])[0-9a-f:]+" "$xml")

    if [[ -z "$macs" ]]; then
        warn "No MAC addresses found in $xml"
        return 1
    fi

    local found=0
    for mac in $macs; do
        local suffix="${mac:3}"
        for vnet in $(ip link show master "$BRIDGE" 2>/dev/null \
                      | awk -F'[ :]+' '/vnet/{print $2}'); do
            local vnet_mac
            vnet_mac=$(cat /sys/class/net/"$vnet"/address 2>/dev/null)
            if [[ "${vnet_mac:3}" == "$suffix" ]]; then
                echo "$vnet"
                (( found++ ))
            fi
        done
    done

    [[ $found -eq 0 ]] && return 1
    return 0
}

configure_vnet() {
    local vnet="$1"
    local pvid="$2"
    local errors=0

    log "Configuring $vnet — bridge: $BRIDGE — VLANs: $VLANS — PVID: $pvid"

    # Verify interface still exists
    if ! ip link show "$vnet" &>/dev/null; then
        err "$vnet: interface disappeared before configuration"
        return 1
    fi

    # Remove default PVID 1
    bridge vlan del vid 1 dev "$vnet" pvid untagged 2>/dev/null

    # Add all tagged VLANs
    for vid in $VLANS; do
        if bridge vlan add vid "$vid" dev "$vnet" 2>/dev/null; then
            log "$vnet: added VLAN $vid"
        else
            warn "$vnet: failed to add VLAN $vid"
            (( errors++ ))
        fi
    done

    # Set PVID — must come after tagged VLANs are added
    if bridge vlan add vid "$pvid" dev "$vnet" pvid untagged 2>/dev/null; then
        log "$vnet: PVID set to $pvid"
    else
        err "$vnet: failed to set PVID $pvid"
        (( errors++ ))
    fi

    # Verify final state (single line for syslog)
    local vlan_state
    vlan_state=$(bridge vlan show dev "$vnet" 2>/dev/null | tr '\n' ' ')
    log "$vnet: final state — $vlan_state"

    return $errors
}

case "$OPERATION" in
    started)
        (
            log "started — waiting for vnets on $BRIDGE"

            # Poll until vnets appear — replaces fragile sleep 3
            vnets=()
            for i in $(seq 1 "$VNET_WAIT_SECS"); do
                mapfile -t vnets < <(get_vm_vnets "$GUEST_NAME")
                [[ ${#vnets[@]} -gt 0 ]] && break
                log "Waiting for vnets... attempt $i/${VNET_WAIT_SECS}"
                sleep 1
            done

            if [[ ${#vnets[@]} -eq 0 ]]; then
                err "No vnets found for $GUEST_NAME on $BRIDGE after ${VNET_WAIT_SECS}s — giving up"
                exit 1
            fi

            log "Found ${#vnets[@]} vnet(s): ${vnets[*]}"

            if needs_pvid100 "$GUEST_NAME"; then
                pvid=100
            else
                pvid=1
                warn "$GUEST_NAME not in PVID100 list — using PVID 1, verify this is correct"
            fi

            total_errors=0
            for vnet in "${vnets[@]}"; do
                configure_vnet "$vnet" "$pvid"
                (( total_errors += $? ))
            done

            if [[ $total_errors -eq 0 ]]; then
                log "Complete — ${#vnets[@]} interface(s) configured successfully"
            else
                err "Complete with $total_errors error(s) — check VLAN state manually"
                exit 1
            fi
        ) &
        ;;
    prepare|start|release)
        log "$OPERATION: no action"
        ;;
    stopped|reconnect)
        log "$OPERATION — no action (bridge cleans up vnets automatically)"
        ;;
    *)
        log "unhandled operation: $OPERATION"
        ;;
esac

exit 0
EOF
sudo chmod 755 /etc/libvirt/hooks/qemu
sudo systemctl restart libvirtd

Table 3. Hook Design
Feature | Purpose
BRIDGE, VNET_WAIT_SECS | Parameterized, easy to adjust without editing logic
log(), warn(), err() | Consistent severity levels for journalctl filtering
get_vm_vnets() | MAC-to-vnet matching via sysfs, scoped to this VM only
mapfile -t vnets | Proper array population, no word-splitting bugs
Interface existence check | Guards against race where vnet disappears mid-config
Error counting | $total_errors gives clear pass/fail in logs
No-op and * cases | Explicit handlers for operations that need no VLAN changes

When adding new VMs, update PVID100_VMS in both kvm-01 and kvm-02 hooks.
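The matching rule used by get_vm_vnets() can be tried in isolation. The sketch below assumes, as the hook does, that the host-side vnet MAC equals the guest MAC except for the first octet (libvirt typically sets it to fe); `mac_matches_vnet` is a hypothetical stand-alone helper, not a function from the hook.

```shell
# Hypothetical helper: succeed if two MACs are equal after dropping the first octet.
mac_matches_vnet() {
    # $1 = guest MAC from the domain XML
    # $2 = vnet MAC from /sys/class/net/<vnet>/address
    a=$(printf '%s' "$1" | tr 'A-F' 'a-f' | cut -c4-)   # strip "xx:" and normalize case
    b=$(printf '%s' "$2" | tr 'A-F' 'a-f' | cut -c4-)
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage:
#   mac_matches_vnet 52:54:00:aa:bb:cc "$(cat /sys/class/net/vnet0/address)" \
#       && echo "vnet0 belongs to this VM"
```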

10.4.2. 5.4.2 Verify Hook Operation

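# Watch hook logs during VM start (run in a second terminal; -f blocks)
journalctl -t "libvirt-hook[k3s-master-01]" -f
# Test by restarting a VM; virsh shutdown returns immediately, so wait for "shut off"
sudo virsh shutdown k3s-master-01
until [ "$(sudo virsh domstate k3s-master-01)" = "shut off" ]; do sleep 1; done
sudo virsh start k3s-master-01
# Show the VM's MAC (the hook matches vnets on everything after the first octet)
VM_MAC=$(sudo grep -oP "(?<=<mac address=[\"'])[0-9a-f:]+" /etc/libvirt/qemu/k3s-master-01.xml | head -1)
echo "VM MAC: $VM_MAC"
# Verify correct VLAN assignment
bridge vlan show | grep -A10 vnet
Expected output: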
vnet0             10
                  20
                  30
                  40
                  100 PVID Egress Untagged
                  110
                  120

10.4.3. 5.4.3 Manual VLAN Fix (Emergency)

If a VM was started before hook installation or the hook failed:

# Get vnet interface for the VM
VNET=$(sudo virsh domiflist <vm-name> | awk '/br-mgmt/{print $1}')
echo "vnet: $VNET"
# Add PVID 100 and all VLANs, remove VLAN 1
sudo bridge vlan add vid 100 dev $VNET pvid untagged
sudo bridge vlan add vid 10 dev $VNET
sudo bridge vlan add vid 20 dev $VNET
sudo bridge vlan add vid 30 dev $VNET
sudo bridge vlan add vid 40 dev $VNET
sudo bridge vlan add vid 110 dev $VNET
sudo bridge vlan add vid 120 dev $VNET
sudo bridge vlan del vid 1 dev $VNET
bridge vlan show dev $VNET

11. Phase 6: Import VMs

11.1. 6.1 Mount VM Storage

# Check disk layout - VM storage is the ~978GB disk
lsblk

On kvm-01, the SSD shows as sdb1 (not nvme0n1):

sdb                    8:16   0 978.1G  0 disk
└─sdb1                 8:17   0 978.1G  0 part   <-- This is VM storage
# Create mount point and mount
sudo mkdir -p /mnt/onboard-ssd
sudo mount /dev/sdb1 /mnt/onboard-ssd
# Verify VM images exist
ls /mnt/onboard-ssd/
# Add to fstab for persistence
echo "/dev/sdb1 /mnt/onboard-ssd xfs defaults 0 0" | sudo tee -a /etc/fstab
# Reload systemd to pick up fstab changes
sudo systemctl daemon-reload
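Device names such as /dev/sdb1 can change between boots, so a UUID-based fstab entry is a common alternative. The sketch below builds such a line; `fstab_line` is a hypothetical helper, and the filesystem type is read from the device with blkid instead of being assumed to be xfs.

```shell
# Hypothetical helper: format a UUID-based fstab entry.
fstab_line() {
    # $1 = filesystem UUID, $2 = mount point, $3 = filesystem type
    printf 'UUID=%s %s %s defaults 0 0\n' "$1" "$2" "$3"
}

# Usage (as root):
#   uuid=$(blkid -s UUID -o value /dev/sdb1)
#   fstype=$(blkid -s TYPE -o value /dev/sdb1)
#   fstab_line "$uuid" /mnt/onboard-ssd "$fstype" >> /etc/fstab
#   findmnt --verify      # sanity-check fstab before the next reboot
```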

11.2. 6.2 Import VM Definitions

# Import all VM definitions
for xml in /mnt/nas/backups/kvm-01-rebuild/vm-definitions/*.xml; do
  vm=$(basename "$xml" .xml)
  echo "Importing: $vm"
  sudo virsh define "$xml"
done
# List imported VMs
sudo virsh list --all

11.3. 6.3 Update VM Network (virbr0 → br-mgmt)

VMs from old kvm-01 used virbr0. They need to be updated to use br-mgmt.

# For each VM, update network source
for vm in $(sudo virsh list --all --name); do
  echo "Updating network for: $vm"
  sudo virt-xml "$vm" --edit --network bridge=br-mgmt
done
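After the update it is worth confirming that no definition still points at virbr0 or the default libvirt network. A sketch, assuming the usual `<source bridge='...'/>` and `<source network='...'/>` forms in the domain XML; `iface_sources` is a hypothetical helper.

```shell
# Hypothetical helper: reads domain XML on stdin and prints each interface
# source as "bridge=<name>" or "network=<name>", one per line.
iface_sources() {
    grep -oE "<source (bridge|network)=['\"][^'\"]+['\"]" \
        | sed -E "s/<source (bridge|network)=['\"]([^'\"]+)['\"]/\1=\2/"
}

# Usage:
#   for vm in $(sudo virsh list --all --name); do
#       echo "$vm: $(sudo virsh dumpxml --inactive "$vm" | iface_sources | tr '\n' ' ')"
#   done
```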

11.4. 6.4 Start VMs

# Start critical VMs first
for vm in vyos-01 home-dc01 vault-01; do
  echo "Starting: $vm"
  sudo virsh start "$vm"
  sleep 10
done
# Verify VMs have network (check PVID assignment)
for vm in $(sudo virsh list --name); do
  vnet=$(sudo virsh domiflist "$vm" | awk '/br-mgmt/{print $1}')
  echo "=== $vm ($vnet) ==="
  bridge vlan show dev "$vnet" 2>/dev/null | grep -E "100.*PVID|port"
done

12. Phase 7: Validation

12.1. 7.1 Verify vyos-01 VRRP

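# SSH to vyos-01 and check VRRP state (op-mode commands need the wrapper in a non-interactive session)
ssh vyos-01 "/opt/vyatta/bin/vyatta-op-cmd-wrapper show vrrp"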

Expected: vyos-01 = MASTER (priority 200), vyos-02 = BACKUP (priority 100)

12.2. 7.2 Verify VMs

# Ping test all VMs
for ip in 10.50.1.3 10.50.1.50 10.50.1.60 10.50.1.20; do
  ping -c 1 -W 2 $ip && echo "$ip: OK" || echo "$ip: FAIL"
done

12.3. 7.3 Redistribute VMs for HA

After validation, move appropriate VMs back to kvm-01:

VM | Target Host | Notes
vyos-01 | kvm-01 | VRRP Master
ise-01 | kvm-01 | Primary PAN
vault-01 | kvm-01 | Raft Leader
home-dc01 | kvm-01 | Primary DC
bind-01 | kvm-01 | Primary DNS
k3s-master-01 | kvm-01 | Control plane
9800-WLC-01 | kvm-01 | SSO Active

13. Quick Reference

13.1. Config Files to Copy from kvm-02

File | Purpose
/etc/libvirt/hooks/qemu | VLAN hook for VMs
nmcli connection show br-mgmt | Bridge config
/etc/sysctl.d/99-bridge.conf | Bridge netfilter settings (if exists)

13.2. Key Commands

# Check bridge VLAN state
bridge vlan show

# Check VM network assignment
sudo virsh domiflist <vm-name>

# Monitor libvirt hook
journalctl -t libvirt-hook -f

# Force PVID 100 on a vnet
VNET=vnetX
sudo bridge vlan del vid 1 dev $VNET pvid untagged
sudo bridge vlan add vid 100 dev $VNET pvid untagged
