kvm-01 Migration Planning

Overview

This runbook documents kvm-01’s current state and plans the migration of VMs to kvm-02.

HA-FIRST STRATEGY: Deploy HA infrastructure on kvm-02 BEFORE migrating VMs from kvm-01. This ensures true high availability across hypervisors, not just moving single points of failure.

HA Deployment Prerequisites

Complete these phases before VM migration:

Phase      | Task                                                         | Status      | Runbook
-----------|--------------------------------------------------------------|-------------|----------------------
0          | NAS NFS permissions for kvm-02 (10.50.1.111)                 | [ ] Pending | NAS Share Management
1          | Vault HA: vault-01 file→raft, deploy vault-02/03 on kvm-02   | [ ] Pending | Vault HA Deployment
2          | DNS HA: deploy bind-02 on kvm-02                             | [ ] Pending | BIND-02 Deployment
3          | Non-critical VM migration (ipsk-manager, keycloak-01)        | [ ] Pending | kvm-01 Migration
4 (Future) | Critical infrastructure HA (home-dc02, ise-02, vyos-02 VRRP) | [ ] Planned | TBD


Current Network Topology (Actual from captured output)

Interface | IP                     | Purpose
----------|------------------------|------------------------------------------------
eno1      | 192.168.1.225/24       | OOB from AT&T modem (DHCP) - backup management
virbr0    | 10.50.1.99/24          | VM bridge (infrastructure VMs live here)
virbr1    | 192.168.100.1/24       | Lab bridge (DOWN, unused)
eno8np3   | (no IP, bridge member) | 10GbE uplink to switch, member of virbr0

The Dual-Path Issue

  • 192.168.1.225 (eno1) - Direct from modem, always reachable from modem subnet

  • 10.50.1.99 (virbr0) - Infrastructure network, reachable when pfSense routes it

The Routing Problem (Root Cause)

Default route goes to modem (192.168.1.1), NOT pfSense (10.50.1.1):

default via 192.168.1.1 dev eno1 proto static metric 20101

This means:

  • kvm-01 host internet traffic → modem → FAILS (modem does IP passthrough to pfSense, not kvm-01)

  • VMs on virbr0 → pfSense → internet → WORKS

  • kvm-01 to 10.50.1.x → virbr0 → WORKS (local to bridge)
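The failure mode above can be confirmed without sending any traffic: on the live host, `ip route get 8.8.8.8` shows which next hop the kernel selects. As a self-contained sketch, the same selection can be read straight off the captured routing table:

```shell
# Sketch: given the routing table captured above, determine the default
# next hop the kernel will pick for internet-bound traffic.
# (On the live host, `ip route get 8.8.8.8` answers the same question.)
routes='default via 192.168.1.1 dev eno1 proto static metric 20101
10.50.1.0/24 dev virbr0 proto kernel scope link src 10.50.1.99
192.168.1.0/24 dev eno1 proto kernel scope link src 192.168.1.225 metric 101'
next_hop=$(printf '%s\n' "$routes" | awk '$1 == "default" {print $3; exit}')
echo "internet-bound traffic goes via: $next_hop"
# prints: internet-bound traffic goes via: 192.168.1.1
```

Since 192.168.1.1 is the modem (which passes through to pfSense, not to kvm-01), host-originated internet traffic dies there.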

Phase 0: IPMI Configuration

Out-of-band management via IPMI. Required for emergency recovery when network/SSH fails.

0.1 Verify IPMI Settings

# Check current IPMI network config
sudo ipmitool lan print 1 | grep -E "IP Address|MAC Address|Subnet|Gateway"
Expected output for ipmi-01:
IP Address Source       : Static Address
IP Address              : {ipmi-ip}
Subnet Mask             : {netmask-24}
MAC Address             : 3c:ec:ef:43:50:42
Default Gateway IP      : {pfsense-ip}

0.2 Set Static IP (if needed)

# Set static IP for IPMI
sudo ipmitool lan set 1 ipsrc static
sudo ipmitool lan set 1 ipaddr 10.50.1.200
sudo ipmitool lan set 1 netmask 255.255.255.0
sudo ipmitool lan set 1 defgw ipaddr 10.50.1.1

0.3 Verify IPMI LAN Mode (Dedicated)

Supermicro BMC supports three LAN modes. Dedicated mode is required for the separate IPMI port.

# Check current LAN mode
sudo ipmitool raw 0x30 0x70 0x0c 0
Table 1. LAN Mode Values

Value | Mode                 | Description
------|----------------------|---------------------------------------
00    | Dedicated (Required) | Uses dedicated IPMI port only
01    | Shared               | Shares with onboard NIC1
02    | Failover             | Tries dedicated, falls back to shared

If mode is NOT 00, set to dedicated:

# Set LAN mode to Dedicated (0x00)
sudo ipmitool raw 0x30 0x70 0x0c 1 0
# Reset BMC to apply all changes
sudo ipmitool mc reset cold

If IPMI shows wrong MAC on switch (e.g., matches eno2 instead of dedicated port), the BMC is likely in Failover or Shared mode. Set to Dedicated mode and reset BMC.
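The raw command returns a single byte; a small helper (hypothetical, added here for readability) maps it to the mode names in Table 1:

```shell
# Hypothetical helper: translate the byte returned by
# `sudo ipmitool raw 0x30 0x70 0x0c 0` into a LAN mode name (Table 1).
decode_lan_mode() {
  case "$1" in
    00|0) echo "Dedicated" ;;
    01|1) echo "Shared" ;;
    02|2) echo "Failover" ;;
    *)    echo "Unknown ($1)" ;;
  esac
}
decode_lan_mode 02   # prints: Failover
```

`ipmitool raw` typically prints the response byte with surrounding whitespace, so something like `decode_lan_mode "$(sudo ipmitool raw 0x30 0x70 0x0c 0 | tr -d ' ')"` should feed it the live value.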

0.4 Verify from Workstation

# Verify IPMI is reachable
ping -c 2 10.50.1.200

# Verify correct MAC on switch
# Expected: 3cec.ef43.5042 (IPMI dedicated port)
# If you see 3cec.ef43.4d49, that's eno2 - wrong port!

Phase 1: Current State Validation

1.1 Network Interfaces

# Show all interfaces with IPs
ip -4 -br addr show | grep -v '^lo'
output 2026-03-01 20:58
virbr0           UP             10.50.1.99/24
eno1             UP             192.168.1.225/24
virbr1           DOWN           192.168.100.1/24
# Show routing table
ip route | awk 'NR<=10 {print NR": "$0}'
output 2026-03-01 20:59
1: default via 192.168.1.1 dev eno1 proto static metric 20101
2: 10.50.1.0/24 dev virbr0 proto kernel scope link src 10.50.1.99
3: 192.168.1.0/24 dev eno1 proto kernel scope link src 192.168.1.225 metric 101
4: 192.168.100.0/24 dev virbr1 proto kernel scope link src 192.168.100.1 linkdown
# Check default gateways
ip route | grep default
output
default via 192.168.1.1 dev eno1 proto static metric 20101

1.2 Bridge Configuration

# List bridges and members
bridge link show
output
3: eno5np0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 hwmode VEPA
5: eno6np1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 hwmode VEPA
10: eno8np3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master virbr0 state forwarding priority 32 cost 2
10: eno8np3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master virbr0 hwmode VEPA
15: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master virbr0 state forwarding priority 32 cost 2
16: vnet1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master virbr0 state forwarding priority 32 cost 2
17: vnet2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master virbr0 state forwarding priority 32 cost 2
19: vnet4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master virbr0 state forwarding priority 32 cost 2
20: vnet5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master virbr0 state forwarding priority 32 cost 2
21: vnet6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master virbr0 state forwarding priority 32 cost 2
24: vnet9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master virbr0 state forwarding priority 32 cost 2
34: vnet19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master virbr0 state forwarding priority 32 cost 2
41: vnet26: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master virbr0 state forwarding priority 32 cost 2
54: vnet39: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master virbr0 state forwarding priority 32 cost 2
61: vnet46: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master virbr0 state forwarding priority 32 cost 2
75: vnet60: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master virbr0 state forwarding priority 32 cost 2
# Check virbr0 specifically
ip -d link show virbr0
output
2: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 3a:1e:7c:ca:b9:ed brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535 netns-immutable
    bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q bridge_id 8000.3a:1e:7c:ca:b9:ed designated_root 8000.3a:1e:7c:ca:b9:ed root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0 hello_timer    0.00 tcn_timer    0.00 topology_change_timer    0.00 gc_timer  154.49 fdb_n_learned 18 fdb_max_learned 0 vlan_default_pvid 1 vlan_stats_enabled 0 vlan_stats_per_port 0 group_fwd_mask 0 group_address 01:80:c2:00:00:00 mcast_snooping 1 no_linklocal_learn 0 mcast_vlan_snooping 0 mst_enabled 0 mdb_offload_fail_notification 0 mcast_router 1 mcast_query_use_ifaddr 0 mcast_querier 0 mcast_hash_elasticity 16 mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2 mcast_last_member_interval 100 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000 mcast_startup_query_interval 3125 mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_version 1 nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536

1.3 VM Inventory

# List all VMs with state
sudo virsh list --all
# Show VM resource allocation
sudo virsh list --all | awk 'NR>2 && NF {print $2}' | while read vm; do
  echo "=== $vm ==="
  sudo virsh dominfo "$vm" | grep -E "CPU|Memory"
done
output (certmgr-01 was renamed to vault-01)
 Id   Name            State
-------------------------------
 1    pfSense-FW01    running
 2    vault-01        running
 4    9800-CL-WLC     running
 5    ipsk-manager    running
 8    keycloak-01     running
 18   home-dc01       running
 25   ise-01          running
 39   bind-01         running
 47   ipa-01          running
 64   k3s-master-01   running

1.4 Storage Pools

# List storage pools
sudo virsh pool-list --all
output
 Name          State    Autostart
-----------------------------------
 images        active   yes
 images-1      active   yes
 iso           active   yes
 isos          active   yes
 nas-isos      active   yes
 nas-vms       active   yes
 nvram         active   yes
 onboard-ssd   active   yes
 tmp           active   yes
 virtio-win    active   yes
 vms           active   yes
# Show pool details
sudo virsh pool-info onboard-ssd
output
Name:           onboard-ssd
UUID:           373797e2-e00f-4372-8bba-5c15f70c1eaa
State:          running
Persistent:     yes
Autostart:      yes
Capacity:       961.66 GiB
Allocation:     360.06 GiB
Available:      601.61 GiB

1.5 Network Verification

# Test connectivity to gateway
ping -c 2 10.50.1.1

output
PING 10.50.1.1 (10.50.1.1) 56(84) bytes of data.
64 bytes from 10.50.1.1: icmp_seq=1 ttl=64 time=0.199 ms
64 bytes from 10.50.1.1: icmp_seq=2 ttl=64 time=0.202 ms

--- 10.50.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1015ms
rtt min/avg/max/mdev = 0.199/0.200/0.202/0.001 ms

# Test connectivity to DNS
ping -c 2 10.50.1.90

output
PING 10.50.1.90 (10.50.1.90) 56(84) bytes of data.
64 bytes from 10.50.1.90: icmp_seq=1 ttl=64 time=0.502 ms
64 bytes from 10.50.1.90: icmp_seq=2 ttl=64 time=0.173 ms

--- 10.50.1.90 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1021ms
rtt min/avg/max/mdev = 0.173/0.337/0.502/0.164 ms

# Test internet from the kvm-01 host (expected to fail - see routing problem above)
ping -c 2 8.8.8.8
output
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
From 192.168.1.225 icmp_seq=1 Destination Host Unreachable
From 192.168.1.225 icmp_seq=2 Destination Host Unreachable

--- 8.8.8.8 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1020ms
pipe 2
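The three probes above can be collapsed into one pass with a small wrapper (a sketch; the `check` helper is not part of the existing tooling, and the targets are the IPs used above):

```shell
# check <label> <command...> - run a probe quietly, print OK/FAIL
check() {
  label=$1; shift
  if "$@" >/dev/null 2>&1; then echo "OK   $label"; else echo "FAIL $label"; fi
}
check pfsense-gw ping -c 2 -W 2 10.50.1.1
check dns        ping -c 2 -W 2 10.50.1.90
check internet   ping -c 2 -W 2 8.8.8.8   # expected FAIL until Phase 3 fixes host routing
```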

Phase 2: Migration Strategy

2.1 VM Migration Order

Priority order for migrating VMs to kvm-02:

  1. Non-critical VMs first - Test migration process

    • ipsk-manager

    • keycloak-01

  2. Secondary services

    • bind-01 (DNS - have bind-02 ready)

    • ipa-01 (FreeIPA)

  3. Critical infrastructure last

    • vault-01 (PKI/Secrets)

    • home-dc01 (AD DS)

    • ise-01 (NAC)

    • pfSense-FW01 (Firewall - LAST)
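The priority order above can drive the per-VM workflow in 2.3 programmatically; a minimal sketch using the names from the inventory:

```shell
# Sketch: process VMs in the intended order (names from section 1.3).
MIGRATION_ORDER="ipsk-manager keycloak-01 bind-01 ipa-01 vault-01 home-dc01 ise-01 pfSense-FW01"
i=0
for vm in $MIGRATION_ORDER; do
  i=$((i+1))
  echo "queued $i: $vm"
  # per-VM migration steps (section 2.3.1) would go here
done
```

Note that per the HA strategy change in 2.4, several of these (bind-01, vault-01, home-dc01, ise-01, pfSense-FW01) now stay on kvm-01, so prune the list before using it.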

2.2 Pre-Migration Checklist (Programmatic)

2.2.1 Pre-Flight Validation

# Validate kvm-02 health (must be 22/22)
ssh kvm-02 "/usr/local/bin/kvm-health-check" | awk '/Results:/'
# List VMs on both hosts
netapi kvm list -H kvm-01
netapi kvm list -H kvm-02
# Check storage pools on kvm-02
ssh kvm-02 "sudo virsh pool-list --all"
# Check NAS connectivity from kvm-02
ssh kvm-02 "ping -c 2 10.50.1.70"
# Verify IPMI access to both hosts
ipmitool -I lanplus -H 10.50.1.200 -U ADMIN chassis status | head -3
ipmitool -I lanplus -H 10.50.1.201 -U ADMIN chassis status | head -3

2.2.2 NAS NFS Permissions for kvm-02

# Check current NFS shares on Synology
netapi synology shares
# TODO: Add kvm-02 (10.50.1.111) to NFS allowed hosts
# This requires Synology DSM UI or API call
# Path: Control Panel → Shared Folder → Edit → NFS Permissions
# Add: 10.50.1.111 with read/write, no_root_squash

2.2.3 Backup VM Definitions

# Backup all VM XML definitions from kvm-01
netapi kvm backup -H kvm-01 --all --dest /tmp/kvm01-backup
# Or manually per VM
ssh kvm-01 "sudo virsh dumpxml ipsk-manager" > /tmp/ipsk-manager.xml
ssh kvm-01 "sudo virsh dumpxml keycloak-01" > /tmp/keycloak-01.xml

2.3 VM Migration Procedure

2.3.1 Migration Workflow (Per VM)

# Variables - set per VM
VM_NAME="ipsk-manager"
SRC_HOST="kvm-01"
DST_HOST="kvm-02"
SRC_POOL="/var/lib/libvirt/images"
DST_POOL="/var/lib/libvirt/images"
# Step 1: Get VM info and disk path
netapi kvm info -H ${SRC_HOST} ${VM_NAME}
ssh ${SRC_HOST} "sudo virsh domblklist ${VM_NAME}"
# Step 2: Graceful shutdown on source
netapi kvm stop -H ${SRC_HOST} ${VM_NAME}
# Or: ssh ${SRC_HOST} "sudo virsh shutdown ${VM_NAME}"
# Step 3: Verify VM is shut off
ssh ${SRC_HOST} "sudo virsh domstate ${VM_NAME}"
# Expected: shut off
# Step 4: Copy disk image to kvm-02
# Option A: Stream over SSH (slower, but needs no shared storage)
ssh ${SRC_HOST} "sudo cat ${SRC_POOL}/${VM_NAME}.qcow2" | \
  ssh ${DST_HOST} "sudo tee ${DST_POOL}/${VM_NAME}.qcow2 > /dev/null"

# Option B: Via NAS (if both can mount)
# ssh ${SRC_HOST} "sudo cp ${SRC_POOL}/${VM_NAME}.qcow2 /mnt/nas-vms/"
# ssh ${DST_HOST} "sudo cp /mnt/nas-vms/${VM_NAME}.qcow2 ${DST_POOL}/"
# Step 5: Export and modify VM XML
ssh ${SRC_HOST} "sudo virsh dumpxml ${VM_NAME}" > /tmp/${VM_NAME}.xml

# Edit XML: change source path and network bridge
sed -i "s|${SRC_POOL}|${DST_POOL}|g" /tmp/${VM_NAME}.xml
sed -i "s|virbr0|br-mgmt|g" /tmp/${VM_NAME}.xml
# Step 6: Import VM on kvm-02
scp /tmp/${VM_NAME}.xml ${DST_HOST}:/tmp/
ssh ${DST_HOST} "sudo virsh define /tmp/${VM_NAME}.xml"
# Step 7: Start VM on kvm-02
netapi kvm start -H ${DST_HOST} ${VM_NAME}
# Or: ssh ${DST_HOST} "sudo virsh start ${VM_NAME}"
# Step 8: Verify VM is running and accessible
ssh ${DST_HOST} "sudo virsh domstate ${VM_NAME}"
ping -c 3 <VM_IP>
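The sed rewrites in Step 5 are easy to get silently wrong, so it is worth checking the XML before defining it on kvm-02. A small hypothetical guard (not part of the existing tooling):

```shell
# Hypothetical guard: fail if the rewritten XML still references a
# source-host path or bridge. Run between Steps 5 and 6.
verify_xml() {  # verify_xml <xml-file> <forbidden-string>...
  f=$1; shift
  for s in "$@"; do
    if grep -qF "$s" "$f"; then
      echo "FAIL: '$s' still present in $f"
      return 1
    fi
  done
  echo "OK: $f clean"
}
# Example against a scratch file:
tmp=$(mktemp)
printf "<interface><source bridge='br-mgmt'/></interface>\n" > "$tmp"
verify_xml "$tmp" virbr0    # prints OK, since only br-mgmt appears
rm -f "$tmp"
```

In the workflow above this would be `verify_xml /tmp/${VM_NAME}.xml virbr0` after the sed edits.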

2.3.2 Rollback Procedure

If migration fails:

# Stop failed VM on kvm-02
netapi kvm stop -H kvm-02 ${VM_NAME} --force

# Undefine on kvm-02
ssh kvm-02 "sudo virsh undefine ${VM_NAME}"

# Start original on kvm-01
netapi kvm start -H kvm-01 ${VM_NAME}

# Verify
netapi kvm info -H kvm-01 ${VM_NAME}

2.3.3 Post-Migration Cleanup

After VM verified working on kvm-02 for 24-48 hours:

# Remove old VM definition from kvm-01 (keeps disk as backup)
ssh kvm-01 "sudo virsh undefine ${VM_NAME}"

# Optional: Archive old disk image
ssh kvm-01 "sudo mv ${SRC_POOL}/${VM_NAME}.qcow2 ${SRC_POOL}/archived/"

2.4 Migration Status Tracking

Priority | VM            | Source | Target | Status / Notes
---------|---------------|--------|--------|-----------------------------------------------
1        | ipsk-manager  | kvm-01 | kvm-02 | [ ] Pending (after HA Phase 0-2)
2        | keycloak-01   | kvm-01 | kvm-02 | [ ] Pending (after HA Phase 0-2)
3        | bind-01       | kvm-01 | STAY   | [ ] Keep on kvm-01 (bind-02 on kvm-02 = HA)
4        | ipa-01        | kvm-01 | kvm-02 | [ ] Pending (consider ipa-02 for HA)
5        | vault-01      | kvm-01 | STAY   | [ ] Keep on kvm-01 (vault-02/03 on kvm-02 = HA)
6        | home-dc01     | kvm-01 | STAY   | [ ] Keep on kvm-01 (home-dc02 on kvm-02 = HA)
7        | ise-01        | kvm-01 | STAY   | [ ] Keep on kvm-01 (ise-02 on kvm-02 = HA)
8        | 9800-CL-WLC   | kvm-01 | kvm-02 | [ ] Pending (single instance OK)
9        | k3s-master-01 | kvm-01 | kvm-02 | [ ] Pending (plan k3s HA first)
10       | pfSense-FW01  | kvm-01 | STAY   | [ ] Keep on kvm-01 (pfSense-FW02 + CARP = HA)

HA Strategy Change - Instead of migrating single points of failure, deploy secondaries on kvm-02 to achieve true HA across hypervisors. Primary services stay on kvm-01, secondaries on kvm-02.

Phase 3: Network Cleanup (AFTER Migration)

Do NOT change kvm-01 networking until VMs are migrated. Current state is a "blackhole" for host internet only - VMs work fine through pfSense.

3.0 Current State Assessment

Component            | Status                                       | Action
---------------------|----------------------------------------------|--------------------
kvm-01 VMs           | Working (route through pfSense)              | No change needed
kvm-01 host internet | Blackhole (routes to modem, fails)           | Fix AFTER migration
kvm-02               | Properly configured (routes through pfSense) | Ready for VMs

3.1 Goal: Clean Network Architecture

After migration, kvm-01 network should be:

  • Remove dependency on modem DHCP (192.168.1.x)

  • Single management interface on 10.50.1.x

  • IPMI for out-of-band access (not modem)

  • Both hypervisors route through pfSense (HA ready)
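The eventual route change can be staged as a dry run first. This sketch only prints the intended commands (the `plan` wrapper is hypothetical); drop the wrapper only after all VMs are off kvm-01:

```shell
# plan: print a command instead of running it, so the Phase 3 change
# can be reviewed safely before migration is complete.
plan() { echo "WOULD RUN: $*"; }
plan ip route del default via 192.168.1.1 dev eno1
plan ip route add default via 10.50.1.1
plan ip route get 8.8.8.8   # verify: should then egress via 10.50.1.1
```

Note these `ip route` changes are not persistent; the permanent fix belongs in the host's network configuration (netplan/NetworkManager, depending on the distro).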

3.2 Current Issue Analysis

# Why can't we ping 10.50.1.99 from outside?
# Check if traffic is going through pfSense
traceroute -n 10.50.1.99
# Check pfSense firewall rules (from pfSense)
# pfctl -sr | grep 10.50.1.99