kvm-02 Deployment: Supermicro E300-9D-8CN8TP with Rocky Linux
| Setting | Value |
|---|---|
| Host | kvm-02.inside.domusdigitalis.dev |
| Management IP | 10.50.1.111 |
| LAN IP | 192.168.1.182 |
| IPMI IP | 10.50.1.201 |
| OS | Rocky Linux 9.x |
| Role | Secondary KVM Hypervisor |
Overview
This runbook deploys kvm-02 as the secondary hypervisor in the home enterprise. It will host:
- vault-02, vault-03 (Vault HA cluster)
- k3s-master-02, k3s-master-03 (k3s HA control plane)
- Additional k3s workers
Related Documentation
- kvm-01 Migration Plan - Primary hypervisor state and migration
- VyOS Migration - Overall infrastructure migration project
M.2 SSD Installation Note: The 990 EVO Plus 2TB M.2 screw standoff is broken. Temporary fix: electrical tape or foam to prevent card lift. Order replacement M.2 standoff kit.
Hardware Specifications
| Component | Specification |
|---|---|
| Model | Supermicro E300-9D-8CN8TP (SYS-E300-9D-8CN8TP) |
| CPU | Intel Xeon D-2146NT (8C/16T, 2.3GHz base, 3.0GHz turbo) |
| RAM | 64GB ECC DDR4-2666 (4x16GB, expandable to 512GB) |
| Storage (OS/VMs) | Samsung 990 EVO Plus 2TB NVMe M.2 |
| Storage (Backups) | NFS to NAS (10.50.1.70) |
| Network | 4x 1GbE (I210) + 4x 10GbE SFP+ (X550) |
| IPMI | Supermicro BMC (AST2500) |
| Power | 200W TDP max |
BIOS Settings (Required)
Access BIOS via IPMI console or F2 during POST:
| Setting | Location | Value |
|---|---|---|
| Intel VT-x | Advanced → CPU Configuration | Enabled |
| Intel VT-d | Advanced → CPU Configuration | Enabled |
| SR-IOV | Advanced → PCIe/PCI/PnP | Enabled |
| Execute Disable Bit | Advanced → CPU Configuration | Enabled |
| Hyper-Threading | Advanced → CPU Configuration | Enabled |
| C-States | Advanced → CPU Configuration | Disabled (for latency) |
| Boot Mode | Boot → Boot Mode Select | UEFI |
| Secure Boot | Security → Secure Boot | Disabled (for Linux) |
VT-x and VT-d are REQUIRED for KVM. Without VT-d, PCI passthrough (GPU, NIC) won't work.
Phase 0: IPMI Recovery and Configuration
0.0 IPMI Credentials
Load credentials via dsec (preferred):
# Load network device credentials (includes IPMI)
dsource d000 dev/network
# Verify variables are set
echo "IPMI_USER: $IPMI_USER"
echo "IPMI_PASS is set: $([ -n "$IPMI_PASS" ] && echo yes || echo no)"
Alternative: Direct gopass access
# Load IPMI credentials from gopass directly
IPMI_USER="ADMIN"
IPMI_PASS=$(gopass show -o v3/domains/d000/servers/ipmi/ipmi-02/ADMIN)
0.0.1 Change Default IPMI Password
Supermicro ships with default ADMIN/ADMIN. Change immediately.
# Generate and store password (16 chars = safe for IPMI)
# Structure: ipmi-02/ADMIN allows per-user credentials
gopass generate v3/domains/d000/servers/ipmi/ipmi-02/ADMIN 16
IPMI passwords are limited to 20 bytes maximum. Use 16 characters to avoid multi-byte character issues.
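The 20-byte limit can be checked before a password is ever stored. A minimal sketch (the `ipmi_pw_ok` helper name is ours, not a real tool) that counts bytes rather than characters, so multi-byte UTF-8 characters are caught:

```shell
# Sanity-check a candidate password against the 20-byte BMC limit.
# wc -c counts bytes, so multi-byte UTF-8 characters are detected.
ipmi_pw_ok() {
  bytes=$(printf %s "$1" | wc -c)
  if [ "$bytes" -le 20 ]; then
    echo "ok: $bytes bytes"
  else
    echo "too long: $bytes bytes"
  fi
}
ipmi_pw_ok 'Xy7#kQ9$mW2@pL5z'             # 16 ASCII chars -> ok: 16 bytes
ipmi_pw_ok 'pässwörd-with-ümläüts-123'    # multi-byte chars push it past 20 bytes
```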
# Add metadata
gopass edit v3/domains/d000/servers/ipmi/ipmi-02/ADMIN
<generated-password>
---
user_id: 2
ip: 10.50.1.201
hostname: ipmi-02.inside.domusdigitalis.dev
hardware: Supermicro E300-9D-8CN8TP
server: kvm-02
description: kvm-02 BMC/IPMI ADMIN account
created: 2026-03-01
# List IPMI users (find user ID for ADMIN)
sudo ipmitool user list 1
# Disable extra users (ID 3 and above)
# Note: Cannot set empty name, but disabled users cannot authenticate
sudo ipmitool user disable 3
# Verify only ADMIN remains active
sudo ipmitool user list 1
# On WORKSTATION: Copy password to clipboard
gopass show -c v3/domains/d000/servers/ipmi/ipmi-02/ADMIN
# Copied to clipboard, clears in 45 seconds
# On SERVER (SSH session): Set password interactively
# Paste from clipboard when prompted
sudo ipmitool user set password 2
# Enter Password: <paste>
# Confirm Password: <paste>
# Verify new password works (paste password when prompted)
sudo ipmitool user test 2 20
# On WORKSTATION: Verify remote IPMI access with new credentials
ipmitool -I lanplus -H 10.50.1.201 -U ADMIN \
-P "$(gopass show -o v3/domains/d000/servers/ipmi/ipmi-02/ADMIN)" power status
# Expected: Chassis Power is on
0.1 Access IPMI via Physical Connection
If IPMI is unreachable, connect monitor/keyboard directly:
# Check if ipmitool is installed
rpm -q ipmitool
# Install if not present
sudo dnf install -y ipmitool
# Verify installation
rpm -q ipmitool && ipmitool -V
# Check if IPMI modules are loaded
lsmod | grep ipmi
# Load IPMI modules if missing
sudo modprobe ipmi_devintf
sudo modprobe ipmi_si
# Verify modules loaded
lsmod | grep ipmi
# Local IPMI commands (no network needed)
sudo ipmitool lan print 1
0.2 Configure IPMI Network
Run on the server locally (physical console or SSH if OS is installed):
# Set static IP for IPMI (no -I lanplus = local)
sudo ipmitool lan set 1 ipsrc static
sudo ipmitool lan set 1 ipaddr 10.50.1.201
sudo ipmitool lan set 1 netmask 255.255.255.0
sudo ipmitool lan set 1 defgw ipaddr 10.50.1.1
# Verify settings before applying
sudo ipmitool lan print 1 | grep -E "IP Address|Subnet Mask|Default Gateway"
# Apply changes (BMC reboots)
sudo ipmitool mc reset cold
These commands require physical access or SSH to the server. Once IPMI has an IP, you can manage it remotely with `ipmitool -I lanplus`.
0.2.1 Verify IPMI LAN Mode (Dedicated)
Supermicro BMC supports three LAN modes. Dedicated mode is required for the separate IPMI port.
# Check current LAN mode
sudo ipmitool raw 0x30 0x70 0x0c 0
| Value | Mode | Description |
|---|---|---|
| 00 | Dedicated (Required) | Uses dedicated IPMI port only |
| 01 | Shared | Shares with onboard NIC1 |
| 02 | Failover | Tries dedicated, falls back to shared |
If mode is NOT 00, set to dedicated:
# Set LAN mode to Dedicated (0x00)
sudo ipmitool raw 0x30 0x70 0x0c 1 0
# Reset BMC to apply LAN mode change
sudo ipmitool mc reset cold
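The raw command returns a single status byte. A small helper (our own sketch, assuming the common Supermicro 00/01/02 mapping) to translate it:

```shell
# Decode the one-byte response from `raw 0x30 0x70 0x0c 0`
# (00 = Dedicated, 01 = Shared, 02 = Failover -- assumed Supermicro mapping).
decode_lan_mode() {
  case "$(printf %s "$1" | tr -d '[:space:]')" in
    00) echo "Dedicated" ;;
    01) echo "Shared" ;;
    02) echo "Failover" ;;
    *)  echo "Unknown" ;;
  esac
}
decode_lan_mode ' 00'   # -> Dedicated
decode_lan_mode ' 02'   # -> Failover
```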
If IPMI shows the wrong MAC on the switch (e.g., matches eno2 instead of the dedicated port), the BMC is likely in Failover or Shared mode. Set to Dedicated mode and reset the BMC.
0.3 Verify IPMI Access
# From workstation, verify IPMI is reachable
ping 10.50.1.201
# Test IPMI commands remotely
dsource d000 dev/network
ipmitool -I lanplus -H 10.50.1.201 -U $IPMI_USER -P $IPMI_PASS power status
0.4 IPMI Quick Reference
# Power operations
ipmitool -I lanplus -H 10.50.1.201 -U $IPMI_USER -P $IPMI_PASS power on
ipmitool -I lanplus -H 10.50.1.201 -U $IPMI_USER -P $IPMI_PASS power off
ipmitool -I lanplus -H 10.50.1.201 -U $IPMI_USER -P $IPMI_PASS power cycle
# SOL (Serial Over LAN) console
ipmitool -I lanplus -H 10.50.1.201 -U $IPMI_USER -P $IPMI_PASS sol activate
# Exit SOL: ~.
# Boot to BIOS
ipmitool -I lanplus -H 10.50.1.201 -U $IPMI_USER -P $IPMI_PASS chassis bootdev bios
# Sensor check
ipmitool -I lanplus -H 10.50.1.201 -U $IPMI_USER -P $IPMI_PASS sdr type temperature
0.5 IPMI SSL Certificate (Optional)
Replace the self-signed IPMI certificate with one from Vault PKI:
# Issue certificate from Vault
vault write -format=json pki_int/issue/domus-server \
common_name="ipmi-02.inside.domusdigitalis.dev" \
ttl="8760h" > /tmp/ipmi-cert.json
# Extract cert and key
jq -r '.data.certificate' /tmp/ipmi-cert.json > /tmp/ipmi-02.crt
jq -r '.data.private_key' /tmp/ipmi-cert.json > /tmp/ipmi-02.key
jq -r '.data.ca_chain[]' /tmp/ipmi-cert.json >> /tmp/ipmi-02.crt
# Upload via IPMI web interface:
# Configuration → SSL Certification → Upload
# Or via ipmitool (Supermicro specific):
ipmitool -I lanplus -H 10.50.1.201 -U $IPMI_USER -P $IPMI_PASS \
raw 0x30 0x9d 0x01 0x00 0x00 < /tmp/ipmi-02.crt
Supermicro IPMI cert upload via raw commands is model-specific. The web UI is more reliable. After upload, the BMC resets automatically.
Phase 1: Rocky Linux Installation
1.1 Download Rocky Linux ISO
# On workstation or nas-01
# Rocky 9.7 (current as of 2025-11) - 2.5GB minimal
wget https://download.rockylinux.org/pub/rocky/9/isos/x86_64/Rocky-9.7-x86_64-minimal.iso
# Alternative: "latest" symlink (always points to current minor release)
# wget https://download.rockylinux.org/pub/rocky/9/isos/x86_64/Rocky-9-latest-x86_64-minimal.iso
# Verify checksum
wget https://download.rockylinux.org/pub/rocky/9/isos/x86_64/Rocky-9.7-x86_64-minimal.iso.CHECKSUM
sha256sum -c Rocky-9.7-x86_64-minimal.iso.CHECKSUM 2>&1 | grep -E 'OK|FAILED'
# Transfer to USB or mount via IPMI virtual media
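The checksum verification above boils down to hashing the file and comparing against the expected digest. What `sha256sum -c` does under the hood, demonstrated on a throwaway file:

```shell
# Hash a file and compare against a known digest, as sha256sum -c does.
tmp=$(mktemp)
printf 'hello' > "$tmp"
expected="2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
actual=$(sha256sum "$tmp" | awk '{print $1}')
[ "$actual" = "$expected" ] && echo "OK" || echo "FAILED"
rm -f "$tmp"
```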
1.2 Create Bootable USB
1.2.1 Identify USB Device
# List block devices BEFORE inserting USB
lsblk -d -o NAME,SIZE,MODEL
# Insert USB drive
# List again - new device is your USB
lsblk -d -o NAME,SIZE,MODEL
# Verify with dmesg (shows most recent device)
dmesg | tail -10
Triple-check the device name. Writing to the wrong disk destroys its data.
1.2.2 Write ISO to USB
# Unmount if auto-mounted
sudo umount /dev/sdX* 2>/dev/null
# Write ISO to USB (replace sdX with your device)
sudo dd if=Rocky-9.7-x86_64-minimal.iso of=/dev/sdX bs=4M status=progress oflag=sync
# Sync to ensure all data written
sync
Parameters explained:
| Parameter | Meaning |
|---|---|
| `if=` | Input file (the ISO) |
| `of=` | Output file (the USB device, NOT a partition) |
| `bs=4M` | Block size 4MB (faster than default 512 bytes) |
| `status=progress` | Show transfer progress |
| `oflag=sync` | Synchronous writes (safer, slightly slower) |
1.2.3 Verify USB
# Flush all buffers (paranoid but safe)
sync
# Check partition table was written (should show MBR/dos with bootable partition)
sudo fdisk -l /dev/sdX | awk 'NR<=12'
# Verify ISO filesystem and label
blkid /dev/sdX | awk -F'"' '{print "LABEL:", $2, "TYPE:", $4}'
# Raw verification: Check MBR boot signature (should show 55aa)
sudo xxd -s 510 -l 2 /dev/sdX
# Advanced: Verify ISO9660 magic string at sector 16 (offset 32768)
sudo xxd -s 32769 -l 5 /dev/sdX | awk '{print $2$3$4$5$6}'
# Should output: 4344303031 (hex for "CD001")
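The two offset checks above can be bundled into one reusable function. A sketch (the `looks_like_hybrid_iso` helper is ours), demonstrated on a synthetic image so it is safe to run anywhere:

```shell
# Verify the MBR boot signature (55aa at byte 510) and the ISO9660
# magic "CD001" at byte 32769 in one shot.
looks_like_hybrid_iso() {
  sig=$(od -An -tx1 -j510 -N2 "$1" | tr -d ' \n')
  magic=$(dd if="$1" bs=1 skip=32769 count=5 2>/dev/null)
  [ "$sig" = "55aa" ] && [ "$magic" = "CD001" ]
}

# Demo on a synthetic image (real use: looks_like_hybrid_iso /dev/sdX)
img=$(mktemp)
truncate -s 40000 "$img"
printf '\125\252' | dd of="$img" bs=1 seek=510 conv=notrunc 2>/dev/null    # 0x55 0xAA in octal
printf 'CD001'    | dd of="$img" bs=1 seek=32769 conv=notrunc 2>/dev/null
looks_like_hybrid_iso "$img" && echo "valid hybrid ISO layout"
rm -f "$img"
```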
1.3 Installation Configuration
During Rocky Linux installer:
| Setting | Value |
|---|---|
| Language | English (United States) |
| Time Zone | America/Los_Angeles |
| Keyboard | US |
| Installation Destination | Supermicro SSD 118GB (sda) - NOT the NVMe |
| Partitioning | Automatic or Custom (see below) |
| Network | Configure static IP |
| Root Password | Set strong password |
| User | evanusmodestus (admin privileges) |
Select the SuperDOM only. Deselect the NVMe drive in the installer. The NVMe will be configured post-install for VM storage.
1.4 Disk Partitioning (Supermicro SSD 118GB)
Actual hardware discovered: ATA SuperMicro SSD (Serial: 515d94618d000003)
| Mount Point | Device | Size | Filesystem |
|---|---|---|---|
| /boot/efi | sda1 | 600 MB | EFI System Partition |
| /boot | sda2 | 1 GB | xfs |
| / | sda3 | 100 GB | xfs (LVM optional) |
| swap | sda4 | 8 GB | swap |
118GB is ideal for the hypervisor OS - plenty of headroom for logs, kernel updates, and dnf cache without wasting NVMe IOPS on idle OS storage.
1.5 NVMe Configuration (Post-Install)
After Rocky is installed, configure NVMe for VM storage.
1.5.1 Verify NVMe Device
# List all block devices
lsblk
# Verify NVMe is detected (Samsung 990 EVO Plus 2TB)
lsblk | awk '/nvme/'
# Check NVMe details
sudo nvme list
1.5.2 Partition NVMe with LVM
WARNING: This will destroy all data on the NVMe. Verify the device name before proceeding.
# Verify target device (should be ~2TB unpartitioned)
sudo fdisk -l /dev/nvme0n1
# Create GPT label (if not done)
sudo parted /dev/nvme0n1 mklabel gpt
# Create single partition for LVM (entire disk)
sudo parted /dev/nvme0n1 mkpart primary 0% 100%
sudo parted /dev/nvme0n1 set 1 lvm on
# Verify partition created with LVM flag
sudo parted /dev/nvme0n1 print
1.5.3 Configure LVM
LVM provides:
- Thin provisioning - Overcommit storage, allocate on write
- Snapshots - Point-in-time VM backups
- Flexible resize - Grow/shrink volumes without downtime
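Thin provisioning in numbers (all figures illustrative, not from this host): the sum of the virtual LV sizes may exceed the pool, because space is only consumed as blocks are written. This is why pool monitoring matters.

```shell
# Overcommit arithmetic with assumed example figures.
pool_gb=1810                        # ~95% of a 2TB VG (assumed)
virtual_gb=$(( 1536 + 512 + 512 ))  # e.g. a 1.5T lv_images plus two future thin LVs
written_gb=400                      # hypothetical data actually written so far
echo "virtual/pool: $(( virtual_gb * 100 / pool_gb ))%"   # above 100% = overcommitted
echo "pool used:    $(( written_gb * 100 / pool_gb ))%"
```

The pool only suspends when written data (not virtual allocation) exhausts it, so overcommit is safe while usage is watched.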
# Create physical volume
sudo pvcreate /dev/nvme0n1p1
# Verify PV
sudo pvs
# Create volume group for VMs
sudo vgcreate vg_vms /dev/nvme0n1p1
# Verify VG
sudo vgs
# Create thin pool (use 95% of VG for thin pool)
sudo lvcreate -l 95%VG --thinpool tp_vms vg_vms
# Verify thin pool
sudo lvs
# Create logical volume for VM images (thin provisioned)
# This will grow as VMs are created
sudo lvcreate -V 1.5T --thin -n lv_images vg_vms/tp_vms
# Verify LV
sudo lvs -a
1.5.4 Format and Mount
# Format with XFS (optimal for VM images)
sudo mkfs.xfs /dev/vg_vms/lv_images
# Create mount point (if not exists)
sudo mkdir -p /var/lib/libvirt/images
# Mount
sudo mount /dev/vg_vms/lv_images /var/lib/libvirt/images
# Verify mount
df -h /var/lib/libvirt/images
1.5.5 Configure Persistent Mount
# Add to fstab using LVM path (stable across reboots)
echo '/dev/vg_vms/lv_images /var/lib/libvirt/images xfs defaults 0 0' | sudo tee -a /etc/fstab
# Verify fstab entry
grep libvirt /etc/fstab
# Test fstab (unmount and remount via fstab)
sudo umount /var/lib/libvirt/images
sudo mount -a
df -h /var/lib/libvirt/images
1.5.6 Reserve Space for Thin Pool Metadata
# Check thin pool usage (data% and metadata%)
sudo lvs -o+data_percent,metadata_percent vg_vms/tp_vms
Thin pool monitoring: set up alerts when data% exceeds 80%. Thin pools that run out of space will suspend all VMs using them.
1.5.7 Set SELinux Context
# Install semanage tool (if not present)
sudo dnf install -y policycoreutils-python-utils
# Set correct SELinux context for libvirt
sudo semanage fcontext -a -t virt_image_t "/var/lib/libvirt/images(/.*)?"
sudo restorecon -Rv /var/lib/libvirt/images
# Verify context
ls -laZ /var/lib/libvirt/images
Architecture: SuperDOM = OS (small, fast boot) | NVMe = VM storage (high IOPS). This separates OS from VM workloads.
1.6 Network Configuration (Installer)
| Setting | Value |
|---|---|
| Hostname | kvm-02.inside.domusdigitalis.dev |
| IPv4 Method | Manual |
| IPv4 Address | 10.50.1.111 |
| Netmask | 255.255.255.0 |
| Gateway | 10.50.1.1 |
| DNS | 10.50.1.90, 10.50.1.91 |
1.7 DNS Record Configuration
After installation completes, add DNS records for kvm-02 and its IPMI interface.
1.7.1 Pre-flight: Inspect Zone Files
From workstation, view current forward zone:
awk 'NR>=74 && NR<=90 {print NR": "$0}' <(ssh bind-01 "sudo cat /var/named/inside.domusdigitalis.dev.zone")
Check for existing kvm/netscaler entries:
grep -n "kvm\|netscaler\|Hypervisor\|Load Balancer" <(ssh bind-01 "sudo cat /var/named/inside.domusdigitalis.dev.zone")
View current reverse zone:
grep -n "110\|111\|200\|201" <(ssh bind-01 "sudo cat /var/named/10.50.1.rev")
1.7.2 BIND Forward Zone Modifications
Step 1: Backup
ssh bind-01 "sudo cp /var/named/inside.domusdigitalis.dev.zone /var/named/inside.domusdigitalis.dev.zone.bak-$(date +%Y%m%d)"
Verify backup (no output = identical = success):
diff <(ssh bind-01 "sudo cat /var/named/inside.domusdigitalis.dev.zone") <(ssh bind-01 "sudo cat /var/named/inside.domusdigitalis.dev.zone.bak-$(date +%Y%m%d)")
Step 2: Replace Load Balancers with Hypervisors
ssh bind-01 "sudo sed -i '
/; Load Balancers/c\; Hypervisors (.110-119)
/netscaler-01/d
/netscaler-02/c\kvm-02 IN A 10.50.1.111
' /var/named/inside.domusdigitalis.dev.zone"
Validate:
grep -n "Hypervisor\|kvm-02" <(ssh bind-01 "sudo cat /var/named/inside.domusdigitalis.dev.zone")
Step 3: Add ipmi-02
ssh bind-01 "sudo sed -i '/ipmi-01.*10.50.1.200/a\ipmi-02 IN A 10.50.1.201' /var/named/inside.domusdigitalis.dev.zone"
Validate:
grep -n "ipmi" <(ssh bind-01 "sudo cat /var/named/inside.domusdigitalis.dev.zone")
Step 4: Remove stale lb CNAME
ssh bind-01 "sudo sed -i '/lb.*CNAME.*netscaler/d' /var/named/inside.domusdigitalis.dev.zone"
Step 5: Increment serial
ssh bind-01 "sudo sed -i 's/2026022401/2026030101/' /var/named/inside.domusdigitalis.dev.zone"
Validate:
awk 'NR>=2 && NR<=4 {print NR": "$0}' <(ssh bind-01 "sudo cat /var/named/inside.domusdigitalis.dev.zone")
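The serials here follow the YYYYMMDDnn convention: date of the change plus a two-digit per-day counter. A sketch of the bump logic (the `next_serial` helper is ours; output depends on the current date):

```shell
# Next serial in YYYYMMDDnn form: if the current serial is from an
# earlier day, jump to today's base + 01; if we already edited today,
# just increment the counter.
next_serial() {
  current=$1
  today_base=$(( $(date +%Y%m%d) * 100 ))
  if [ "$current" -ge "$today_base" ]; then
    echo $(( current + 1 ))
  else
    echo $(( today_base + 1 ))
  fi
}
next_serial 2026022401   # run on 2026-03-01, this prints 2026030101
```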
Step 6: Final forward zone verification
Diff backup vs current:
diff <(ssh bind-01 "sudo cat /var/named/inside.domusdigitalis.dev.zone.bak-$(date +%Y%m%d)") <(ssh bind-01 "sudo cat /var/named/inside.domusdigitalis.dev.zone")
1.7.3 BIND Reverse Zone Modifications
Step 7: Backup reverse zone
ssh bind-01 "sudo cp /var/named/10.50.1.rev /var/named/10.50.1.rev.bak-$(date +%Y%m%d)"
Validate backup:
diff <(ssh bind-01 "sudo cat /var/named/10.50.1.rev") <(ssh bind-01 "sudo cat /var/named/10.50.1.rev.bak-$(date +%Y%m%d)")
Step 8: Add ipmi-02 PTR
ssh bind-01 "sudo sed -i '/^200.*ipmi-01/a\201 IN PTR ipmi-02.inside.domusdigitalis.dev.' /var/named/10.50.1.rev"
Validate:
grep -n "200\|201\|ipmi" <(ssh bind-01 "sudo cat /var/named/10.50.1.rev")
Step 9: Replace netscaler PTRs with kvm-02
ssh bind-01 "sudo sed -i '/^110.*netscaler-01/d' /var/named/10.50.1.rev"
ssh bind-01 "sudo sed -i '/^111.*netscaler-02/c\111 IN PTR kvm-02.inside.domusdigitalis.dev.' /var/named/10.50.1.rev"
Validate:
grep -n "110\|111\|kvm\|netscaler" <(ssh bind-01 "sudo cat /var/named/10.50.1.rev")
Step 10: Reload BIND
ssh bind-01 "sudo rndc reload"
Step 11: Test forward resolution
dig +short kvm-02.inside.domusdigitalis.dev @10.50.1.90
dig +short ipmi-02.inside.domusdigitalis.dev @10.50.1.90
Step 12: Test reverse resolution
dig +short -x 10.50.1.111 @10.50.1.90
dig +short -x 10.50.1.201 @10.50.1.90
1.7.4 DNS Validation
DNS records are added by editing the BIND zone files on bind-01 (see sections 1.7.2 and 1.7.3 above). VyOS forwards DNS queries to BIND.
Validate all DNS queries return correct results:
for host in kvm-02 ipmi-02; do
fqdn="$host.inside.domusdigitalis.dev"
bind=$(dig +short "$fqdn" @10.50.1.90)
local=$(dig +short "$fqdn")
printf "%-10s BIND=%-15s LOCAL=%-15s %s\n" \
"$host" "$bind" "$local" \
"$([ "$bind" = "$local" ] && echo '✓ MATCH' || echo '⚠ MISMATCH')"
done
Reverse lookup validation:
for ip in 10.50.1.111 10.50.1.201; do
echo "$ip -> $(dig +short -x $ip @10.50.1.90)"
done
Phase 2: Post-Installation Configuration
2.1 Enable Repositories and Update
# Check current repos
dnf repolist
# Enable EPEL (Extra Packages for Enterprise Linux)
sudo dnf install -y epel-release
# Enable CRB (CodeReady Builder - replacement for PowerTools)
sudo dnf config-manager --set-enabled crb
# Verify repos enabled
dnf repolist | grep -E "epel|crb"
# Update system
sudo dnf update -y
# Verify kernel version
uname -r
If the kernel was updated, reboot before continuing:
sudo reboot
2.2 Install Essential Packages
# System utilities
sudo dnf install -y \
vim-enhanced git tmux htop tree wget curl \
bash-completion NetworkManager-tui \
openssh-server rsync tar unzip \
firewalld chrony ipmitool \
policycoreutils-python-utils \
nmap-ncat
# Verify critical packages installed
rpm -q vim-enhanced tmux htop ipmitool chrony policycoreutils-python-utils
2.3 Configure SELinux
Rocky Linux has SELinux enforcing by default. Never disable SELinux on production systems.
# Check SELinux status
getenforce
# Check SELinux config file
grep "^SELINUX=" /etc/selinux/config
# If permissive or disabled, set to enforcing
sudo sed -i 's/^SELINUX=.*/SELINUX=enforcing/' /etc/selinux/config
# Verify change
grep "^SELINUX=" /etc/selinux/config
If SELinux was disabled, a full system relabel is required on reboot. This can take 10+ minutes.
# Check for SELinux denials (should be empty on fresh install)
sudo ausearch -m avc -ts recent 2>/dev/null | head -20
SELinux may block NFS-backed storage pools initially. If VMs fail to start with permission errors, enable the NFS boolean:
sudo setsebool -P virt_use_nfs 1
2.4 Configure SSH and Public Keys
2.4.1 Enable SSH on kvm-02
sudo systemctl enable --now sshd
Verify:
systemctl is-active sshd && echo "sshd running"
2.4.2 Copy SSH keys from workstation
From workstation, copy public key:
ssh-copy-id evanusmodestus@kvm-02
If ssh-copy-id fails with "Too many authentication failures":
ssh-copy-id -o PreferredAuthentications=password -o PubkeyAuthentication=no evanusmodestus@kvm-02
2.4.3 Troubleshooting: Stale mux socket
If connections fail due to cached state:
rm ~/.ssh/sockets/evanusmodestus@10.50.1.111-22 2>/dev/null
2.4.4 Fallback: Manual key copy via console
If ssh-copy-id fails completely (PAM issues, keyboard-interactive problems), use console access.
On workstation, copy all public keys to clipboard:
cat ~/.ssh/id_ed25519_vault.pub \
~/.ssh/id_ed25519_sk_rk_d000_nano.pub \
~/.ssh/id_ed25519_sk_rk_d000.pub \
~/.ssh/id_ed25519_sk_rk_d000_secondary.pub \
~/.ssh/id_ed25519_d000.pub | wl-copy
On kvm-02 console (serial/IPMI), create authorized_keys:
mkdir -p ~/.ssh && chmod 700 ~/.ssh
Paste all keys (Ctrl+Shift+V in terminal):
cat >> ~/.ssh/authorized_keys << 'EOF'
<paste clipboard contents here - 5 keys>
EOF
chmod 600 ~/.ssh/authorized_keys
Verify file:
wc -l ~/.ssh/authorized_keys
5 /home/evanusmodestus/.ssh/authorized_keys
2.4.6 Alternative: Manual key installation
On kvm-02:
# Create .ssh directory
mkdir -p ~/.ssh
chmod 700 ~/.ssh
# Add your public key
cat >> ~/.ssh/authorized_keys << 'EOF'
# Vault SSH CA cert (primary)
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIEXz... evanusmodestus@vault-ca
# YubiKey FIDO2 backup
sk-ssh-ed25519@openssh.com AAAAGnNrLXNzaC1... evanusmodestus@yubikey
EOF
chmod 600 ~/.ssh/authorized_keys
SSH hardening in /etc/ssh/sshd_config:
sudo sed -i 's/^#PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo sed -i 's/^#PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo sed -i 's/^#PubkeyAuthentication.*/PubkeyAuthentication yes/' /etc/ssh/sshd_config
sudo systemctl reload sshd
2.5 Configure Serial Console for Emergency Access
Serial console allows access via IPMI SOL even when network fails:
# Enable serial console on GRUB
sudo grubby --update-kernel=ALL --args="console=tty0 console=ttyS0,115200n8"
# Enable getty on serial console
sudo systemctl enable serial-getty@ttyS0.service
sudo systemctl start serial-getty@ttyS0.service
# Verify
cat /proc/cmdline | grep console
Access the serial console via IPMI:
ipmitool -I lanplus -H 10.50.1.201 -U $IPMI_USER -P $IPMI_PASS sol activate
Exit with ~.
2.6 Configure Vault SSH CA Trust
Reference: Vault SSH CA Runbook
2.6.1 From workstation: Locate Vault CA public key
Check if CA exists locally:
ls -la ~/.ssh/vault-ca.pub
If not present, fetch from Vault:
ssh vault-01 "export VAULT_ADDR='http://127.0.0.1:8200' && vault read -field=public_key ssh/config/ca" > ~/.ssh/vault-ca.pub
Verify:
awk '{print "Type:", $1, "Length:", length($2)}' ~/.ssh/vault-ca.pub
Type: ssh-rsa Length: 716
2.6.2 Copy CA to kvm-02
Transfer the CA public key to the server (the install step below expects it in /tmp):
scp ~/.ssh/vault-ca.pub kvm-02:/tmp/vault-ca.pub
2.6.3 On kvm-02: Install Vault CA
Move to trusted location:
sudo mv /tmp/vault-ca.pub /etc/ssh/vault-ca.pub
Set permissions:
sudo chmod 644 /etc/ssh/vault-ca.pub
Verify installation:
ls -la /etc/ssh/vault-ca.pub && awk '{print "Type:", $1}' /etc/ssh/vault-ca.pub
-rw-r--r--. 1 root root 742 Mar 1 18:00 /etc/ssh/vault-ca.pub Type: ssh-rsa
2.6.4 Configure sshd to trust Vault CA
Add TrustedUserCAKeys directive:
echo "TrustedUserCAKeys /etc/ssh/vault-ca.pub" | sudo tee -a /etc/ssh/sshd_config
Validate sshd config:
sudo sshd -t
No output = valid. Any errors must be fixed before restarting sshd.
Restart sshd:
sudo systemctl restart sshd
Verify sshd running:
systemctl is-active sshd && echo "sshd running"
2.6.5 Test from workstation (new terminal)
Keep the current SSH session open. Test in a NEW terminal.
Fresh connection (bypass mux):
ssh -o ControlPath=none kvm-02 'hostname && whoami'
Verify certificate was used:
ssh -o ControlPath=none -v kvm-02 'hostname' 2>&1 | grep -E "CERT|Server accepts"
debug1: Server accepts key: .ssh/id_ed25519_vault-cert.pub ED25519-CERT
2.6.6 Verify on kvm-02
Check sshd logs for certificate auth:
sudo journalctl -u sshd -n 10 | awk '/Accepted.*CERT/'
Accepted publickey for evanusmodestus from 10.x.x.x port xxxxx ssh2: ED25519-CERT ID vault-userkey-evanusmodestus
ED25519-CERT confirms certificate auth. ED25519 alone indicates key-based (non-cert) auth.
2.7 Configure Firewall
# Enable firewall
sudo systemctl enable --now firewalld
# Allow SSH and libvirt
sudo firewall-cmd --permanent --add-service=ssh
sudo firewall-cmd --permanent --add-service=cockpit
sudo firewall-cmd --permanent --add-service=libvirt
sudo firewall-cmd --permanent --add-service=nfs
# Allow VNC for VM consoles
sudo firewall-cmd --permanent --add-port=5900-5910/tcp
# Allow NFS (if not covered by service)
sudo firewall-cmd --permanent --add-port=2049/tcp
sudo firewall-cmd --permanent --add-port=111/tcp
sudo firewall-cmd --permanent --add-port=111/udp
# Allow libvirt migration (if doing live migration later)
sudo firewall-cmd --permanent --add-port=49152-49215/tcp
# Allow Prometheus node exporter (for monitoring)
sudo firewall-cmd --permanent --add-port=9100/tcp
# Reload
sudo firewall-cmd --reload
# Verify
sudo firewall-cmd --list-all
2.8 Configure Time Sync
Time sync is critical for Kerberos, TLS certificates, and log correlation.
# Check current NTP sources
chronyc sources
# Backup current config
sudo cp /etc/chrony.conf /etc/chrony.conf.bak
# Configure chrony for internal NTP (VyOS gateway)
cat << 'EOF' | sudo tee /etc/chrony.conf
# Internal NTP source (VyOS)
server 10.50.1.1 iburst
# Drift file
driftfile /var/lib/chrony/drift
# Allow large time jumps on first 3 updates
makestep 1.0 3
# Sync RTC
rtcsync
EOF
# Restart chronyd to apply
sudo systemctl restart chronyd
# Verify VyOS is the NTP source
chronyc sources
# Check sync status (should show "Leap status: Normal")
chronyc tracking
2.9 Install Wazuh Agent
# Import Wazuh GPG key
sudo rpm --import https://packages.wazuh.com/key/GPG-KEY-WAZUH
# Add Wazuh repository
cat << 'EOF' | sudo tee /etc/yum.repos.d/wazuh.repo
[wazuh]
gpgcheck=1
gpgkey=https://packages.wazuh.com/key/GPG-KEY-WAZUH
enabled=1
name=EL-$releasever - Wazuh
baseurl=https://packages.wazuh.com/4.x/yum/
protect=1
EOF
# Install Wazuh agent
sudo dnf install -y wazuh-agent
# Configure agent
sudo sed -i 's/MANAGER_IP/10.50.1.134/' /var/ossec/etc/ossec.conf
# Start and enable
sudo systemctl daemon-reload
sudo systemctl enable --now wazuh-agent
# Verify agent processes running
sudo /var/ossec/bin/wazuh-control status
# Check agent logs for manager connection
sudo tail -20 /var/ossec/logs/ossec.log | grep -i manager
Verify the agent shows up in the Wazuh Dashboard: wazuh.inside.domusdigitalis.dev:8443
Phase 3: KVM/QEMU Hypervisor Setup
3.1 Install Virtualization Stack
# Install KVM, QEMU, libvirt
sudo dnf install -y \
qemu-kvm libvirt libvirt-daemon-kvm \
virt-install virt-manager virt-viewer \
bridge-utils virt-top libguestfs-tools
# Install Cockpit for web management
sudo dnf install -y cockpit cockpit-machines cockpit-storaged
3.2 Enable Virtualization Services
# Start and enable libvirt
sudo systemctl enable --now libvirtd
# Enable Cockpit
sudo systemctl enable --now cockpit.socket
# Add user to libvirt group
sudo usermod -aG libvirt evanusmodestus
3.3 Verify KVM Support
# Check virtualization extensions
egrep -c '(vmx|svm)' /proc/cpuinfo
# Should return > 0
# Check KVM modules
lsmod | grep kvm
# Verify libvirt
sudo virsh list --all
3.4 Enable Nested Virtualization
# Enable nested virtualization (Intel)
cat << 'EOF' | sudo tee /etc/modprobe.d/kvm.conf
options kvm_intel nested=1
EOF
# Reload KVM module (requires reboot for full effect)
sudo modprobe -r kvm_intel
sudo modprobe kvm_intel
# Verify
cat /sys/module/kvm_intel/parameters/nested
# Should show Y or 1
Phase 4: Storage Configuration
4.1 Configure Local Storage Pool
# Create local storage pool for performance-critical VMs
sudo virsh pool-define-as local-vms dir - - - - /var/lib/libvirt/images
sudo virsh pool-build local-vms
sudo virsh pool-start local-vms
sudo virsh pool-autostart local-vms
4.2 Configure NAS Storage
# Install NFS utilities
sudo dnf install -y nfs-utils
# Create mount points (bash brace expansion)
sudo mkdir -p /mnt/nas/vms /mnt/nas/isos /mnt/nas/backups /mnt/nas/k3s
# Add fstab entries
cat << 'EOF' | sudo tee -a /etc/fstab
# Synology NAS mounts
10.50.1.70:/volume1/VMs /mnt/nas/vms nfs defaults,_netdev,nofail 0 0
10.50.1.70:/volume1/ISOs /mnt/nas/isos nfs defaults,_netdev,nofail 0 0
10.50.1.70:/volume1/Backups /mnt/nas/backups nfs defaults,_netdev,nofail 0 0
10.50.1.70:/volume1/k3s /mnt/nas/k3s nfs defaults,_netdev,nofail 0 0
EOF
# Reload systemd and mount
sudo systemctl daemon-reload
sudo mount -a
# Verify mounts
df -h | grep nas
4.3 Create libvirt Storage Pools for NAS
# ISOs pool
sudo virsh pool-define-as nas-isos dir - - - - /mnt/nas/isos
sudo virsh pool-build nas-isos
sudo virsh pool-start nas-isos
sudo virsh pool-autostart nas-isos
# VMs pool (for non-critical VMs)
sudo virsh pool-define-as nas-vms dir - - - - /mnt/nas/vms
sudo virsh pool-build nas-vms
sudo virsh pool-start nas-vms
sudo virsh pool-autostart nas-vms
Phase 5: Network Configuration
5.1 Identify Network Interfaces
# List all interfaces
ip -o link show | awk -F': ' '{print $2}'
# Expected on Supermicro E300-9D-8CN8TP:
# eno1-eno4: 1GbE (Intel I210-AT)
# eno5np0-eno8np3: 10GbE SFP+ (Intel X550)
5.2 Create Bridge for VMs
5.2.1 Identify Active Interface
# Find interface with management IP
ip -4 addr show | grep -B2 "10.50.1.111"
# Note: kvm-02 uses eno8 (10GbE SFP+)
# List current connections
nmcli connection show --active
5.2.2 Create Bridge
# Create management bridge
sudo nmcli connection add type bridge con-name br-mgmt ifname br-mgmt
# Configure bridge IP (same as current interface)
sudo nmcli connection modify br-mgmt ipv4.addresses 10.50.1.111/24
sudo nmcli connection modify br-mgmt ipv4.gateway 10.50.1.1
sudo nmcli connection modify br-mgmt ipv4.dns "10.50.1.90,10.50.1.91"
sudo nmcli connection modify br-mgmt ipv4.method manual
sudo nmcli connection modify br-mgmt connection.autoconnect yes
# Verify bridge config before adding interface
nmcli connection show br-mgmt | grep -E "ipv4\.(address|gateway|dns|method)"
5.2.3 Add Physical Interface to Bridge
# Add eno8 (10GbE) to bridge
# NOTE: Adjust interface name if different on your system
sudo nmcli connection add type bridge-slave con-name br-mgmt-port ifname eno8 master br-mgmt
CRITICAL: The next step will briefly drop your SSH connection. Have IPMI SOL ready:
ipmitool -I lanplus -H 10.50.1.201 -U $IPMI_USER -P $IPMI_PASS sol activate
Exit SOL with ~.
5.2.4 Activate Bridge
# Bring up bridge (connection will drop and reconnect)
sudo nmcli connection up br-mgmt
# Verify bridge state, carrier, and IP (compact output)
ip -br addr show br-mgmt; ip -br link show eno8
# Expected: br-mgmt UP 10.50.1.111/24
# Expected: eno8 UP ... master br-mgmt
# Verify eno8 enslaved to bridge (focused)
bridge link show | grep -E "eno8.*master"
# Expected: eno8 ... master br-mgmt state forwarding
# Verify connection states (tabular)
nmcli -t -f NAME,DEVICE,STATE connection show --active | column -t -s:
# Expected: br-mgmt-port on eno8, br-mgmt on br-mgmt
# Verify connectivity before disabling old connection
ping -c 2 10.50.1.1 && echo "PASS: Gateway reachable"
# ONLY after ALL validation passes: disable old connection
sudo nmcli connection down "10g-mgmt" 2>/dev/null || true
# Verify old connection disabled
nmcli connection show "10g-mgmt" | grep -E "^connection\.(autoconnect|interface)"
# autoconnect should be yes (will try on reboot) - consider:
# sudo nmcli connection modify "10g-mgmt" connection.autoconnect no
Phase 6: Performance Optimization
6.1 Configure Huge Pages
# Allocate 16GB huge pages (8192 x 2MB)
echo "vm.nr_hugepages = 8192" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
# Verify
grep HugePages /proc/meminfo
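Where the 8192 figure comes from: reserving 16 GiB as 2 MiB huge pages, i.e. 16 GiB divided by the page size.

```shell
# 16 GiB expressed in 2 MiB (2048 KiB) huge pages.
gib=16
hugepage_kb=2048                        # default huge page size on x86_64
nr=$(( gib * 1024 * 1024 / hugepage_kb ))
echo "$nr"   # -> 8192
```

Adjust `gib` to match however much RAM you want pinned for VM backing; the rest of the 64GB stays available to the host.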
6.2 CPU Governor
# Install cpupower
sudo dnf install -y kernel-tools
# Set performance governor
sudo cpupower frequency-set -g performance
# Make persistent (CPUPOWER_START_OPTS is the correct variable)
echo 'CPUPOWER_START_OPTS="frequency-set -g performance"' | sudo tee /etc/sysconfig/cpupower
sudo systemctl enable --now cpupower
# Verify service started
systemctl status cpupower.service --no-pager | grep -E "Active:|loaded"
6.3 Configure Swap Behavior
# Check current swappiness (default 60)
cat /proc/sys/vm/swappiness
# Set swappiness to 1 (only swap when critical)
# Keeps swap as safety valve without impacting VM performance
# Huge pages (16GB) are non-swappable regardless
echo "vm.swappiness = 1" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p | grep swappiness
# Expected: vm.swappiness = 1
Phase 8: Verification
8.1 Comprehensive Health Check Script
Create the health check script:
cat << 'HEALTHEOF' | sudo tee /usr/local/bin/kvm-health-check > /dev/null
#!/bin/bash
# kvm-02 Health Check Script
# Run after deployment or anytime to verify system state
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
NC='\033[0m'
echo "================================================"
echo " kvm-02 Health Check"
echo "================================================"
echo ""
PASSED=0
FAILED=0
check() {
local name="$1"
local cmd="$2"
if eval "$cmd" &>/dev/null; then
printf " ${GREEN}✓${NC} %-35s ${GREEN}[OK]${NC}\n" "$name"
((PASSED++))
else
printf " ${RED}✗${NC} %-35s ${RED}[FAIL]${NC}\n" "$name"
((FAILED++))
fi
}
echo "== Services =="
check "libvirtd running" "systemctl is-active libvirtd"
check "cockpit.socket running" "systemctl is-active cockpit.socket"
check "chronyd running" "systemctl is-active chronyd"
check "firewalld running" "systemctl is-active firewalld"
check "sshd running" "systemctl is-active sshd"
check "wazuh-agent running" "systemctl is-active wazuh-agent"
echo ""
echo "== Virtualization =="
check "KVM modules loaded" "lsmod | grep -q kvm_intel"
check "Nested virtualization" "grep -Eq '[Y1]' /sys/module/kvm_intel/parameters/nested"
check "VT-d (IOMMU) enabled" "dmesg | grep -q 'IOMMU enabled'"
check "libvirt connection" "sudo virsh list --all"
echo ""
echo "== Storage =="
check "local-vms pool active" "sudo virsh pool-info local-vms | grep -q 'State:.*running'"
check "Huge pages allocated" "grep -q 'HugePages_Total:.*[1-9]' /proc/meminfo"
# NAS pools (optional - uncomment when NAS NFS permissions configured)
# check "nas-isos pool active" "sudo virsh pool-info nas-isos | grep -q 'State:.*running'"
# check "nas-vms pool active" "sudo virsh pool-info nas-vms | grep -q 'State:.*running'"
# check "NFS mounts present" "df -h | grep -q nas"
echo ""
echo "== Performance =="
check "CPU governor performance" "cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor | grep -q performance"
check "Swappiness <= 10" "test \$(cat /proc/sys/vm/swappiness) -le 10"
check "cpupower service active" "systemctl is-active cpupower"
echo ""
echo "== Network =="
check "br-mgmt bridge exists" "ip link show br-mgmt"
check "br-mgmt network active" "sudo virsh net-info br-mgmt | grep -q 'Active:.*yes'"
check "Gateway reachable" "ping -c1 -W2 10.50.1.1"
check "DNS resolution" "host kvm-01.inside.domusdigitalis.dev"
check "Vault reachable" "curl -sk https://vault-01.inside.domusdigitalis.dev:8200/v1/sys/health | grep -q initialized"
echo ""
echo "== Security =="
check "SELinux enforcing" "getenforce | grep -q Enforcing"
check "Vault SSH CA trusted" "test -f /etc/ssh/vault-ca.pub"
check "Password auth disabled" "sshd -T | grep -q 'passwordauthentication no'"
echo ""
echo "================================================"
printf "Results: ${GREEN}%d passed${NC}, ${RED}%d failed${NC}\n" "$PASSED" "$FAILED"
echo "================================================"
[[ $FAILED -gt 0 ]] && exit 1
exit 0
HEALTHEOF
sudo chmod +x /usr/local/bin/kvm-health-check
Verify script content (first 5 and last 5 lines):
# Show first 5 and last 5 lines with line numbers
awk 'NR<=5 {print NR": "$0; next} {buf[NR]=$0} END {print "..."; for(i=NR-4;i<=NR;i++) print i": "buf[i]}' /usr/local/bin/kvm-health-check
Troubleshooting: Unsubstituted Attributes
If the script was copied from the raw AsciiDoc source, attribute references such as {domain} won't be substituted and will remain literal in the installed script.
# Check for unsubstituted attribute references such as {domain}
# (grep for a bare '{' would false-positive on the script's own shell braces)
grep -nE '\{[a-z0-9-]+\}' /usr/local/bin/kvm-health-check
# Fix common unsubstituted attributes
# Note: '{' is literal in sed BRE; '\{' would start an interval expression
sudo sed -i 's/{domain}/inside.domusdigitalis.dev/g' /usr/local/bin/kvm-health-check
sudo sed -i 's|{vault-addr}|https://vault-01.inside.domusdigitalis.dev:8200|g' /usr/local/bin/kvm-health-check
sudo sed -i 's/{vyos-vip}/10.50.1.1/g' /usr/local/bin/kvm-health-check
# Verify fixes applied
grep -n 'inside.domusdigitalis.dev\|10.50.1.1\|vault-01' /usr/local/bin/kvm-health-check
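The substitution can be dry-run on a sample line before editing the script in place (the sample input here is illustrative, not taken from the script):

```shell
# Dry-run the attribute substitution on a sample line
echo 'ping -c1 -W2 {vyos-vip}' | sed 's/{vyos-vip}/10.50.1.1/g'
# -> ping -c1 -W2 10.50.1.1
```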
Troubleshooting: Health Check Failures
DNS tools missing (host/dig commands):
sudo dnf install -y bind-utils
IOMMU check pattern (DMAR-IR present but check fails):
# Verify IOMMU is actually present
dmesg | grep -i "DMAR-IR\|IOMMU" | head -3
# Fix health check pattern to detect DMAR-IR
# (double-quote the sed script so the replacement can use single quotes;
#  embedding double quotes would break the check's own quoting)
sudo sed -i "s/dmesg | grep -q .IOMMU enabled./dmesg | grep -qiE 'DMAR-IR.*IOMMU|IOMMU.*enabled'/" /usr/local/bin/kvm-health-check
Password auth check (needs sudo to read sshd config):
# Verify current setting
sudo sshd -T 2>/dev/null | grep passwordauthentication
# Fix health check to use sudo
sudo sed -i 's/sshd -T/sudo sshd -T/' /usr/local/bin/kvm-health-check
Run the health check:
/usr/local/bin/kvm-health-check
AWK Output Filtering Patterns
# First N lines (like head -20)
/usr/local/bin/kvm-health-check 2>&1 | awk 'NR<=20'
# Line range (lines 10-25)
/usr/local/bin/kvm-health-check 2>&1 | awk 'NR>=10 && NR<=25'
# Extract section between pattern and empty line
/usr/local/bin/kvm-health-check 2>&1 | awk '/Virtualization/,/^$/'
/usr/local/bin/kvm-health-check 2>&1 | awk '/Storage/,/^$/'
/usr/local/bin/kvm-health-check 2>&1 | awk '/Security/,/^$/'
# Show only failures
/usr/local/bin/kvm-health-check 2>&1 | awk '/FAIL/'
# Show summary line only
/usr/local/bin/kvm-health-check 2>&1 | awk '/Results:/'
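The range pattern `/Section/,/^$/` prints from the first matching line through the next blank line, inclusive. Demonstrated on synthetic output rather than a live run:

```shell
# Extract one section from multi-section output (synthetic sample input)
printf '== Storage ==\npool ok\n\n== Network ==\nbridge ok\n' | awk '/Storage/,/^$/'
# Prints the Storage header, its lines, and the terminating blank line;
# the Network section is excluded.
```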
8.2 Quick Manual Verification
# Services
systemctl is-active libvirtd cockpit.socket chronyd firewalld sshd wazuh-agent
# Storage pools
sudo virsh pool-list --all
# Networks
sudo virsh net-list --all
# Bridge
ip -4 addr show br-mgmt | awk '/inet/{print $2}'
# NFS mounts
df -h | awk '/nas/{print $1, $6}'
# Huge pages
awk '/HugePages_Total/{print "Huge Pages:", $2}' /proc/meminfo
Next Steps
After kvm-02 is deployed:
- [ ] Deploy vault-02 VM (Vault HA cluster)
- [ ] Deploy vault-03 VM (Vault HA cluster)
- [ ] Deploy k3s-master-02 VM (k3s HA)
- [ ] Deploy k3s-master-03 VM (k3s HA)
- [ ] Configure VM backups to NAS
- [ ] Add to monitoring (Prometheus/Wazuh)
Troubleshooting
Lessons Learned from kvm-01
These lessons were learned the hard way during kvm-01 deployment. Avoid repeating them on kvm-02.
Storage
| Issue | Solution |
|---|---|
| Root partition too small | kvm-01 had only 14GB root, causing I/O errors when VM images filled it. kvm-02 has 100GB root + 1.5TB for VMs. |
| VMs on root partition | Always create VMs on dedicated storage (…) |
| NFS mount failures after reboot | Use … |
Networking
| Issue | Solution |
|---|---|
| cloud-init static IP ignored | Rocky 9 cloud images get DHCP before cloud-init runs. Must fix manually after first boot with … |
| Bridge vs virbr0 confusion | Existing VMs used … |
| SSH breaks after network changes | Always have IPMI console ready when modifying bridge configuration. Test with … |
Vault SSH CA
| Issue | Solution |
|---|---|
| "Not yet valid" certificate error | Clock skew. NTP was disabled on kvm-01. Fixed with … |
| Principals mismatch | Cert only had … |
| TrustedUserCAKeys placement | Must be BEFORE any … |
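The TrustedUserCAKeys ordering constraint can be illustrated with a minimal sshd_config sketch (the Match condition below is illustrative, not taken from this deployment; the CA path matches the health check above):

```
# Global directives first — TrustedUserCAKeys here applies to all connections
TrustedUserCAKeys /etc/ssh/vault-ca.pub

# Any Match block must come AFTER global directives: every directive below a
# Match line applies only within that Match scope, so a TrustedUserCAKeys
# placed there would silently stop being global.
Match User ansible
    PasswordAuthentication no
```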
Virtualization
| Issue | Solution |
|---|---|
| VT-d not enabled in BIOS | Without VT-d, PCI passthrough fails silently. Verify with … |
| Nested virtualization disabled | Required for k3s VMs. Enable with … |
| CPU pinning complexity | Document CPU pinning for performance-critical VMs. See KVM Operations. |
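For reference, a common way to enable nested virtualization persistently is via a modprobe option — a sketch, and note that reloading kvm_intel requires all VMs to be shut down first:

```shell
# Persist the module option, then reload kvm_intel to apply it now
echo 'options kvm_intel nested=1' | sudo tee /etc/modprobe.d/kvm-nested.conf
sudo modprobe -r kvm_intel && sudo modprobe kvm_intel
cat /sys/module/kvm_intel/parameters/nested   # expect Y or 1
```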
Appendix A: Attribute Reference
This runbook uses Antora attributes for infrastructure values. Attributes are defined in:
docs/asciidoc/antora.yml
Chronicle
| Date | Event |
|---|---|
| 2026-02-XX | kvm-02 deployment started |
| 2026-02-XX | Hardware installed (990 EVO Plus 2TB) |
| 2026-02-XX | Rocky Linux 9.x installed |
| 2026-02-XX | KVM/QEMU configured |
| 2026-02-XX | First VM deployed |
References
- VyOS Migration - kvm-02 is a prerequisite for this migration