k3s Kubernetes Deployment
Single-node k3s Kubernetes deployment on Rocky Linux 9 with SELinux enforcing and Vault Agent for secrets injection.
Overview
| Component | Value |
|---|---|
| Node | k3s-master-01 (10.50.1.120) |
| OS | Rocky Linux 9 (RHEL 9 family) |
| SELinux | Enforcing (required) |
| Host Firewall | firewalld (RHEL standard, nftables backend) |
| CNI / Network Policy | Cilium (eBPF-based, L3-L7 policies, replaces Flannel) |
| Container Runtime | containerd (k3s embedded) |
| Secrets | HashiCorp Vault Agent (required) |
| Ingress | Traefik (k3s default) |
Defense-in-Depth Security Posture: Zero-trust means securing at EVERY layer, not just the perimeter.
Session Variables
Set these variables at the start of your deployment session. Adjust for each node being deployed.
Shell Variables (Copy to Terminal)
# ============================================================
# K3S DEPLOYMENT SESSION VARIABLES
# Adjust K3S_NODE_* for each node deployment
# ============================================================
# Target Node (change for each deployment)
K3S_NODE_NAME="k3s-master-01"
K3S_NODE_IP="10.50.1.120"
K3S_NODE_ROLE="server" # server=master, agent=worker
# Hypervisor (kvm-01 for master-01/worker-01, kvm-02 for others)
KVM_HOST="kvm-01"
# Cluster Configuration
K3S_TOKEN="<generated-token>" # Generate once, use for all nodes
K3S_SERVER_URL="https://10.50.1.120:6443" # For joining nodes
# Network Configuration
DOMAIN="inside.domusdigitalis.dev"
GATEWAY="10.50.1.1" # VyOS HA VIP
DNS_PRIMARY="10.50.1.90"
DNS_SECONDARY="10.50.1.91"
# Storage
NAS_IP="10.50.1.70"
NFS_SHARE="/volume1/k3s"
# Vault Integration
VAULT_ADDR="https://vault-01.inside.domusdigitalis.dev:8200"
Node Matrix Reference
| Node | IP | Role | Hypervisor | Priority |
|---|---|---|---|---|
| k3s-master-01 | 10.50.1.120 | server (etcd) | kvm-01 | First |
| k3s-master-02 | 10.50.1.121 | server (etcd) | kvm-02 | Second |
| k3s-master-03 | 10.50.1.122 | server (etcd) | kvm-02 | Third |
| k3s-worker-01 | 10.50.1.123 | agent | kvm-01 | After masters |
| k3s-worker-02 | 10.50.1.124 | agent | kvm-02 | After masters |
| k3s-worker-03 | 10.50.1.125 | agent | kvm-02 | After masters |
Verify Variables
echo "=== k3s Node Deployment ==="
echo "Node: $K3S_NODE_NAME"
echo "IP: $K3S_NODE_IP"
echo "Role: $K3S_NODE_ROLE"
echo "Hypervisor: $KVM_HOST"
echo "Gateway: $GATEWAY"
echo "DNS: $DNS_PRIMARY"
Gateway: VyOS HA VIP
Phase 1: VM Creation
Option A: Cloud Image (Recommended)
Faster deployment using Rocky 9 GenericCloud image with cloud-init.
1.1 Copy and resize base image:
cd /var/lib/libvirt/images
sudo cp Rocky-9-GenericCloud-Base.latest.x86_64.qcow2 k3s-master-01.qcow2
sudo qemu-img resize k3s-master-01.qcow2 50G
1.2 Create cloud-init configuration:
The heredoc must have NO leading whitespace. The #cloud-config line must start at column 0.
plain_text_passwd is required for console access (VNC/serial). SSH keys alone won't help if networking fails. Change the password after first login with passwd.
cat > /tmp/k3s-cloud-init.yml << 'EOF'
#cloud-config
hostname: k3s-master-01
fqdn: k3s-master-01.inside.domusdigitalis.dev
manage_etc_hosts: true

users:
  - name: evanusmodestus
    groups: wheel
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    lock_passwd: false
    plain_text_passwd: changeme123
    ssh_authorized_keys:
      - ssh-ed25519 AAAAC3... # Add your SSH public key
      - sk-ssh-ed25519@openssh.com AAAAG... # YubiKey resident key (optional)

write_files:
  - path: /etc/NetworkManager/system-connections/eth0.nmconnection
    permissions: '0600'
    content: |
      [connection]
      id=eth0
      type=ethernet
      interface-name=eth0
      autoconnect=true

      [ipv4]
      method=manual
      addresses=10.50.1.120/24
      gateway=10.50.1.1
      dns=10.50.1.90;10.50.1.91

      [ipv6]
      method=disabled

runcmd:
  - nmcli connection reload
  - nmcli connection up eth0
  - growpart /dev/vda 4
  - xfs_growfs /
EOF
1.2.1 Verify cloud-init YAML (no leading whitespace):
awk 'NR<=5 {print NR": ["$0"]"}' /tmp/k3s-cloud-init.yml
1: [#cloud-config]
2: [hostname: k3s-master-01]
3: [fqdn: k3s-master-01.inside.domusdigitalis.dev]
4: [manage_etc_hosts: true]
5: []
If there’s leading whitespace, fix with:
sed -i 's/^ //' /tmp/k3s-cloud-init.yml
1.3 Create cloud-init ISO:
Requires the cloud-utils package. On Arch: sudo pacman -S cloud-utils
sudo cloud-localds /var/lib/libvirt/images/k3s-cloud-init.iso /tmp/k3s-cloud-init.yml
1.4 Create VM with virt-install:
sudo virt-install \
--name k3s-master-01 \
--memory 4096 \
--vcpus 2 \
--disk path=/var/lib/libvirt/images/k3s-master-01.qcow2,format=qcow2 \
--disk path=/var/lib/libvirt/images/k3s-cloud-init.iso,device=cdrom \
--os-variant rocky9 \
--network bridge=virbr0,model=virtio \
--graphics none \
--console pty,target_type=serial \
--import \
--noautoconsole
1.5 Verify VM started:
sudo virsh list | grep k3s
Option B: ISO Installation (Alternative)
Traditional installation from ISO with kickstart.
virt-install \
--name k3s-master-01 \
--memory 4096 \
--vcpus 2 \
--disk path=/var/lib/libvirt/images/k3s-master-01.qcow2,size=50,format=qcow2 \
--os-variant rocky9 \
--network bridge=virbr0,model=virtio \
--graphics none \
--console pty,target_type=serial \
--location /var/lib/libvirt/iso/Rocky-9-latest-x86_64-minimal.iso \
--extra-args "inst.ks=http://10.50.1.110/ks/rocky9-minimal.cfg ip=10.50.1.120::10.50.1.1:255.255.255.0:k3s-master-01.inside.domusdigitalis.dev:enp1s0:none nameserver=10.50.1.90"
The --extra-args string configures a static IP during installation. Format: ip=<ip>::<gateway>:<netmask>:<hostname>:<interface>:none.
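A minimal sketch of assembling that ip= argument from the session variables defined earlier, so you can eyeball it before pasting it into virt-install. NODE_FQDN and IFACE are hypothetical helper variables (not part of the session-variable set above).

```shell
# Build the dracut ip= argument from session variables.
# Values are the k3s-master-01 example from this guide.
K3S_NODE_IP="10.50.1.120"
GATEWAY="10.50.1.1"
NETMASK="255.255.255.0"                               # /24
NODE_FQDN="k3s-master-01.inside.domusdigitalis.dev"   # hypothetical helper
IFACE="enp1s0"                                        # hypothetical helper

IP_ARG="ip=${K3S_NODE_IP}::${GATEWAY}:${NETMASK}:${NODE_FQDN}:${IFACE}:none"
echo "$IP_ARG"
```

Paste the echoed string into the --extra-args value alongside inst.ks and nameserver.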
Manual Installation Settings (if no kickstart)
During Rocky 9 installation:
| Setting | Value |
|---|---|
| Hostname | k3s-master-01.inside.domusdigitalis.dev |
| IP Address | 10.50.1.120/24 |
| Gateway | 10.50.1.1 |
| DNS | 10.50.1.90 |
| Root Password | (from gopass: v3/domains/d000/k3s/k3s-master-01) |
| User | sysadmin (with sudo) |
| Partitioning | Automatic with LVM |
| Store root password in gopass BEFORE installation. Never use weak passwords on infrastructure nodes. |
Phase 2: Base Configuration
2.1 Verify Network
System overview (filtered):
hostnamectl | awk '/Static hostname|Operating System|Kernel|Architecture/ {gsub(/^[[:space:]]+/, ""); print}'
Static hostname: k3s-master-01
Operating System: Rocky Linux 9.7 (Blue Onyx)
Kernel: Linux 5.14.0-611.5.1.el9_7.x86_64
Architecture: x86-64
Network configuration summary:
ip -4 -o addr show | awk '{print $2": "$4}' | grep -v "^lo:"
eth0: 10.50.1.120/24
DNS and gateway verification:
awk '/^nameserver/{print "DNS: "$2} /^search/{print "Search: "$2}' /etc/resolv.conf
Connectivity matrix (parallel checks):
for target in 10.50.1.1 10.50.1.60 10.50.1.90; do
ping -c1 -W1 $target &>/dev/null && echo "$target: OK" || echo "$target: FAIL"
done
10.50.1.1: OK   # VyOS HA VIP gateway
10.50.1.60: OK  # Vault
10.50.1.90: OK  # bind-01 DNS
DNS resolution test:
for host in vault.inside.domusdigitalis.dev k3s-master-01.inside.domusdigitalis.dev; do
dig +short $host | awk -v h="$host" 'NR==1 {print h": "$0}'
done
2.1.1 Fix Static IP (If DHCP Active)
Rocky 9 GenericCloud DHCP Issue: Cloud-init network config is unreliable. The VM often gets DHCP instead of the static IP. Additionally, VM restarts (shutdown/start, migration) can revert to DHCP via a "System eth0" connection.
Check if DHCP is active:
nmcli -t -f NAME,DEVICE conn show --active | awk -F: '{print $1": "$2}'
If you see "System eth0" or IP is not 10.50.1.120, fix it:
Delete DHCP connections:
sudo nmcli conn delete "System eth0" 2>/dev/null
sudo nmcli conn delete "Wired connection 1" 2>/dev/null
sudo nmcli conn delete "cloud-init eth0" 2>/dev/null
Create persistent static connection:
sudo nmcli conn add con-name eth0 type ethernet ifname eth0 \
ipv4.method manual \
ipv4.addresses 10.50.1.120/24 \
ipv4.gateway 10.50.1.1 \
ipv4.dns "10.50.1.90,10.50.1.91" \
autoconnect yes
Activate (will disconnect if remote):
sudo nmcli conn up eth0
Reconnect to static IP:
ssh evanusmodestus@10.50.1.120
Verify static IP persists:
ip -4 -o addr show eth0 | awk '{print $4}'
10.50.1.120/24
2.2 Update System
sudo dnf update -y
sudo reboot
| Always reboot after kernel updates to ensure the new kernel is active. |
2.3 Verify SELinux
getenforce
Expected: Enforcing
Extract SELinux mode with awk:
sestatus | awk '/Current mode/{print $3}'
SELinux must remain in Enforcing mode. Do NOT set it to Permissive. k3s fully supports SELinux when installed with the --selinux flag (Phase 3.1).
2.4 Configure Firewall
GenericCloud images are minimal - firewalld is NOT installed by default. A host firewall is mandatory for defense-in-depth. Do NOT rely on the perimeter firewall alone.
2.4.1 Install firewalld (GenericCloud images):
sudo dnf install -y firewalld
sudo systemctl enable --now firewalld
Verify service running:
systemctl status firewalld | awk '/Active:/{print $2, $3}'
2.4.2 Open required ports:
| Port | Protocol | Purpose |
|---|---|---|
| 6443 | TCP | k3s API server (kubectl access) |
| 80 | TCP | Traefik HTTP ingress |
| 443 | TCP | Traefik HTTPS ingress |
| 4240 | TCP | Cilium health checks |
| 4244 | TCP | Hubble Relay (Cilium observability) |
| 4245 | TCP | Hubble UI (optional) |
| 8472 | UDP | VXLAN overlay (Cilium/Flannel) |
| 10250 | TCP | Kubelet metrics |
sudo firewall-cmd --permanent --add-port=6443/tcp
sudo firewall-cmd --permanent --add-port=80/tcp
sudo firewall-cmd --permanent --add-port=443/tcp
sudo firewall-cmd --permanent --add-port=4240/tcp
sudo firewall-cmd --permanent --add-port=4244/tcp
sudo firewall-cmd --permanent --add-port=8472/udp
sudo firewall-cmd --permanent --add-port=10250/tcp
sudo firewall-cmd --reload
Verify with awk extraction:
sudo firewall-cmd --list-ports | awk '{gsub(/ /,"\n"); print}' | sort
10250/tcp
4240/tcp
4244/tcp
443/tcp
6443/tcp
80/tcp
8472/udp
2.5 Install Prerequisites
sudo dnf install -y curl wget tar git jq bash-completion
jq is essential for parsing JSON from kubectl and Vault. bash-completion enables kubectl tab completion.
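A quick offline smoke test (inline sample JSON, no cluster needed) confirms jq is wired up for the kind of extraction used throughout this guide:

```shell
# Sample kubectl-style JSON piped through jq -r, the pattern used
# repeatedly in later phases for node and pod inspection.
echo '{"items":[{"metadata":{"name":"k3s-master-01"}}]}' \
  | jq -r '.items[].metadata.name'
# → k3s-master-01
```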
2.6 Configure Shell (History + Completions)
Increase bash history (default 500 is too low):
cat >> ~/.bashrc << 'EOF'
# History configuration
HISTSIZE=10000
HISTFILESIZE=20000
HISTCONTROL=ignoreboth:erasedups
shopt -s histappend
EOF
Reload:
source ~/.bashrc
| Completions for kubectl and cilium are added after their installation in Phase 3. |
2.7 Configure Vault SSH CA Trust
Enable Vault SSH certificate authentication for secure access.
Download Vault CA public key:
curl -sSk https://10.50.1.60:8200/v1/ssh/public_key | sudo tee /etc/ssh/trusted-user-ca-keys.pem
Verify download (first 50 chars):
sudo awk '{print substr($0,1,50); exit}' /etc/ssh/trusted-user-ca-keys.pem
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCdUEnm7yqL
Add TrustedUserCAKeys to sshd_config:
echo 'TrustedUserCAKeys /etc/ssh/trusted-user-ca-keys.pem' | sudo tee -a /etc/ssh/sshd_config
Verify placement (must be BEFORE any Match blocks):
sudo awk '/TrustedUserCAKeys|^Match/' /etc/ssh/sshd_config
Restart sshd:
sudo systemctl restart sshd
Test from workstation:
ssh k3s-master-01 hostname
Phase 3: k3s Installation
3.1 Install k3s with SELinux Support (No Default CNI)
SELinux: The --selinux flag loads the k3s SELinux policy module so the cluster runs with SELinux Enforcing.
Cilium: We disable k3s default networking (--flannel-backend=none --disable-network-policy) so Cilium can replace both the CNI and the NetworkPolicy controller.
curl -sfL https://get.k3s.io | sudo INSTALL_K3S_EXEC="server \
--selinux \
--flannel-backend=none \
--disable-network-policy" sh -
| Flag | Purpose |
|---|---|
| server | Run as control-plane node (not agent) |
| --selinux | Enable SELinux policy module |
| --flannel-backend=none | Disable default Flannel CNI (Cilium will provide networking) |
| --disable-network-policy | Disable k3s native NetworkPolicy controller (Cilium handles this) |
After this step, the node will show NotReady until the Cilium CNI is installed (Phase 3.2).
Networking Concepts (CCNP Context)
If you have traditional networking background (CCNP, etc.), here’s how Kubernetes networking maps:
| Traditional | Kubernetes | Explanation |
|---|---|---|
| VLAN segmentation | Namespaces + NetworkPolicy | Logical isolation without L2 boundaries |
| VXLAN overlay (EVPN) | CNI overlay (Cilium/Flannel) | Same encapsulation - pods get a virtual L3 network abstracted from the underlay |
| VRF / Route tables | Network namespaces | Each pod has an isolated network stack |
| ACLs (L3/L4) | NetworkPolicy | Permit/deny by label selector, port, protocol |
| NBAR / AVC (L7) | Cilium L7 Policy | HTTP method, path, headers - identity-aware |
| NAT/PAT | SNAT (egress), DNAT (Services) | kube-proxy or Cilium handles translation |
| Load balancer (F5, Netscaler) | Service (ClusterIP, LoadBalancer) | Round-robin by default, session affinity optional |
| Packet capture (SPAN) | Hubble | Real-time flow visibility with identity context |
Key abstraction: The overlay network means pods communicate on a flat L3 network (typically 10.42.0.0/16) regardless of the physical underlay. No VLAN trunking, no inter-VLAN routing, no VPN tunnels needed between nodes. The CNI handles encapsulation transparently.
Why this matters:
- Pod on Node A (10.50.1.120) can reach Pod on Node B (10.50.1.121) directly
- Physical network only needs IP connectivity between nodes
- Network policies are declared (YAML), not configured on switches
- Security follows workload identity, not IP addresses
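A minimal offline illustration of that flat-addressing claim (10.42.0.0/16 is the k3s default cluster CIDR; the in_cidr_10_42 helper is hypothetical):

```shell
# Hypothetical helper: classify an IP as pod-overlay vs underlay by
# glob-matching the default k3s cluster CIDR 10.42.0.0/16.
in_cidr_10_42() {
  case "$1" in
    10.42.*.*) return 0 ;;   # inside the pod overlay
    *)         return 1 ;;   # underlay / anything else
  esac
}

in_cidr_10_42 "10.42.3.17"  && echo "10.42.3.17: pod overlay"
in_cidr_10_42 "10.50.1.120" || echo "10.50.1.120: node underlay"
```

The glob handles dotted-quad strings only, not arbitrary CIDR math - good enough for eyeballing hubble or kubectl output.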
3.2 Install Cilium CNI (Helm Method)
Cilium provides eBPF-based networking with L3-L7 network policies, identity-aware security, and observability via Hubble.
Why Helm over cilium-cli: Helm installs from a version-pinned chart driven by a declarative values file that can be tracked in git, while the cilium-cli install command applies its configuration imperatively and is harder to reproduce.
3.2.1 Install Helm:
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version
3.2.2 Install Cilium CLI (for status/debugging):
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
3.2.3 Add Cilium Helm repository:
helm repo add cilium https://helm.cilium.io/
helm repo update
3.2.4 Create Cilium values file:
In production, this file lives in git.
cat > /tmp/cilium-values.yaml << 'EOF'
# Cilium Helm Values - Production
# k3s single-node deployment
cluster:
  name: domus-k3s

# k3s API server (internal)
k8sServiceHost: 127.0.0.1
k8sServicePort: 6443

# Replace kube-proxy entirely (eBPF-native)
kubeProxyReplacement: true

# Network mode
routingMode: tunnel
tunnelProtocol: vxlan

# Operator HA (single node = 1, multi-node = 2)
operator:
  replicas: 1

# Hubble observability (headless - CLI only)
hubble:
  enabled: true
  relay:
    enabled: true
  # UI disabled - headless server, use `hubble observe` CLI
  ui:
    enabled: false

# Security hardening
securityContext:
  capabilities:
    ciliumAgent:
      - CHOWN
      - KILL
      - NET_ADMIN
      - NET_RAW
      - IPC_LOCK
      - SYS_ADMIN
      - SYS_RESOURCE
      - DAC_OVERRIDE
      - FOWNER
      - SETGID
      - SETUID
    cleanCiliumState:
      - NET_ADMIN
      - SYS_ADMIN
      - SYS_RESOURCE
EOF
3.2.5 Install Cilium via Helm:
helm install cilium cilium/cilium --version 1.16.5 -n kube-system -f /tmp/cilium-values.yaml
Use Cilium 1.16.x LTS - NOT 1.19.x. Cilium 1.19.x has incompatibilities with k3s kubeProxyReplacement that break SSH connectivity, so pin the chart to 1.16.5 (LTS) as shown above.
3.2.6 Wait for Cilium to be ready:
kubectl rollout status daemonset/cilium -n kube-system --timeout=300s
cilium status
/¯¯\
/¯¯\__/¯¯\ Cilium: OK
\__/¯¯\__/ Operator: OK
/¯¯\__/¯¯\ Envoy DaemonSet: OK
\__/¯¯\__/ Hubble Relay: OK
\__/ ClusterMesh: disabled
Fallback: Verify via kubectl:
kubectl get pods -n kube-system -l k8s-app=cilium -o custom-columns='NAME:.metadata.name,STATUS:.status.phase,READY:.status.containerStatuses[0].ready'
NAME           STATUS    READY
cilium-xxxxx   Running   true
3.2.7 Verify node transitions to Ready:
kubectl get nodes -o custom-columns='NAME:.metadata.name,STATUS:.status.conditions[-1].type,VERSION:.status.nodeInfo.kubeletVersion'
NAME                                      STATUS   VERSION
k3s-master-01.inside.domusdigitalis.dev   Ready    v1.34.x+k3s1
Kernel warnings like "ip_set_init will not be maintained" are informational. Cilium uses eBPF, not iptables ip_set.
3.2.8 Verify Hubble Relay:
kubectl get pods -n kube-system -l app.kubernetes.io/name=hubble-relay -o custom-columns='NAME:.metadata.name,STATUS:.status.phase,READY:.status.containerStatuses[0].ready'
NAME                 STATUS    READY
hubble-relay-xxxxx   Running   true
3.2.9 Test Hubble CLI:
# Real-time flow visibility
hubble observe
# Filter by namespace
hubble observe -n kube-system
# Show only policy denials
hubble observe --verdict DROPPED
# DNS queries
hubble observe --protocol DNS
hubble observe is how you debug network policy issues in production. Better than any UI.
3.2.10 Run connectivity test:
cilium connectivity test --single-node
| This creates test pods and validates network connectivity. Takes 5-10 minutes. |
Clean up stale test namespaces before retry:
kubectl delete namespace cilium-test-1 cilium-test-ccnp1 cilium-test-ccnp2 --wait=false 2>/dev/null
3.2.11 Configure shell completions:
cilium completion bash | sudo tee /etc/bash_completion.d/cilium > /dev/null
kubectl completion bash | sudo tee /etc/bash_completion.d/kubectl > /dev/null
Reload completions:
source /etc/bash_completion.d/cilium
source /etc/bash_completion.d/kubectl
Or simply: exec bash
Now cilium conn<TAB> and kubectl get po<TAB> will autocomplete.
3.3 Verify Installation
Service status (structured):
systemctl show k3s --property=ActiveState,SubState,MainPID | awk -F= '{print $1": "$2}'
ActiveState: active
SubState: running
MainPID: 1234
Node status with jq:
kubectl get nodes -o json | jq -r '.items[] | "\(.metadata.name): \(.status.conditions[] | select(.type=="Ready") | .status) (\(.status.nodeInfo.kubeletVersion))"'
k3s-master-01: True (v1.31.4+k3s1)
Node capacity and allocatable (jq):
kubectl get nodes -o json | jq -r '.items[] | "CPU: \(.status.capacity.cpu) | Memory: \(.status.capacity.memory) | Pods: \(.status.capacity.pods)"'
Pod status matrix (awk pivot table):
kubectl get pods -A --no-headers | awk '
{
ns[$1]++
status[$4]++
combo[$1","$4]++
}
END {
print "=== By Namespace ==="
for(n in ns) printf "%-20s %d\n", n, ns[n]
print "\n=== By Status ==="
for(s in status) printf "%-15s %d\n", s, status[s]
}'
Unhealthy pods only (jq filter):
kubectl get pods -A -o json | jq -r '.items[] | select(.status.phase != "Running" and .status.phase != "Succeeded") | "\(.metadata.namespace)/\(.metadata.name): \(.status.phase)"'
| Empty output = all pods healthy. |
Cilium + k3s health dashboard:
echo "=== k3s ===" && systemctl is-active k3s && \
echo -e "\n=== Cilium ===" && cilium status --output json 2>/dev/null | jq -r '"Cilium: \(.cilium.state)\nOperator: \(.operator.state)\nHubble: \(.hubble.state // "disabled")"' && \
echo -e "\n=== Nodes ===" && kubectl get nodes --no-headers | awk '{printf "%-25s %s\n", $1, $2}'
3.4 Configure kubectl Access
k3s writes its kubeconfig to /etc/rancher/k3s/k3s.yaml, readable only by root. Copy it to your home directory for non-root kubectl use.
Copy kubeconfig to home directory:
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config
chmod 600 ~/.kube/config
Set KUBECONFIG (required):
export KUBECONFIG=~/.kube/config
Make it permanent:
echo 'export KUBECONFIG=~/.kube/config' >> ~/.bashrc
The kubeconfig contains cluster admin credentials. Protect it with 600 permissions.
Verify kubectl works:
kubectl get nodes
3.5 Enable kubectl Bash Completion
echo 'source <(kubectl completion bash)' >> ~/.bashrc
source ~/.bashrc
| Tab completion significantly accelerates kubectl usage. Works for resource names, namespaces, and flags. |
3.6 Store Node Token in gopass
The node token is needed for adding worker nodes later.
View token with line numbers (distinguished approach):
sudo awk '{print NR": "$0}' /var/lib/rancher/k3s/server/node-token
| Token is a single long line. Line numbers confirm it’s complete. |
View token length for verification:
sudo awk '{print "Length:", length($0)}' /var/lib/rancher/k3s/server/node-token
Store in gopass (on admin workstation):
gopass edit v3/domains/d000/k3s/node-token
| Never commit the node token to git. Anyone with this token can join nodes to your cluster. |
3.7 Export kubeconfig to gopass
Copy kubeconfig to admin workstation:
scp k3s-master-01:~/.kube/config /tmp/k3s-kubeconfig
View kubeconfig structure (first 20 lines):
awk 'NR <= 20 {print NR": "$0}' /tmp/k3s-kubeconfig
Store in gopass (on admin workstation):
gopass edit v3/domains/d000/k3s/kubeconfig
For multi-cluster setups, merge kubeconfigs into ~/.kube/config with distinct context names.
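A sketch of that merge, assuming kubectl on the workstation and the /tmp/k3s-kubeconfig copied above; renaming the context first avoids a collision with the k3s default context name.

```shell
# k3s names its context "default" - rename it before merging.
kubectl --kubeconfig /tmp/k3s-kubeconfig config rename-context default domus-k3s

# Note: k3s writes server: https://127.0.0.1:6443 in its kubeconfig;
# edit that to the node IP (10.50.1.120) before using it remotely.

# KUBECONFIG takes a colon-separated list; --flatten inlines certificates.
KUBECONFIG=~/.kube/config:/tmp/k3s-kubeconfig \
  kubectl config view --flatten > /tmp/kubeconfig.merged
mv /tmp/kubeconfig.merged ~/.kube/config
chmod 600 ~/.kube/config

kubectl config get-contexts
```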
Phase 4: Vault Agent Integration
Vault Agent injects secrets into pods. This is REQUIRED infrastructure. Without Vault integration, secrets must be stored as Kubernetes Secrets (base64-encoded, not encrypted at rest without additional setup).
| Step | Location | Action |
|---|---|---|
| 4.1 | Workstation | Enable Vault kubernetes auth method |
| 4.2 | k3s-master-01 | Create serviceaccount, clusterrolebinding, token, extract CA |
| 4.3 | Workstation | Configure kubernetes auth (CA cert + reviewer JWT) |
| 4.4 | Workstation | Create Vault policy for k3s |
| 4.5 | Workstation | Create Vault role for k3s |
| 4.6.1 | Workstation | Copy DOMUS CA chain to k3s node |
| 4.6.2-4 | k3s-master-01 | Helm install vault injector + create vault-tls secret |
4.1 Enable Vault Kubernetes Auth Method
From workstation (with dsource d000 dev/vault loaded):
vault auth enable kubernetes
| If already enabled, you’ll see "path is already in use". This is safe to ignore. |
Verify enabled:
vault auth list | grep kubernetes
kubernetes/ kubernetes auth_kubernetes_xxxx n/a
4.2 Create Kubernetes Auth Resources
On k3s-master-01 (SSH to the node):
Get the k3s API server CA:
kubectl config view --raw --minify --flatten -o jsonpath='{.clusters[].cluster.certificate-authority-data}' | base64 -d > /tmp/k3s-ca.crt
Verify CA certificate (structured):
openssl x509 -in /tmp/k3s-ca.crt -noout -subject -issuer -dates | awk -F'=' '
/Subject:/ {sub(/.*CN ?= ?/, ""); print "Subject: "$0}
/Issuer:/ {sub(/.*CN ?= ?/, ""); print "Issuer: "$0}
/notBefore/ {print "Valid From: "$2}
/notAfter/ {print "Valid Until: "$2}
'
Subject: k3s-master-01
Issuer: k3s-master-01
Valid From: Feb 21 00:00:00 2026 GMT
Valid Until: Feb 19 00:00:00 2036 GMT
Create service account for Vault:
kubectl create serviceaccount vault-auth -n kube-system
Create ClusterRoleBinding:
kubectl create clusterrolebinding vault-auth-binding \
--clusterrole=system:auth-delegator \
--serviceaccount=kube-system:vault-auth
Get service account token (k8s 1.24+):
kubectl create token vault-auth -n kube-system --duration=8760h > /tmp/vault-auth-token
View token length (should be substantial):
awk '{print "Token length:", length($0)}' /tmp/vault-auth-token
| Token duration is 1 year (8760h). Set a calendar reminder to rotate before expiration. |
4.3 Configure Vault Kubernetes Auth
From workstation (with dsource d000 dev/vault loaded):
Files needed from k3s-master-01 (created in 4.2): /tmp/k3s-ca.crt and /tmp/vault-auth-token. Copy them to the workstation if not already done.
Configure kubernetes auth:
vault write auth/kubernetes/config \
kubernetes_host="https://10.50.1.120:6443" \
kubernetes_ca_cert=@/tmp/k3s-ca.crt \
token_reviewer_jwt=@/tmp/vault-auth-token
The @ prefix tells Vault to read the file contents.
4.4 Create Vault Policy for k3s
From workstation (with dsource d000 dev/vault loaded):
| Path | Capability |
|---|---|
| kv/data/k3s/* | read (application secrets) |
| pki_int/issue/domus-workstation | create, update (certificate issuance) |
vault policy write k3s-secrets - <<'EOF'
# Allow k3s pods to read secrets
path "kv/data/k3s/*" {
  capabilities = ["read"]
}

# Allow PKI certificate issuance
path "pki_int/issue/domus-workstation" {
  capabilities = ["create", "update"]
}
EOF
| Add paths incrementally as applications are deployed. Start minimal, expand as needed. |
4.5 Create Vault Role for k3s
From workstation (with dsource d000 dev/vault loaded):
| Parameter | Purpose | Value |
|---|---|---|
| bound_service_account_names | Which SAs can authenticate | * |
| bound_service_account_namespaces | Which namespaces are allowed | default, production |
| policies | Vault policies to attach | k3s-secrets |
| ttl | Token lifetime | 1h |
vault write auth/kubernetes/role/k3s-app \
bound_service_account_names="*" \
bound_service_account_namespaces="default,production" \
policies="k3s-secrets" \
ttl="1h"
Using * for service account names is permissive. For production, specify explicit service account names.
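A tighter sketch for production, binding only the test-app service account from Phase 5 (adjust the names to your actual workloads):

```shell
# Hedged example: same role, but scoped to one named service account
# in one namespace instead of the "*" wildcard.
vault write auth/kubernetes/role/k3s-app \
  bound_service_account_names="test-app" \
  bound_service_account_namespaces="default" \
  policies="k3s-secrets" \
  ttl="1h"
```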
4.6 Install Vault Agent Injector
4.6.1 Copy CA chain to k3s node
From workstation, copy the DOMUS CA chain:
scp /etc/ssl/certs/DOMUS-CA-CHAIN.pem k3s-master-01:/tmp/
The Vault Agent Injector must trust Vault's TLS certificate. Without the CA bundle, pods will fail with TLS certificate verification errors against Vault.
4.6.2 Add Helm repo
On k3s-master-01:
helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update
4.6.3 Install injector with CA bundle
| Parameter | Purpose |
|---|---|
| injector.enabled | Enable sidecar injector (required) |
| injector.externalVaultAddr | External Vault URL (not in-cluster) |
| server.enabled=false | Don't deploy Vault server (using external) |
| injector.certs.caBundle | Base64-encoded CA chain for TLS verification (REQUIRED) |
helm install vault hashicorp/vault \
--set "injector.enabled=true" \
--set "injector.externalVaultAddr=https://vault-01.inside.domusdigitalis.dev:8200" \
--set "server.enabled=false" \
--set "injector.certs.caBundle=$(base64 -w0 /tmp/DOMUS-CA-CHAIN.pem)"
caBundle provides the CA chain so the injector webhook trusts Vault's TLS certificate. base64 -w0 encodes without line breaks.
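You can see why -w0 matters with a self-contained check (100 bytes of sample data, no CA file needed): GNU base64 wraps at 76 columns by default, which would corrupt the single-line --set value.

```shell
# 100 bytes of sample data stands in for the CA chain file.
sample=$(head -c 100 /dev/zero | tr '\0' 'A')

wrapped=$(printf '%s' "$sample" | base64 | wc -l)      # wraps at 76 cols
single=$(printf '%s' "$sample" | base64 -w0 | wc -l)   # no newlines at all
echo "wrapped lines: $wrapped, single-line lines: $single"
# → wrapped lines: 2, single-line lines: 0
```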
4.6.4 Create TLS secret for Vault Agent sidecars
The injector's caBundle only covers the injector webhook itself; the injected Vault Agent sidecars need their own copy of the CA, mounted from a Kubernetes secret, to verify Vault's TLS certificate.
kubectl create secret generic vault-tls --from-file=ca.crt=/tmp/DOMUS-CA-CHAIN.pem
Pods must reference this secret via annotations (see Phase 5.2).
Verify injector pod (jq):
kubectl get pods -l app.kubernetes.io/name=vault-agent-injector -o json | jq -r '
.items[] | "\(.metadata.name): \(.status.phase) (Ready: \(.status.containerStatuses[0].ready))"
'
vault-agent-injector-xxxxx: Running (Ready: true)
Injector logs (filtered for errors):
kubectl logs -l app.kubernetes.io/name=vault-agent-injector --tail=50 | awk '/level=error|level=warn/ {print}'
| Empty output = no errors. |
Phase 5: Test Deployment
5.1 Create Test Secret in Vault
From workstation (with dsource d000 dev/vault loaded):
vault kv put kv/k3s/test username="testuser" password="testpass123"
Verify secret (jq structured output):
vault kv get -format=json kv/k3s/test | jq -r '
"Path: \(.request_id)",
"Version: \(.data.metadata.version)",
"Created: \(.data.metadata.created_time)",
"Keys: \(.data.data | keys | join(", "))"
'
Extract specific values:
vault kv get -format=json kv/k3s/test | jq -r '.data.data | to_entries[] | "\(.key)=\(.value)"'
password=testpass123
username=testuser
5.2 Deploy Test Pod with Vault Injection
| Annotation | Purpose |
|---|---|
| vault.hashicorp.com/agent-inject | Enable sidecar injection |
| vault.hashicorp.com/role | Vault role for authentication |
| vault.hashicorp.com/agent-inject-secret-<file> | Secret path to inject into <file> |
| vault.hashicorp.com/tls-secret | Kubernetes secret containing CA cert |
| vault.hashicorp.com/ca-cert | Path to CA cert inside the secret volume |
For external Vault with TLS, the tls-secret and ca-cert annotations are required.
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: test-app
---
apiVersion: v1
kind: Pod
metadata:
  name: vault-test
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "k3s-app"
    vault.hashicorp.com/agent-inject-secret-credentials.txt: "kv/data/k3s/test"
    vault.hashicorp.com/tls-secret: "vault-tls"
    vault.hashicorp.com/ca-cert: "/vault/tls/ca.crt"
spec:
  serviceAccountName: test-app
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "while true; do sleep 3600; done"]
EOF
Changed from cat /vault/secrets/credentials.txt && sleep 3600 to an infinite sleep loop. We'll inspect manually.
5.3 Verify Secret Injection
Wait for pod with timeout:
kubectl wait --for=condition=Ready pod/vault-test --timeout=120s && echo "Pod ready" || echo "Timeout - check events"
Pod status deep dive (jq):
kubectl get pod vault-test -o json | jq -r '
"Phase: \(.status.phase)",
"Containers: \(.spec.containers | length) app + \(.spec.initContainers // [] | length) init",
"Vault Injected: \(.metadata.annotations["vault.hashicorp.com/agent-inject"] // "false")",
(.status.containerStatuses[]? | " \(.name): \(.ready) (restarts: \(.restartCount))")
'
View injected secret (structured):
kubectl exec vault-test -- sh -c 'cat /vault/secrets/credentials.txt' | awk -F'=' '{printf "%-15s %s\n", $1":", $2}'
username:       testuser
password:       testpass123
Vault agent init logs (filtered):
kubectl logs vault-test -c vault-agent-init 2>/dev/null | awk '
/level=info.*secret.*rendered/ {print "✓ Secret rendered"}
/level=error/ {print "✗ ERROR: "$0}
END {if(NR==0) print "No init container logs (may have completed)"}
'
Full injection verification:
kubectl exec vault-test -- sh -c '
echo "=== Secret File ===" && ls -la /vault/secrets/ && \
echo -e "\n=== Contents ===" && cat /vault/secrets/credentials.txt && \
echo -e "\n=== Permissions ===" && stat -c "%a %U:%G" /vault/secrets/credentials.txt
'
Phase 6: DNS Registration
Phase 7: Verification
7.1 Automated Health Check
Run all checks at once:
echo "=== k3s Cluster Health Check ===" && echo
# k3s service
printf "%-20s" "k3s service:"
systemctl is-active k3s 2>/dev/null | awk '{print ($1=="active") ? "✓ "$1 : "✗ "$1}'
# SELinux
printf "%-20s" "SELinux:"
getenforce | awk '{print ($1=="Enforcing") ? "✓ "$1 : "✗ "$1}'
# Firewall
printf "%-20s" "firewalld:"
systemctl is-active firewalld 2>/dev/null | awk '{print ($1=="active") ? "✓ "$1 : "✗ "$1}'
# Node status
printf "%-20s" "Node Ready:"
kubectl get nodes -o json 2>/dev/null | jq -r '.items[0].status.conditions[] | select(.type=="Ready") | if .status=="True" then "✓ Ready" else "✗ NotReady" end'
# Cilium
printf "%-20s" "Cilium:"
cilium status --output json 2>/dev/null | jq -r 'if .cilium.state=="Ok" then "✓ Ok" else "✗ \(.cilium.state)" end' || echo "✗ not installed"
# Hubble
printf "%-20s" "Hubble:"
cilium status --output json 2>/dev/null | jq -r 'if .hubble.state then "✓ \(.hubble.state)" else "○ disabled" end' || echo "○ N/A"
# Vault injector
printf "%-20s" "Vault Injector:"
kubectl get pods -l app.kubernetes.io/name=vault-agent-injector -o json 2>/dev/null | jq -r '.items[0].status.phase // "not found"' | awk '{print ($1=="Running") ? "✓ "$1 : "✗ "$1}'
# Pod summary
echo -e "\n=== Pod Summary ==="
kubectl get pods -A --no-headers 2>/dev/null | awk '
{status[$4]++; total++}
END {
for(s in status) printf "%-15s %d\n", s":", status[s]
print "---"
printf "%-15s %d\n", "Total:", total
}'
# Unhealthy pods
echo -e "\n=== Unhealthy Pods ==="
kubectl get pods -A --no-headers 2>/dev/null | awk '$4!="Running" && $4!="Completed" {print $1"/"$2": "$4}' || echo "None"
7.2 Component Deep Dive
k3s service (structured):
systemctl show k3s --property=ActiveState,SubState,MainPID,MemoryCurrent | awk -F= '{
if($1=="MemoryCurrent") printf "%-15s %.1f MB\n", $1":", $2/1024/1024
else printf "%-15s %s\n", $1":", $2
}'
Journal errors (last 24h, deduplicated):
sudo journalctl -u k3s --since "24 hours ago" -p err --no-pager | awk '
!seen[$0]++ {print ++n": "$0}
END {if (n == 0) print "No errors in last 24h"}
' | head -20
Cilium connectivity matrix:
cilium status --output json 2>/dev/null | jq -r '
"Cilium: \(.cilium.state)",
"Operator: \(.operator.state)",
"Hubble: \(.hubble.state // "disabled")",
"ClusterMesh: \(.cluster_mesh.state // "disabled")"
'
7.3 Quick Status Commands with awk
Check all pods with status extraction:
kubectl get pods -A | awk 'NR==1 || /Running|Pending|Error/'
Count pods by status:
kubectl get pods -A --no-headers | awk '{count[$4]++} END {for(s in count) print s, count[s]}'
Running 8
Completed 2
List non-Running pods only:
kubectl get pods -A | awk 'NR==1 || !/Running/'
awk pattern breakdown: NR==1 keeps the header row; the alternation /Running|Pending|Error/ keeps any pod row whose status matches one of those states.
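The idiom is easy to verify offline with sample kubectl-style output:

```shell
# NR==1 passes the header through; the regex keeps matching rows only.
printf '%s\n' \
  'NAMESPACE NAME READY STATUS' \
  'kube-system cilium-abc 1/1 Running' \
  'default job-xyz 0/1 Completed' \
  'default bad-pod 0/1 Error' \
| awk 'NR==1 || /Running|Pending|Error/'
# → header, the Running row, and the Error row; the Completed row is dropped
```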
Extract node resources:
kubectl top nodes 2>/dev/null | awk '{print $1, $3, $5}'
Count pods per node:
kubectl get pods -A -o wide --no-headers | awk '{count[$8]++} END {for(n in count) print n, count[n]}'
Troubleshooting
Migrating Existing k3s from Flannel to Cilium
If k3s was installed with default Flannel CNI and you need to migrate to Cilium:
|
This is a destructive operation. All pods will be recreated. Plan for downtime. Option A (Recommended): Reinstall k3s fresh with Cilium Option B: In-place migration (complex, higher risk) |
Option A: Clean Reinstall (Recommended for single-node)
# 1. Backup any persistent data
kubectl get pv -o yaml > /tmp/pv-backup.yaml
kubectl get pvc -A -o yaml > /tmp/pvc-backup.yaml
# 2. Uninstall k3s completely
/usr/local/bin/k3s-uninstall.sh
# 3. Reinstall with Cilium flags (Phase 3.1)
curl -sfL https://get.k3s.io | sudo INSTALL_K3S_EXEC="server \
--selinux \
--flannel-backend=none \
--disable-network-policy" sh -
# 4. Install Cilium (Phase 3.2)
# ... follow Phase 3.2 steps
Option B: In-Place Migration (Advanced)
# 1. Install Cilium alongside Flannel
cilium install --version 1.16.5
# 2. Wait for Cilium to be ready
cilium status --wait
# 3. Verify pods have Cilium networking
kubectl get pods -A -o wide
# 4. Delete Flannel (after Cilium verified working)
kubectl delete -n kube-system daemonset kube-flannel-ds 2>/dev/null || true
# 5. Remove Flannel CNI config
sudo rm -f /etc/cni/net.d/10-flannel.conflist
# 6. Restart k3s
sudo systemctl restart k3s
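After the restart, it is worth confirming that no Flannel CNI config survived the migration. A minimal check (sketch; the directory may not exist on non-k3s hosts, hence the stderr redirect):

```shell
# Report whether any Flannel CNI config remains in /etc/cni/net.d.
ls /etc/cni/net.d/ 2>/dev/null \
  | awk '/flannel/ {found=1} END {print (found ? "Flannel config still present" : "No Flannel config found")}'
```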
In-place migration may leave orphaned network resources; a clean reinstall is the safer path.
SELinux Denials
Check for AVC denials:
sudo ausearch -m avc -ts recent
Extract denied operations with awk:
sudo ausearch -m avc -ts recent | awk '/denied/{print}'
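To pull just the denied operation out of each AVC line, the braces around the permission make a convenient field separator. Checked against a fabricated sample record:

```shell
# Split on { } so $2 is the permission; strip its padding spaces.
printf 'type=AVC msg=audit(1708): avc:  denied  { write } for pid=912 comm="k3s" name="policy"\n' \
  | awk -F'[{}]' '/denied/ { gsub(/ /, "", $2); print "denied op:", $2 }'
# prints: denied op: write
```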
If k3s has SELinux issues:
sudo ausearch -c 'k3s' --raw | audit2allow -M k3s-selinux
sudo semodule -i k3s-selinux.pp
Only create SELinux policy modules for legitimate denials. Review the audit2allow output before applying.
k3s Won’t Start
View journal entries with error filtering:
sudo journalctl -xeu k3s --no-pager -p err | awk 'NR <= 30'
View full context (last 100 lines):
sudo journalctl -xeu k3s --no-pager -n 100
Vault Agent Injection Not Working
Check injector logs:
kubectl logs -l app.kubernetes.io/name=vault-agent-injector
Filter for errors with awk:
kubectl logs -l app.kubernetes.io/name=vault-agent-injector | awk '/error|Error|ERROR/'
Check pod events:
kubectl describe pod <pod-name> | awk '/Events:/,0'
awk '/pattern/,0' prints from the first pattern match to the end of the file.
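The `/Events:/,0` range can be demonstrated on a fabricated `kubectl describe` snippet:

```shell
# Everything before the Events: line is discarded; everything after is kept.
printf '%s\n' \
  'Name:    web-1' \
  'Status:  Pending' \
  'Events:' \
  '  Warning  FailedMount  mount timeout' \
  | awk '/Events:/,0'
# prints the Events: line and the warning beneath it
```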
Verify Vault connectivity from k3s:
kubectl run vault-test --rm -it --image=alpine -- wget --ca-certificate=/etc/ssl/certs/ca-certificates.crt -qO- https://vault-01.inside.domusdigitalis.dev:8200/v1/sys/health
Parse health check response:
kubectl run vault-test --rm -it --image=alpine -- wget --ca-certificate=/etc/ssl/certs/ca-certificates.crt -qO- https://vault-01.inside.domusdigitalis.dev:8200/v1/sys/health 2>/dev/null | jq -r '.sealed, .initialized'
If the CA chain is not in the Alpine trust store, use --no-check-certificate for testing only.
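The jq filter can be sanity-checked offline against a sample sys/health payload (field values illustrative, not from a live Vault):

```shell
# Extract seal and init status from a sample health response.
printf '{"initialized":true,"sealed":false,"standby":false}' \
  | jq -r '.sealed, .initialized'
# prints:
# false
# true
```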
SSH Breaks After Cilium Install/Upgrade
Symptom: SSH to the VM times out after a Cilium installation or upgrade, but ping works.
Root cause: Cilium 1.19.x has a kubeProxyReplacement incompatibility with k3s; its eBPF programs block TCP traffic.
Quick fix (if locked out):
1. Access the VM via virsh console:

   sudo virsh console k3s-master-01

2. Reboot to clear eBPF state:

   sudo reboot

3. Immediately uninstall Cilium:

   helm uninstall cilium -n kube-system

4. Reinstall with 1.16.5 LTS:

   helm install cilium cilium/cilium --version 1.16.5 \
     --namespace kube-system \
     --set kubeProxyReplacement=true \
     --set k8sServiceHost=127.0.0.1 \
     --set k8sServicePort=6443 \
     --set operator.replicas=1 \
     --set hubble.enabled=true \
     --set hubble.relay.enabled=false \
     --set hubble.ui.enabled=false
Prevention: Always use Cilium 1.16.x LTS with k3s. Do NOT use 1.19.x.
CiliumNode Stale IP After Static IP Fix
|
Root cause: If you install Cilium while the VM has a DHCP address, then fix to static IP later, Cilium caches the old DHCP IP in the CiliumNode resource. Symptom: |
Diagnose:
# Check what Cilium thinks the node IP is
kubectl get ciliumnode
# Compare to actual node IP
ip -4 addr show eth0 | awk '/inet/{print $2}'
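The comparison of the two commands above can be scripted. A minimal sketch (both IP values below are placeholders; in practice they come from the two diagnose commands):

```shell
# Flag a mismatch between the cached CiliumNode IP and the live interface IP.
CILIUM_IP="10.50.1.57"    # placeholder: value from kubectl get ciliumnode
ACTUAL_IP="10.50.1.120"   # placeholder: value from ip -4 addr show eth0
if [ "$CILIUM_IP" != "$ACTUAL_IP" ]; then
  echo "MISMATCH: CiliumNode=$CILIUM_IP actual=$ACTUAL_IP"
else
  echo "CiliumNode IP matches interface IP"
fi
```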
If CiliumNode shows wrong IP (e.g., old DHCP address):
Fix:
# Delete stale CiliumNode (will recreate with correct IP)
kubectl delete ciliumnode $(hostname -f)
# Restart Cilium to pick up correct IP
kubectl rollout restart daemonset/cilium -n kube-system
kubectl rollout status daemonset/cilium -n kube-system
# Verify correct IP
kubectl get ciliumnode
Prevention: Always fix static IP (Phase 2.1.1) BEFORE installing k3s/Cilium.
Containerd Snapshotter Corruption
|
Symptom: Pod fails with Root cause: Containerd’s overlayfs snapshotter has corrupted layer metadata. Re-pulling images doesn’t fix it because snapshotter reuses cached (corrupt) filesystem. |
Diagnose:
# Check binary inside image
sudo mkdir -p /mnt/test-image
sudo /usr/local/bin/k3s ctr images mount <image>@<digest> /mnt/test-image
file /mnt/test-image/usr/bin/<binary>
sudo umount /mnt/test-image
If the binary reads back as empty (or the wrong file type), the snapshotter state is corrupt.
Fix (Nuclear - re-pulls ALL images):
# Stop k3s
sudo systemctl stop k3s
# Remove corrupted snapshotter state
sudo rm -rf /var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/*
sudo rm -f /var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/metadata.db
# Start k3s (will re-pull all images fresh)
sudo systemctl start k3s
# Wait for cluster to stabilize
sleep 60
kubectl get pods -A
This re-pulls ALL container images. Plan for 5-10 minutes of downtime.
k3s ctr Command Not Found
k3s bundles containerd with its own ctr command. It’s not in PATH for sudo.
Wrong:
sudo ctr images ls
# Error: sudo: ctr: command not found
Correct:
sudo /usr/local/bin/k3s ctr images ls
Common k3s ctr commands:
# List images
sudo /usr/local/bin/k3s ctr images ls
# Remove image
sudo /usr/local/bin/k3s ctr images rm <image>
# Pull image with specific platform
sudo /usr/local/bin/k3s ctr images pull --platform linux/amd64 <image>
# List containers
sudo /usr/local/bin/k3s ctr containers ls
# Check content store
sudo /usr/local/bin/k3s ctr content ls
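To avoid typing the full path every time, a small wrapper function can be added to the shell profile (sketch; the name kctr is my own choice, not a k3s convention):

```shell
# Convenience wrapper: forward any arguments to the bundled ctr via sudo.
# Add to ~/.bashrc on the k3s node.
kctr() { sudo /usr/local/bin/k3s ctr "$@"; }
```

Usage example: `kctr images ls` is then equivalent to `sudo /usr/local/bin/k3s ctr images ls`.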
Helm Upgrade Fails with nil pointer
Symptom:
Error: UPGRADE FAILED: template: cilium/templates/xxx.yaml:1:14: nil pointer evaluating interface {}.enabled
Root cause: --reuse-values carries forward old values that are missing fields the updated chart now requires.
Fix: Export current values and upgrade with explicit values file:
# Export current values
helm get values cilium -n kube-system -o yaml > /tmp/cilium-values.yaml
# Review and add any missing required fields
cat /tmp/cilium-values.yaml
# Upgrade with explicit values
helm upgrade cilium cilium/cilium -n kube-system -f /tmp/cilium-values.yaml
Or use chart defaults with minimal overrides (see Phase 3.2.4 for production values file).
Next Steps
After successful deployment:
1. Expand to 6-node HA cluster - Deploy k3s-master-02/03, k3s-worker-01/02/03
2. Configure Cilium BGP - Advertise LoadBalancer IPs to VyOS router
3. Deploy monitoring - Prometheus + Grafana on k3s
4. Configure GitOps - ArgoCD for declarative deployments
5. Setup Traefik IngressRoute - HTTPS with Vault certificates
Related Documentation
- Vault Enterprise Hardening (infra-ops roadmap)
- Vault SSH CA (infra-ops runbook)
- Container Operations Roadmap (infra-ops roadmap)
Appendix A: Deployment Chronicle
2026-02-22: Vault Agent Injector Installation
LAST DEPLOYED: Sun Feb 22 10:35:29 2026
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing HashiCorp Vault!
Now that you have deployed Vault, you should look over the docs on using
Vault with Kubernetes available here:
https://developer.hashicorp.com/vault/docs
Your release is named vault. To learn more about the release, try:
$ helm status vault
$ helm get manifest vault
Helm install command used:
helm install vault hashicorp/vault \
--set "injector.enabled=true" \
--set "injector.externalVaultAddr=https://vault-01.inside.domusdigitalis.dev:8200" \
--set "server.enabled=false"
2026-02-22: Vault Agent TLS Fix
Problem: Vault Agent sidecars failed with x509: certificate signed by unknown authority.
Root cause: injector.certs.caBundle only affects the injector webhook, NOT the vault-agent sidecars.
Solution:
1. Create TLS secret with CA cert:

   kubectl create secret generic vault-tls --from-file=ca.crt=/tmp/DOMUS-CA-CHAIN.pem

2. Add annotations to pods:

   vault.hashicorp.com/tls-secret: "vault-tls"
   vault.hashicorp.com/ca-cert: "/vault/tls/ca.crt"
Verification:
kubectl exec vault-test -c app -- cat /vault/secrets/credentials.txt
data: map[password:testpass123 username:testuser]
metadata: map[created_time:2026-02-22T10:38:41.572336046Z ...]