k3s Kubernetes Deployment

Single-node k3s Kubernetes deployment on Rocky Linux 9 with SELinux enforcing and Vault Agent for secrets injection.

Overview

Component             Value
Node                  k3s-master-01 (10.50.1.120)
OS                    Rocky Linux 9 (RHEL 9 family)
SELinux               Enforcing (required)
Host Firewall         firewalld (RHEL standard, nftables backend)
CNI / Network Policy  Cilium (eBPF-based, L3-L7 policies, replaces Flannel)
Container Runtime     containerd (k3s embedded)
Secrets               HashiCorp Vault Agent (required)
Ingress               Traefik (k3s default)

Defense-in-Depth Security Posture:

  • Layer 1 (Host): firewalld - OS-level port filtering, SELinux integration

  • Layer 2 (Network): Cilium - Pod microsegmentation, identity-based policies, L7 visibility

  • Layer 3 (Secrets): Vault Agent - No hardcoded credentials, dynamic secrets

  • Layer 4 (Runtime): SELinux Enforcing - Mandatory access control

Zero-trust means securing at EVERY layer, not just the perimeter.

Architecture

k3s Defense-in-Depth Stack

Session Variables

Set these variables at the start of your deployment session. Adjust for each node being deployed.

Shell Variables (Copy to Terminal)

# ============================================================
# K3S DEPLOYMENT SESSION VARIABLES
# Adjust K3S_NODE_* for each node deployment
# ============================================================

# Target Node (change for each deployment)
K3S_NODE_NAME="k3s-master-01"
K3S_NODE_IP="10.50.1.120"
K3S_NODE_ROLE="server"        # server=master, agent=worker

# Hypervisor (kvm-01 for master-01/worker-01, kvm-02 for others)
KVM_HOST="kvm-01"

# Cluster Configuration
K3S_TOKEN="<generated-token>"  # Generate once, use for all nodes
K3S_SERVER_URL="https://10.50.1.120:6443"  # For joining nodes

# Network Configuration
DOMAIN="inside.domusdigitalis.dev"
GATEWAY="10.50.1.1"           # VyOS HA VIP
DNS_PRIMARY="10.50.1.90"
DNS_SECONDARY="10.50.1.91"

# Storage
NAS_IP="10.50.1.70"
NFS_SHARE="/volume1/k3s"

# Vault Integration
VAULT_ADDR="https://vault-01.inside.domusdigitalis.dev:8200"

Node Matrix Reference

Node            IP           Role           Hypervisor  Priority
k3s-master-01   10.50.1.120  server (etcd)  kvm-01      First
k3s-master-02   10.50.1.121  server (etcd)  kvm-02      Second
k3s-master-03   10.50.1.122  server (etcd)  kvm-02      Third
k3s-worker-01   10.50.1.123  agent          kvm-01      After masters
k3s-worker-02   10.50.1.124  agent          kvm-02      After masters
k3s-worker-03   10.50.1.125  agent          kvm-02      After masters

Verify Variables

echo "=== k3s Node Deployment ==="
echo "Node:       $K3S_NODE_NAME"
echo "IP:         $K3S_NODE_IP"
echo "Role:       $K3S_NODE_ROLE"
echo "Hypervisor: $KVM_HOST"
echo "Gateway:    $GATEWAY"
echo "DNS:        $DNS_PRIMARY"

Gateway: VyOS HA VIP 10.50.1.1 is the default gateway since 2026-03-07. See VyOS Deployment for HA architecture.

Phase 1: VM Creation

Option A: Cloud Image (Recommended)

Faster deployment using the Rocky 9 GenericCloud image with cloud-init.

1.1 Copy and resize base image:

cd /var/lib/libvirt/images
sudo cp Rocky-9-GenericCloud-Base.latest.x86_64.qcow2 k3s-master-01.qcow2
sudo qemu-img resize k3s-master-01.qcow2 50G

1.2 Create cloud-init configuration:

The heredoc must have NO leading whitespace. The #cloud-config line must start at column 0.
plain_text_passwd is required for console access (VNC/serial). SSH keys alone won’t help if network fails. Change password after first login with passwd.
cat > /tmp/k3s-cloud-init.yml << 'EOF'
#cloud-config
hostname: k3s-master-01
fqdn: k3s-master-01.inside.domusdigitalis.dev
manage_etc_hosts: true

users:
  - name: evanusmodestus
    groups: wheel
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    lock_passwd: false
    plain_text_passwd: changeme123
    ssh_authorized_keys:
      - ssh-ed25519 AAAAC3... # Add your SSH public key
      - sk-ssh-ed25519@openssh.com AAAAG... # YubiKey resident key (optional)

write_files:
  - path: /etc/NetworkManager/system-connections/eth0.nmconnection
    permissions: '0600'
    content: |
      [connection]
      id=eth0
      type=ethernet
      interface-name=eth0
      autoconnect=true

      [ipv4]
      method=manual
      addresses=10.50.1.120/24
      gateway=10.50.1.1
      dns=10.50.1.90;10.50.1.91

      [ipv6]
      method=disabled

runcmd:
  - nmcli connection reload
  - nmcli connection up eth0
  - growpart /dev/vda 4
  - xfs_growfs /
EOF

1.2.1 Verify cloud-init YAML (no leading whitespace):

awk 'NR<=5 {print NR": ["$0"]"}' /tmp/k3s-cloud-init.yml
Expected output (no spaces after [)
1: [#cloud-config]
2: [hostname: k3s-master-01]
3: [fqdn: k3s-master-01.inside.domusdigitalis.dev]
4: [manage_etc_hosts: true]
5: []

If there’s leading whitespace, fix with:

sed -i 's/^  //' /tmp/k3s-cloud-init.yml

1.3 Create cloud-init ISO:

Requires cloud-utils package. On Arch: sudo pacman -S cloud-utils
sudo cloud-localds /var/lib/libvirt/images/k3s-cloud-init.iso /tmp/k3s-cloud-init.yml

1.4 Create VM with virt-install:

sudo virt-install \
  --name k3s-master-01 \
  --memory 4096 \
  --vcpus 2 \
  --disk path=/var/lib/libvirt/images/k3s-master-01.qcow2,format=qcow2 \
  --disk path=/var/lib/libvirt/images/k3s-cloud-init.iso,device=cdrom \
  --os-variant rocky9 \
  --network bridge=virbr0,model=virtio \
  --graphics none \
  --console pty,target_type=serial \
  --import \
  --noautoconsole

1.5 Verify VM started:

sudo virsh list | grep k3s

Option B: ISO Installation (Alternative)

Traditional installation from ISO with kickstart.

virt-install \
  --name k3s-master-01 \
  --memory 4096 \
  --vcpus 2 \
  --disk path=/var/lib/libvirt/images/k3s-master-01.qcow2,size=50,format=qcow2 \
  --os-variant rocky9 \
  --network bridge=virbr0,model=virtio \
  --graphics none \
  --console pty,target_type=serial \
  --location /var/lib/libvirt/iso/Rocky-9-latest-x86_64-minimal.iso \
  --extra-args "inst.ks=http://10.50.1.110/ks/rocky9-minimal.cfg ip=10.50.1.120::10.50.1.1:255.255.255.0:k3s-master-01.inside.domusdigitalis.dev:enp1s0:none nameserver=10.50.1.90"
The --extra-args string configures static IP during installation. Format: ip=<ip>::<gateway>:<netmask>:<hostname>:<interface>:none.

Manual Installation Settings (if no kickstart)

During Rocky 9 installation:

Setting        Value
Hostname       k3s-master-01.inside.domusdigitalis.dev
IP Address     10.50.1.120/24
Gateway        10.50.1.1
DNS            10.50.1.90
Root Password  (from gopass: v3/domains/d000/k3s/k3s-master-01)
User           sysadmin (with sudo)
Partitioning   Automatic with LVM

Store root password in gopass BEFORE installation. Never use weak passwords on infrastructure nodes.

Phase 2: Base Configuration

2.1 Verify Network

System overview (filtered):

hostnamectl | awk '/Static hostname|Operating System|Kernel|Architecture/ {gsub(/^[[:space:]]+/, ""); print}'
Expected output
Static hostname: k3s-master-01
Operating System: Rocky Linux 9.7 (Blue Onyx)
Kernel: Linux 5.14.0-611.5.1.el9_7.x86_64
Architecture: x86-64

Network configuration summary:

ip -4 -o addr show | awk '$2 != "lo" {for (i=1; i<=NF; i++) if ($i == "scope") {s = $(i+1); break}; print $2": "$4" (scope:"s")"}'
Expected output
eth0: 10.50.1.120/24 (scope:global)

DNS and gateway verification:

awk '/^nameserver/{print "DNS: "$2} /^search/{print "Search: "$2}' /etc/resolv.conf

Connectivity matrix (parallel checks):

for target in 10.50.1.1 10.50.1.60 10.50.1.90; do
  ping -c1 -W1 $target &>/dev/null && echo "$target: OK" || echo "$target: FAIL"
done
Expected output
10.50.1.1: OK      # VyOS HA VIP gateway
10.50.1.60: OK     # Vault
10.50.1.90: OK     # bind-01 DNS

DNS resolution test:

for host in vault.inside.domusdigitalis.dev k3s-master-01.inside.domusdigitalis.dev; do
  dig +short $host | awk -v h="$host" 'NR==1 {print h": "$0}'
done

2.1.1 Fix Static IP (If DHCP Active)

Rocky 9 GenericCloud DHCP Issue: Cloud-init network config is unreliable. The VM often gets DHCP instead of static IP. Additionally, VM restarts (shutdown/start, migration) can revert to DHCP via "System eth0" connection.

Check if DHCP is active:

nmcli -t -f NAME,DEVICE conn show --active | awk -F: '{print $1": "$2}'

If you see "System eth0" or IP is not 10.50.1.120, fix it:

Delete DHCP connections:

sudo nmcli conn delete "System eth0" 2>/dev/null
sudo nmcli conn delete "Wired connection 1" 2>/dev/null
sudo nmcli conn delete "cloud-init eth0" 2>/dev/null

Create persistent static connection:

sudo nmcli conn add con-name eth0 type ethernet ifname eth0 \
  ipv4.method manual \
  ipv4.addresses 10.50.1.120/24 \
  ipv4.gateway 10.50.1.1 \
  ipv4.dns "10.50.1.90,10.50.1.91" \
  autoconnect yes

Activate (will disconnect if remote):

sudo nmcli conn up eth0

Reconnect to static IP:

ssh evanusmodestus@10.50.1.120

Verify static IP persists:

ip -4 -o addr show eth0 | awk '{print $4}'
Expected output
10.50.1.120/24

2.2 Update System

sudo dnf update -y
sudo reboot
Always reboot after kernel updates to ensure the new kernel is active.
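To confirm the new kernel is actually running after the reboot, compare it against the newest installed kernel package (a quick sketch; assumes the stock `kernel` package name):

```shell
# Compare running kernel with newest installed kernel package
running="$(uname -r)"
latest="$(rpm -q kernel --last | awk 'NR==1 {sub(/^kernel-/, "", $1); print $1}')"
echo "running: $running"
echo "latest:  $latest"
[ "$running" = "$latest" ] && echo "OK: newest kernel active" || echo "WARN: reboot still required"
```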

2.3 Verify SELinux

getenforce

Expected: Enforcing

Extract SELinux mode with awk:

sestatus | awk '/Current mode/{print $3}'

SELinux must remain in Enforcing mode. Do NOT set to Permissive.

k3s fully supports SELinux when installed with --selinux flag. Disabling SELinux removes a critical security layer.

2.4 Configure Firewall

GenericCloud images are minimal - firewalld is NOT installed by default.

Host firewall is mandatory for defense-in-depth. Do NOT rely on perimeter firewall alone.

2.4.1 Install firewalld (GenericCloud images):

sudo dnf install -y firewalld
sudo systemctl enable --now firewalld

Verify service running:

systemctl status firewalld | awk '/Active:/{print $2, $3}'

2.4.2 Open required ports:

Table 1. k3s + Cilium Required Ports
Port   Protocol  Purpose
6443   TCP       k3s API server (kubectl access)
80     TCP       Traefik HTTP ingress
443    TCP       Traefik HTTPS ingress
4240   TCP       Cilium health checks
4244   TCP       Hubble Relay (Cilium observability)
4245   TCP       Hubble UI (optional)
8472   UDP       VXLAN overlay (Cilium/Flannel)
10250  TCP       Kubelet metrics

sudo firewall-cmd --permanent --add-port=6443/tcp
sudo firewall-cmd --permanent --add-port=80/tcp
sudo firewall-cmd --permanent --add-port=443/tcp
sudo firewall-cmd --permanent --add-port=4240/tcp
sudo firewall-cmd --permanent --add-port=4244/tcp
sudo firewall-cmd --permanent --add-port=8472/udp
sudo firewall-cmd --permanent --add-port=10250/tcp
sudo firewall-cmd --reload

Verify with awk extraction:

sudo firewall-cmd --list-ports | awk '{gsub(/ /,"\n"); print}' | sort
Expected output
10250/tcp
4240/tcp
4244/tcp
443/tcp
6443/tcp
80/tcp
8472/udp

2.5 Install Prerequisites

sudo dnf install -y curl wget tar git jq bash-completion
jq is essential for parsing JSON from kubectl and Vault. bash-completion enables kubectl tab completion.

2.6 Configure Shell (History + Completions)

Increase bash history (default 500 is too low):

cat >> ~/.bashrc << 'EOF'

# History configuration
HISTSIZE=10000
HISTFILESIZE=20000
HISTCONTROL=ignoreboth:erasedups
shopt -s histappend
EOF

Reload:

source ~/.bashrc
Completions for kubectl and cilium are added after their installation in Phase 3.

2.7 Configure Vault SSH CA Trust

Enable Vault SSH certificate authentication for secure access.

Download Vault CA public key:

curl -sSk https://10.50.1.60:8200/v1/ssh/public_key | sudo tee /etc/ssh/trusted-user-ca-keys.pem

Verify download (first 50 chars):

sudo awk '{print substr($0,1,50); exit}' /etc/ssh/trusted-user-ca-keys.pem
Expected output
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQCdUEnm7yqL

Add TrustedUserCAKeys to sshd_config:

echo 'TrustedUserCAKeys /etc/ssh/trusted-user-ca-keys.pem' | sudo tee -a /etc/ssh/sshd_config

Verify placement (must be BEFORE any Match blocks):

sudo awk '/TrustedUserCAKeys|^Match/' /etc/ssh/sshd_config

Restart sshd:

sudo systemctl restart sshd

Test from workstation:

ssh k3s-master-01 hostname

Phase 3: k3s Installation

3.1 Install k3s with SELinux Support (No Default CNI)

SELinux: The --selinux flag is required for SELinux enforcing mode.

Cilium: We disable k3s default networking (--flannel-backend=none --disable-network-policy) and install Cilium separately for enterprise-grade network policies.

curl -sfL https://get.k3s.io | sudo INSTALL_K3S_EXEC="server \
  --selinux \
  --flannel-backend=none \
  --disable-network-policy" sh -
Table 2. k3s Installation Flags
Flag                      Purpose
server                    Run as control-plane node (not agent)
--selinux                 Enable SELinux policy module
--flannel-backend=none    Disable default Flannel CNI (Cilium will provide networking)
--disable-network-policy  Disable k3s native NetworkPolicy controller (Cilium handles this)

After this step, the node will show NotReady until Cilium is installed. This is expected - the cluster has no CNI to provide pod networking.

Networking Concepts (CCNP Context)

If you have traditional networking background (CCNP, etc.), here’s how Kubernetes networking maps:

Table 3. Traditional vs Kubernetes Networking
Traditional                    Kubernetes                         Explanation
VLAN segmentation              Namespaces + NetworkPolicy         Logical isolation without L2 boundaries
VXLAN overlay (EVPN)           CNI overlay (Cilium/Flannel)       Same encapsulation - pods get a virtual L3 network abstracted from the underlay
VRF / Route tables             Network namespaces                 Each pod has an isolated network stack
ACLs (L3/L4)                   NetworkPolicy                      Permit/deny by label selector, port, protocol
NBAR / AVC (L7)                Cilium L7 Policy                   HTTP method, path, headers - identity-aware
NAT/PAT                        SNAT (egress), DNAT (Services)     kube-proxy or Cilium handles translation
Load balancer (F5, Netscaler)  Service (ClusterIP, LoadBalancer)  Round-robin by default, session affinity optional
Packet capture (SPAN)          Hubble                             Real-time flow visibility with identity context

Key abstraction: The overlay network means pods communicate on a flat L3 network (typically 10.42.0.0/16) regardless of the physical underlay. No VLAN trunking, no inter-VLAN routing, no VPN tunnels needed between nodes. The CNI handles encapsulation transparently.

Why this matters:

  • Pod on Node A (10.50.1.120) can reach Pod on Node B (10.50.1.121) directly

  • Physical network only needs IP connectivity between nodes

  • Network policies are declared (YAML), not configured on switches

  • Security follows workload identity, not IP addresses
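You can observe this flat pod network directly. The jsonpath below lists each node's underlay address next to its allocated overlay slice (standard Kubernetes fields; with Cilium's cluster-pool IPAM the per-node range may instead live in the CiliumNode objects, shown in the second command):

```shell
# Underlay IP vs overlay pod CIDR per node
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\t"}{.spec.podCIDR}{"\n"}{end}'

# Cilium cluster-pool IPAM records allocations in CiliumNode resources
kubectl get ciliumnodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.ipam.podCIDRs}{"\n"}{end}' 2>/dev/null
```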

3.2 Install Cilium CNI (Helm Method)

Cilium provides eBPF-based networking with L3-L7 network policies, identity-aware security, and observability via Hubble.

Why Helm over cilium-cli:

  • Enterprise standard - production k8s deployments use Helm or GitOps

  • Declarative - values.yaml is version-controlled and auditable

  • Repeatable - same chart + values = identical deployment

  • Upgrades - helm upgrade with rollback capability

  • GitOps ready - ArgoCD/Flux sync Helm releases

The cilium CLI is useful for status/debugging but Helm is the production deployment method.
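The upgrade and rollback flow the bullets describe looks like this (release name and values file from this deployment; the revision number is illustrative):

```shell
# Apply an edited values.yaml as a new Helm revision
helm upgrade cilium cilium/cilium -n kube-system -f /tmp/cilium-values.yaml

# Inspect revision history, then roll back if the upgrade misbehaves
helm history cilium -n kube-system
helm rollback cilium 1 -n kube-system
```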

3.2.1 Install Helm:

curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version

3.2.2 Install Cilium CLI (for status/debugging):

CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}

3.2.3 Add Cilium Helm repository:

helm repo add cilium https://helm.cilium.io/
helm repo update

3.2.4 Create Cilium values file:

In production, this file lives in git: infrastructure/k8s/cilium/values.yaml

cat > /tmp/cilium-values.yaml << 'EOF'
# Cilium Helm Values - Production
# k3s single-node deployment

cluster:
  name: domus-k3s

# k3s API server (internal)
k8sServiceHost: 127.0.0.1
k8sServicePort: 6443

# Replace kube-proxy entirely (eBPF-native)
kubeProxyReplacement: true

# Network mode
routingMode: tunnel
tunnelProtocol: vxlan

# Operator HA (single node = 1, multi-node = 2)
operator:
  replicas: 1

# Hubble observability (headless - CLI only)
hubble:
  enabled: true
  relay:
    enabled: true
  # UI disabled - headless server, use `hubble observe` CLI
  ui:
    enabled: false

# Security hardening
securityContext:
  capabilities:
    ciliumAgent:
      - CHOWN
      - KILL
      - NET_ADMIN
      - NET_RAW
      - IPC_LOCK
      - SYS_ADMIN
      - SYS_RESOURCE
      - DAC_OVERRIDE
      - FOWNER
      - SETGID
      - SETUID
    cleanCiliumState:
      - NET_ADMIN
      - SYS_ADMIN
      - SYS_RESOURCE
EOF

3.2.5 Install Cilium via Helm:

Use Cilium 1.16.x LTS - NOT 1.19.x: Cilium 1.19.x has incompatibilities with k3s kubeProxyReplacement that break SSH connectivity. Pin the chart version explicitly:

helm install cilium cilium/cilium --version 1.16.5 -n kube-system -f /tmp/cilium-values.yaml

3.2.6 Wait for Cilium to be ready:

kubectl rollout status daemonset/cilium -n kube-system --timeout=300s
cilium status
Expected output
    /¯¯\
 /¯¯\__/¯¯\    Cilium:             OK
 \__/¯¯\__/    Operator:           OK
 /¯¯\__/¯¯\    Envoy DaemonSet:    OK
 \__/¯¯\__/    Hubble Relay:       OK
    \__/       ClusterMesh:        disabled

Fallback: Verify via kubectl:

kubectl get pods -n kube-system -l k8s-app=cilium -o custom-columns='NAME:.metadata.name,STATUS:.status.phase,READY:.status.containerStatuses[0].ready'
Expected output
NAME           STATUS    READY
cilium-xxxxx   Running   true

3.2.7 Verify node transitions to Ready:

kubectl get nodes -o custom-columns='NAME:.metadata.name,STATUS:.status.conditions[-1].type,VERSION:.status.nodeInfo.kubeletVersion'
Expected output
NAME                                        STATUS   VERSION
k3s-master-01.inside.domusdigitalis.dev     Ready    v1.34.x+k3s1
Kernel warnings like "ip_set_init will not be maintained" are informational. Cilium uses eBPF, not iptables ip_set.

3.2.8 Verify Hubble Relay:

kubectl get pods -n kube-system -l app.kubernetes.io/name=hubble-relay -o custom-columns='NAME:.metadata.name,STATUS:.status.phase,READY:.status.containerStatuses[0].ready'
Expected output
NAME                  STATUS    READY
hubble-relay-xxxxx    Running   true

3.2.9 Test Hubble CLI:

# Real-time flow visibility
hubble observe

# Filter by namespace
hubble observe -n kube-system

# Show only policy denials
hubble observe --verdict DROPPED

# DNS queries
hubble observe --protocol DNS
hubble observe is how you debug network policy issues in production. Better than any UI.

3.2.10 Run connectivity test:

cilium connectivity test --single-node
This creates test pods and validates network connectivity. Takes 5-10 minutes.

Expected results for this configuration:

  • ~90% pass rate is acceptable (110/121 tests)

  • L7 policy tests will fail (they require the Envoy proxy)

  • VLAN filter drops are expected (fix with --set vlanFilter.enabled=false)

To fix VLAN filter warnings:

helm upgrade cilium cilium/cilium --version 1.16.5 \
  --namespace kube-system \
  --reuse-values \
  --set devices='{eth0}' \
  --set vlanFilter.enabled=false

Clean up stale test namespaces before retry:

kubectl delete namespace cilium-test-1 cilium-test-ccnp1 cilium-test-ccnp2 --wait=false 2>/dev/null

3.2.11 Configure shell completions:

cilium completion bash | sudo tee /etc/bash_completion.d/cilium > /dev/null
kubectl completion bash | sudo tee /etc/bash_completion.d/kubectl > /dev/null

Reload completions:

source /etc/bash_completion.d/cilium
source /etc/bash_completion.d/kubectl

Or simply: exec bash

Now cilium conn<TAB> and kubectl get po<TAB> will autocomplete.

3.3 Verify Installation

Service status (structured):

systemctl show k3s --property=ActiveState,SubState,MainPID | awk -F= '{print $1": "$2}'
Expected output
ActiveState: active
SubState: running
MainPID: 1234

Node status with jq:

kubectl get nodes -o json | jq -r '.items[] | "\(.metadata.name): \(.status.conditions[] | select(.type=="Ready") | .status) (\(.status.nodeInfo.kubeletVersion))"'
Expected output
k3s-master-01: True (v1.31.4+k3s1)

Node capacity and allocatable (jq):

kubectl get nodes -o json | jq -r '.items[] | "CPU: \(.status.capacity.cpu) | Memory: \(.status.capacity.memory) | Pods: \(.status.capacity.pods)"'

Pod status matrix (awk pivot table):

kubectl get pods -A --no-headers | awk '
{
  ns[$1]++
  status[$4]++
  combo[$1","$4]++
}
END {
  print "=== By Namespace ==="
  for(n in ns) printf "%-20s %d\n", n, ns[n]
  print "\n=== By Status ==="
  for(s in status) printf "%-15s %d\n", s, status[s]
}'

Unhealthy pods only (jq filter):

kubectl get pods -A -o json | jq -r '.items[] | select(.status.phase != "Running" and .status.phase != "Succeeded") | "\(.metadata.namespace)/\(.metadata.name): \(.status.phase)"'
Empty output = all pods healthy.

Cilium + k3s health dashboard:

echo "=== k3s ===" && systemctl is-active k3s && \
echo -e "\n=== Cilium ===" && cilium status --output json 2>/dev/null | jq -r '"Cilium: \(.cilium.state)\nOperator: \(.operator.state)\nHubble: \(.hubble.state // "disabled")"' && \
echo -e "\n=== Nodes ===" && kubectl get nodes --no-headers | awk '{printf "%-25s %s\n", $1, $2}'

3.4 Configure kubectl Access

k3s writes kubeconfig to /etc/rancher/k3s/k3s.yaml with root-only permissions. You must copy it and set KUBECONFIG explicitly, or kubectl will fail with "permission denied".

Copy kubeconfig to home directory:

mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config
chmod 600 ~/.kube/config

Set KUBECONFIG (required):

export KUBECONFIG=~/.kube/config

Make it permanent:

echo 'export KUBECONFIG=~/.kube/config' >> ~/.bashrc
The kubeconfig contains cluster admin credentials. Protect with 600 permissions.

Verify kubectl works:

kubectl get nodes

3.5 Enable kubectl Bash Completion

echo 'source <(kubectl completion bash)' >> ~/.bashrc
source ~/.bashrc
Tab completion significantly accelerates kubectl usage. Works for resource names, namespaces, and flags.

3.6 Store Node Token in gopass

The node token is needed for adding worker nodes later.

View token with line numbers (distinguished approach):

sudo awk '{print NR": "$0}' /var/lib/rancher/k3s/server/node-token
Token is a single long line. Line numbers confirm it’s complete.

View token length for verification:

sudo awk '{print "Length:", length($0)}' /var/lib/rancher/k3s/server/node-token

Store in gopass (on admin workstation):

gopass edit v3/domains/d000/k3s/node-token
Never commit the node token to git. Anyone with this token can join nodes to your cluster.
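For reference, when the worker nodes are deployed later, the stored token is consumed like this (a sketch mirroring the server install in 3.1, using K3S_SERVER_URL and K3S_TOKEN from the session variables):

```shell
# On each worker node: join the cluster using the stored token
curl -sfL https://get.k3s.io | sudo \
  K3S_URL="$K3S_SERVER_URL" \
  K3S_TOKEN="$K3S_TOKEN" \
  INSTALL_K3S_EXEC="agent --selinux" sh -
```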

3.7 Export kubeconfig to gopass

Copy kubeconfig to admin workstation:

scp k3s-master-01:~/.kube/config /tmp/k3s-kubeconfig

View kubeconfig structure (first 20 lines):

awk 'NR <= 20 {print NR": "$0}' /tmp/k3s-kubeconfig

Store in gopass (on admin workstation):

gopass edit v3/domains/d000/k3s/kubeconfig
For multi-cluster setups, merge kubeconfigs into ~/.kube/config with distinct context names.
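A sketch of that merge (the context name domus-k3s is illustrative; k3s names its context default, so rename it first):

```shell
# Rename the k3s context, then merge and flatten both kubeconfigs
kubectl config rename-context default domus-k3s --kubeconfig /tmp/k3s-kubeconfig
KUBECONFIG=~/.kube/config:/tmp/k3s-kubeconfig kubectl config view --flatten > ~/.kube/config.merged
mv ~/.kube/config.merged ~/.kube/config
kubectl config get-contexts
```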

Phase 4: Vault Agent Integration

Vault Agent injects secrets into pods. This is REQUIRED infrastructure.

Without Vault integration, secrets must be stored as Kubernetes Secrets (base64-encoded, not encrypted at rest without additional setup).
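The base64 point is easy to demonstrate: anyone who can read a Secret recovers the plaintext in one pipe (values illustrative):

```shell
# Kubernetes stores Secret values base64-encoded, not encrypted
encoded="$(printf 's3cr3t-pa55' | base64)"
echo "$encoded"              # what kubectl get secret -o yaml would show
echo "$encoded" | base64 -d  # trivially reversed

# Same thing against a live cluster:
#   kubectl get secret mysecret -o jsonpath='{.data.password}' | base64 -d
```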

Table 4. Phase 4 Workflow Summary
Step     Location       Action
4.1      Workstation    vault auth enable kubernetes
4.2      k3s-master-01  Create serviceaccount, clusterrolebinding, token, extract CA
4.3      Workstation    scp CA/token from k3s, then vault write auth/kubernetes/config
4.4      Workstation    vault policy write k3s-secrets (use kv/data/ not secret/data/)
4.5      Workstation    vault write auth/kubernetes/role/k3s-app
4.6.1    Workstation    scp DOMUS-CA-CHAIN.pem to k3s
4.6.2-4  k3s-master-01  Helm install vault injector + create vault-tls secret

Critical gotchas:

  1. KV path is kv/ not secret/ - check with vault secrets list

  2. Vault Agent TLS requires BOTH injector.certs.caBundle AND vault-tls secret

  3. Always run vault commands from workstation with dsource d000 dev/vault

4.1 Enable Vault Kubernetes Auth Method

From workstation (with dsource d000 dev/vault loaded):

vault auth enable kubernetes
If already enabled, you’ll see "path is already in use". This is safe to ignore.

Verify enabled:

vault auth list | grep kubernetes
Expected output
kubernetes/    kubernetes    auth_kubernetes_xxxx    n/a

4.2 Create Kubernetes Auth Resources

On k3s-master-01 (SSH to the node):

Get the k3s API server CA:

kubectl config view --raw --minify --flatten -o jsonpath='{.clusters[].cluster.certificate-authority-data}' | base64 -d > /tmp/k3s-ca.crt

Verify CA certificate (structured):

openssl x509 -in /tmp/k3s-ca.crt -noout -subject -issuer -dates | awk -F'=' '
/^subject/   {sub(/.*CN ?= ?/, ""); print "Subject: "$0}
/^issuer/    {sub(/.*CN ?= ?/, ""); print "Issuer:  "$0}
/^notBefore/ {print "Valid From: "$2}
/^notAfter/  {print "Valid Until: "$2}
'
Expected output
Subject: k3s-master-01
Issuer:  k3s-master-01
Valid From: Feb 21 00:00:00 2026 GMT
Valid Until: Feb 19 00:00:00 2036 GMT

Create service account for Vault:

kubectl create serviceaccount vault-auth -n kube-system

Create ClusterRoleBinding:

kubectl create clusterrolebinding vault-auth-binding \
  --clusterrole=system:auth-delegator \
  --serviceaccount=kube-system:vault-auth

Get service account token (k8s 1.24+):

kubectl create token vault-auth -n kube-system --duration=8760h > /tmp/vault-auth-token

View token length (should be substantial):

awk '{print "Token length:", length($0)}' /tmp/vault-auth-token
Token duration is 1 year (8760h). Set a calendar reminder to rotate before expiration.
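Rotation is just the token creation and Vault config steps repeated (the same commands as 4.2 and 4.3, shown together as a sketch):

```shell
# On k3s-master-01: issue a fresh reviewer JWT
kubectl create token vault-auth -n kube-system --duration=8760h > /tmp/vault-auth-token

# From the workstation: push the new JWT to Vault
scp k3s-master-01:/tmp/vault-auth-token /tmp/
vault write auth/kubernetes/config \
  kubernetes_host="https://10.50.1.120:6443" \
  token_reviewer_jwt=@/tmp/vault-auth-token
```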

4.3 Configure Vault Kubernetes Auth

From workstation (with dsource d000 dev/vault loaded):

Files /tmp/k3s-ca.crt and /tmp/vault-auth-token must exist on your workstation.

Copy from k3s-master-01 if not already done:

scp k3s-master-01:/tmp/k3s-ca.crt /tmp/
scp k3s-master-01:/tmp/vault-auth-token /tmp/

Configure kubernetes auth:

vault write auth/kubernetes/config \
  kubernetes_host="https://10.50.1.120:6443" \
  kubernetes_ca_cert=@/tmp/k3s-ca.crt \
  token_reviewer_jwt=@/tmp/vault-auth-token
The @ prefix tells Vault to read the file contents.

4.4 Create Vault Policy for k3s

From workstation (with dsource d000 dev/vault loaded):

Table 5. k3s Secrets Policy
Path                             Capability
kv/data/k3s/*                    read (application secrets)
pki_int/issue/domus-workstation  create, update (certificate issuance)

vault policy write k3s-secrets - <<'EOF'
# Allow k3s pods to read secrets
path "kv/data/k3s/*" {
  capabilities = ["read"]
}

# Allow PKI certificate issuance
path "pki_int/issue/domus-workstation" {
  capabilities = ["create", "update"]
}
EOF
Add paths incrementally as applications are deployed. Start minimal, expand as needed.

4.5 Create Vault Role for k3s

From workstation (with dsource d000 dev/vault loaded):

Table 6. Vault Kubernetes Role Parameters
Parameter                         Purpose                     Value
bound_service_account_names       Which SAs can authenticate  * (any)
bound_service_account_namespaces  Which namespaces allowed    default, production
policies                          Vault policies to attach    k3s-secrets
ttl                               Token lifetime              1h

vault write auth/kubernetes/role/k3s-app \
  bound_service_account_names="*" \
  bound_service_account_namespaces="default,production" \
  policies="k3s-secrets" \
  ttl="1h"
Using * for service account names is permissive. For production, specify explicit service account names.
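Once workloads are known, the tightened role is the same command with explicit names (web-backend is an illustrative service account):

```shell
# Production-hardened role: explicit service accounts instead of "*"
vault write auth/kubernetes/role/k3s-app \
  bound_service_account_names="test-app,web-backend" \
  bound_service_account_namespaces="default,production" \
  policies="k3s-secrets" \
  ttl="1h"
```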

4.6 Install Vault Agent Injector

4.6.1 Copy CA chain to k3s node

From workstation, copy the DOMUS CA chain:

scp /etc/ssl/certs/DOMUS-CA-CHAIN.pem k3s-master-01:/tmp/

The Vault Agent Injector must trust Vault’s TLS certificate. Without the CA bundle, pods will fail with:

tls: failed to verify certificate: x509: certificate signed by unknown authority

4.6.2 Add Helm repo

On k3s-master-01:

helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update

4.6.3 Install injector with CA bundle

Table 7. Vault Helm Installation Parameters
Parameter                   Purpose
injector.enabled=true       Enable sidecar injector (required)
injector.externalVaultAddr  External Vault URL (not in-cluster)
server.enabled=false        Don't deploy Vault server (using external)
injector.certs.caBundle     Base64-encoded CA chain for TLS verification (REQUIRED)

helm install vault hashicorp/vault \
  --set "injector.enabled=true" \
  --set "injector.externalVaultAddr=https://vault-01.inside.domusdigitalis.dev:8200" \
  --set "server.enabled=false" \
  --set "injector.certs.caBundle=$(base64 -w0 /tmp/DOMUS-CA-CHAIN.pem)"
caBundle provides the CA chain so the injector webhook trusts Vault’s TLS certificate. The base64 -w0 encodes without line breaks.
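You can sanity-check the encoding locally: the caBundle value must be a single base64 line with no embedded newlines (dummy input below):

```shell
# base64 -w0 output must contain zero newlines
b64="$(printf 'dummy CA material' | base64 -w0)"
printf '%s' "$b64" | wc -l      # 0 = no embedded newlines
printf '%s' "$b64" | base64 -d  # round-trips to the original input
```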

4.6.4 Create TLS secret for Vault Agent sidecars

The injector.certs.caBundle only affects the injector webhook, NOT the vault-agent sidecars injected into pods. Pods need a separate TLS secret.

kubectl create secret generic vault-tls --from-file=ca.crt=/tmp/DOMUS-CA-CHAIN.pem

Pods must reference this secret via annotations (see Phase 5.2).

Verify injector pod (jq):

kubectl get pods -l app.kubernetes.io/name=vault-agent-injector -o json | jq -r '
.items[] | "\(.metadata.name): \(.status.phase) (Ready: \(.status.containerStatuses[0].ready))"
'
Expected output
vault-agent-injector-xxxxx: Running (Ready: true)

Injector logs (filtered for errors):

kubectl logs -l app.kubernetes.io/name=vault-agent-injector --tail=50 | awk '/level=error|level=warn/ {print}'
Empty output = no errors.

Phase 5: Test Deployment

5.1 Create Test Secret in Vault

From workstation (with dsource d000 dev/vault loaded):

vault kv put kv/k3s/test username="testuser" password="testpass123"

Verify secret (jq structured output):

vault kv get -format=json kv/k3s/test | jq -r '
"Request ID: \(.request_id)",
"Version: \(.data.metadata.version)",
"Created: \(.data.metadata.created_time)",
"Keys: \(.data.data | keys | join(", "))"
'

Extract specific values:

vault kv get -format=json kv/k3s/test | jq -r '.data.data | to_entries[] | "\(.key)=\(.value)"'
Expected output
password=testpass123
username=testuser

5.2 Deploy Test Pod with Vault Injection

Table 8. Vault Injection Annotations
Annotation                                      Purpose
vault.hashicorp.com/agent-inject                Enable sidecar injection
vault.hashicorp.com/role                        Vault role for authentication
vault.hashicorp.com/agent-inject-secret-<file>  Secret path to inject into <file>
vault.hashicorp.com/tls-secret                  Kubernetes secret containing CA cert
vault.hashicorp.com/ca-cert                     Path to CA cert inside the secret volume

For external Vault with TLS, the tls-secret and ca-cert annotations are required. Without them, the vault-agent sidecar cannot verify Vault’s certificate.

kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: test-app
---
apiVersion: v1
kind: Pod
metadata:
  name: vault-test
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "k3s-app"
    vault.hashicorp.com/agent-inject-secret-credentials.txt: "kv/data/k3s/test"
    vault.hashicorp.com/tls-secret: "vault-tls"
    vault.hashicorp.com/ca-cert: "/vault/tls/ca.crt"
spec:
  serviceAccountName: test-app
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "while true; do sleep 3600; done"]
EOF
The container runs an infinite sleep loop instead of printing the secret once and exiting, so the injected file can be inspected manually with kubectl exec.

5.3 Verify Secret Injection

Wait for pod with timeout:

kubectl wait --for=condition=Ready pod/vault-test --timeout=120s && echo "Pod ready" || echo "Timeout - check events"

Pod status deep dive (jq):

kubectl get pod vault-test -o json | jq -r '
"Phase: \(.status.phase)",
"Containers: \(.spec.containers | length) app + \(.spec.initContainers // [] | length) init",
"Vault Injected: \(.metadata.annotations["vault.hashicorp.com/agent-inject"] // "false")",
(.status.containerStatuses[]? | "  \(.name): \(.ready) (restarts: \(.restartCount))")
'

View injected secret (structured):

kubectl exec vault-test -- sh -c 'cat /vault/secrets/credentials.txt' | awk -F'=' '{printf "%-15s %s\n", $1":", $2}'
Expected output
username:       testuser
password:       testpass123

Vault agent init logs (filtered):

kubectl logs vault-test -c vault-agent-init 2>/dev/null | awk '
/level=info.*secret.*rendered/ {print "✓ Secret rendered"}
/level=error/ {print "✗ ERROR: "$0}
END {if(NR==0) print "No init container logs (may have completed)"}
'

Full injection verification:

kubectl exec vault-test -- sh -c '
echo "=== Secret File ===" && ls -la /vault/secrets/ && \
echo -e "\n=== Contents ===" && cat /vault/secrets/credentials.txt && \
echo -e "\n=== Permissions ===" && stat -c "%a %U:%G" /vault/secrets/credentials.txt
'

5.4 Cleanup Test Resources

kubectl delete pod vault-test
kubectl delete sa test-app

Phase 6: DNS Registration

6.1 Add DNS Record

Add A record via BIND nsupdate:

ssh bind-01 "sudo nsupdate -l << 'EOF'
zone inside.domusdigitalis.dev
update add k3s-master-01.inside.domusdigitalis.dev. 3600 A 10.50.1.120
send
EOF"

6.2 Verify DNS

dig k3s-master-01.inside.domusdigitalis.dev

Extract A record with awk:

dig +short k3s-master-01.inside.domusdigitalis.dev | awk 'NR==1'
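
Both internal resolvers should return the record. A quick loop (resolver IPs taken from the DNS_PRIMARY/DNS_SECONDARY session variables):

```shell
# Query primary and secondary resolvers and print each answer side by side
for ns in 10.50.1.90 10.50.1.91; do
  printf '%-14s %s\n' "$ns:" "$(dig @"$ns" +short k3s-master-01.inside.domusdigitalis.dev)"
done
```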

Phase 7: Verification

7.1 Automated Health Check

Run all checks at once:

echo "=== k3s Cluster Health Check ===" && echo

# k3s service
printf "%-20s" "k3s service:"
systemctl is-active k3s 2>/dev/null | awk '{print ($1=="active") ? "✓ "$1 : "✗ "$1}'

# SELinux
printf "%-20s" "SELinux:"
getenforce | awk '{print ($1=="Enforcing") ? "✓ "$1 : "✗ "$1}'

# Firewall
printf "%-20s" "firewalld:"
systemctl is-active firewalld 2>/dev/null | awk '{print ($1=="active") ? "✓ "$1 : "✗ "$1}'

# Node status
printf "%-20s" "Node Ready:"
kubectl get nodes -o json 2>/dev/null | jq -r '.items[0].status.conditions[] | select(.type=="Ready") | if .status=="True" then "✓ Ready" else "✗ NotReady" end'

# Cilium
printf "%-20s" "Cilium:"
cilium status --output json 2>/dev/null | jq -re 'if .cilium.state=="Ok" then "✓ Ok" else "✗ \(.cilium.state)" end' || echo "✗ not installed"

# Hubble
printf "%-20s" "Hubble:"
cilium status --output json 2>/dev/null | jq -re 'if .hubble.state then "✓ \(.hubble.state)" else "○ disabled" end' || echo "○ N/A"

# Vault injector
printf "%-20s" "Vault Injector:"
kubectl get pods -l app.kubernetes.io/name=vault-agent-injector -o json 2>/dev/null | jq -r '.items[0].status.phase // "not found"' | awk '{print ($1=="Running") ? "✓ "$1 : "✗ "$1}'

# Pod summary
echo -e "\n=== Pod Summary ==="
kubectl get pods -A --no-headers 2>/dev/null | awk '
{status[$4]++; total++}
END {
  for(s in status) printf "%-15s %d\n", s":", status[s]
  print "---"
  printf "%-15s %d\n", "Total:", total
}'

# Unhealthy pods
echo -e "\n=== Unhealthy Pods ==="
kubectl get pods -A --no-headers 2>/dev/null | awk '$4!="Running" && $4!="Completed" {print $1"/"$2": "$4; f=1} END {if(!f) print "None"}'

7.2 Component Deep Dive

k3s service (structured):

systemctl show k3s --property=ActiveState,SubState,MainPID,MemoryCurrent | awk -F= '{
  if($1=="MemoryCurrent") printf "%-15s %.1f MB\n", $1":", $2/1024/1024
  else printf "%-15s %s\n", $1":", $2
}'

Journal errors (last 24h, deduplicated):

sudo journalctl -u k3s --since "24 hours ago" -p err --no-pager | awk '
!seen[$0]++ {print NR": "$0}
END {if(NR==0) print "No errors in last 24h"}
' | head -20

Cilium connectivity matrix:

cilium status --output json 2>/dev/null | jq -r '
"Cilium:     \(.cilium.state)",
"Operator:   \(.operator.state)",
"Hubble:     \(.hubble.state // "disabled")",
"ClusterMesh: \(.cluster_mesh.state // "disabled")"
'

7.3 Quick Status Commands with awk

Check all pods with status extraction:

kubectl get pods -A | awk 'NR==1 || /Running|Pending|Error/'

Count pods by status:

kubectl get pods -A --no-headers | awk '{count[$4]++} END {for(s in count) print s, count[s]}'
Example Output
Running 8
Completed 2

List non-Running pods only:

kubectl get pods -A | awk 'NR==1 || !/Running/'

awk pattern breakdown:

  • NR==1 - print header (line 1)

  • || - OR operator

  • /pattern/ - match pattern

  • !/pattern/ - NOT match pattern
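
Note that !/Running/ matches against the whole line, so a pod whose name happens to contain the word Running would be filtered incorrectly. Comparing the STATUS field ($4) directly avoids this:

```shell
# Field-exact filter: test the STATUS column itself instead of pattern
# matching the whole line (prints namespace/name: status per unhealthy pod)
kubectl get pods -A --no-headers | awk '$4 != "Running" && $4 != "Completed" {print $1 "/" $2 ": " $4}'
```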

Extract node resources:

kubectl top nodes 2>/dev/null | awk '{print $1, $3, $5}'

Count pods per node:

kubectl get pods -A -o wide --no-headers | awk '{count[$8]++} END {for(n in count) print n, count[n]}'

Troubleshooting

Migrating Existing k3s from Flannel to Cilium

If k3s was installed with default Flannel CNI and you need to migrate to Cilium:

This is a destructive operation. All pods will be recreated. Plan for downtime.

Option A (Recommended): Reinstall k3s fresh with Cilium
Option B: In-place migration (complex, higher risk)

Option A: Clean Reinstall (Recommended for single-node)

# 1. Backup any persistent data
kubectl get pv -o yaml > /tmp/pv-backup.yaml
kubectl get pvc -A -o yaml > /tmp/pvc-backup.yaml

# 2. Uninstall k3s completely
/usr/local/bin/k3s-uninstall.sh

# 3. Reinstall with Cilium flags (Phase 3.1)
curl -sfL https://get.k3s.io | sudo INSTALL_K3S_EXEC="server \
  --selinux \
  --flannel-backend=none \
  --disable-network-policy" sh -

# 4. Install Cilium (Phase 3.2)
# ... follow Phase 3.2 steps

Option B: In-Place Migration (Advanced)

# 1. Install Cilium alongside Flannel
cilium install --version 1.16.5

# 2. Wait for Cilium to be ready
cilium status --wait

# 3. Verify pods have Cilium networking
kubectl get pods -A -o wide

# 4. Delete the Flannel DaemonSet if one exists (no-op on stock k3s,
#    where Flannel runs embedded in the k3s process)
kubectl delete -n kube-system daemonset kube-flannel-ds 2>/dev/null || true

# 5. Remove Flannel CNI config
sudo rm -f /etc/cni/net.d/10-flannel.conflist

# 6. Disable the embedded Flannel backend and the default network policy
#    controller (matches the Option A install flags), then restart k3s
printf 'flannel-backend: "none"\ndisable-network-policy: true\n' | sudo tee -a /etc/rancher/k3s/config.yaml
sudo systemctl restart k3s
In-place migration may leave orphaned network resources. Clean reinstall is cleaner.

SELinux Denials

Check for AVC denials:

sudo ausearch -m avc -ts recent

Extract denied operations with awk:

sudo ausearch -m avc -ts recent | awk '/denied/{print}'
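
When there are many denials, tallying them by the offending command helps spot patterns. A sketch that extracts the comm= field (assumes standard audit AVC record formatting):

```shell
# Tally AVC denials by the offending command (the comm= field) so repeat
# offenders stand out
sudo ausearch -m avc -ts recent 2>/dev/null | awk '
/denied/ && match($0, /comm="[^"]*"/) {
  comm = substr($0, RSTART + 6, RLENGTH - 7)   # strip comm=" and closing quote
  count[comm]++
}
END { for (c in count) print c, count[c] }'
```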

If k3s has SELinux issues:

sudo ausearch -c 'k3s' --raw | audit2allow -M k3s-selinux
sudo semodule -i k3s-selinux.pp
Only create SELinux policy modules for legitimate denials. Review audit2allow output before applying.

k3s Won’t Start

View journal entries with error filtering:

sudo journalctl -xeu k3s --no-pager -p err | awk 'NR <= 30'

View full context (last 100 lines):

sudo journalctl -xeu k3s --no-pager -n 100

Vault Agent Injection Not Working

Check injector logs:

kubectl logs -l app.kubernetes.io/name=vault-agent-injector

Filter for errors with awk:

kubectl logs -l app.kubernetes.io/name=vault-agent-injector | awk '/error|Error|ERROR/'

Check pod events:

kubectl describe pod <pod-name> | awk '/Events:/,0'
awk '/pattern/,0' prints from pattern match to end of file.

Verify Vault connectivity from k3s (Alpine's BusyBox wget does not support --ca-certificate; it validates against the system trust store):

kubectl run vault-test --rm -it --restart=Never --image=alpine -- wget -qO- https://vault-01.inside.domusdigitalis.dev:8200/v1/sys/health

Parse health check response:

kubectl run vault-test --rm -i --restart=Never --image=alpine -- wget -qO- https://vault-01.inside.domusdigitalis.dev:8200/v1/sys/health 2>/dev/null | jq -r '.sealed, .initialized'
If the CA chain is not in the Alpine trust store, use --no-check-certificate for testing only.

SSH Breaks After Cilium Install/Upgrade

Symptom: SSH to VM times out after Cilium installation or upgrade, but ping works.

Root cause: Cilium 1.19.x has kubeProxyReplacement incompatibility with k3s. The eBPF programs block TCP traffic.

Quick fix (if locked out):

  1. Access VM via virsh console:

    sudo virsh console k3s-master-01
  2. Reboot to clear eBPF state:

    sudo reboot
  3. Immediately uninstall Cilium:

    helm uninstall cilium -n kube-system
  4. Reinstall with 1.16.5 LTS:

    helm install cilium cilium/cilium --version 1.16.5 \
      --namespace kube-system \
      --set kubeProxyReplacement=true \
      --set k8sServiceHost=127.0.0.1 \
      --set k8sServicePort=6443 \
      --set operator.replicas=1 \
      --set hubble.enabled=true \
      --set hubble.relay.enabled=false \
      --set hubble.ui.enabled=false

Prevention: Always use Cilium 1.16.x LTS with k3s. Do NOT use 1.19.x.

CiliumNode Stale IP After Static IP Fix

Root cause: If you install Cilium while the VM has a DHCP address, then fix to static IP later, Cilium caches the old DHCP IP in the CiliumNode resource.

Symptom: cilium connectivity test fails with ICMP health check timeouts. Cilium status shows wrong nodeIP.

Diagnose:

# Check what Cilium thinks the node IP is
kubectl get ciliumnode

# Compare to actual node IP
ip -4 addr show eth0 | awk '/inet/{print $2}'
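
The two values can also be compared in one step. A sketch assuming a single node, using a jsonpath filter to pick the InternalIP entry from the CiliumNode spec:

```shell
# Compare the IP cached in the CiliumNode resource against the address
# actually configured on eth0
cilium_ip=$(kubectl get ciliumnode -o jsonpath='{.items[0].spec.addresses[?(@.type=="InternalIP")].ip}')
node_ip=$(ip -4 addr show eth0 | awk '/inet /{split($2, a, "/"); print a[1]; exit}')
if [ "$cilium_ip" = "$node_ip" ]; then
  echo "✓ IPs match: $node_ip"
else
  echo "✗ mismatch: CiliumNode=$cilium_ip node=$node_ip"
fi
```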

If CiliumNode shows wrong IP (e.g., old DHCP address):

Fix:

# Delete stale CiliumNode (will recreate with correct IP); the resource
# name matches the Kubernetes node name
kubectl delete ciliumnode "$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')"

# Restart Cilium to pick up correct IP
kubectl rollout restart daemonset/cilium -n kube-system
kubectl rollout status daemonset/cilium -n kube-system

# Verify correct IP
kubectl get ciliumnode

Prevention: Always fix static IP (Phase 2.1.1) BEFORE installing k3s/Cilium.

Containerd Snapshotter Corruption

Symptom: Pod fails with exec format error but image shows correct architecture (linux/amd64). Binary inside container is empty (0 bytes).

Root cause: Containerd’s overlayfs snapshotter has corrupted layer metadata. Re-pulling images doesn’t fix it because snapshotter reuses cached (corrupt) filesystem.

Diagnose:

# Check binary inside image
sudo mkdir -p /mnt/test-image
sudo /usr/local/bin/k3s ctr images mount <image>@<digest> /mnt/test-image
file /mnt/test-image/usr/bin/<binary>
sudo umount /mnt/test-image

If binary shows empty:

Fix (Nuclear - re-pulls ALL images):

# Stop k3s
sudo systemctl stop k3s

# Remove corrupted snapshotter state
sudo rm -rf /var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/*
sudo rm -f /var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/metadata.db

# Start k3s (will re-pull all images fresh)
sudo systemctl start k3s

# Wait for cluster to stabilize
sleep 60
kubectl get pods -A
This re-pulls ALL container images. Plan for 5-10 minutes downtime.

k3s ctr Command Not Found

k3s bundles containerd with its own ctr command. It’s not in PATH for sudo.

Wrong:

sudo ctr images ls
# Error: sudo: ctr: command not found

Correct:

sudo /usr/local/bin/k3s ctr images ls

Common k3s ctr commands:

# List images
sudo /usr/local/bin/k3s ctr images ls

# Remove image
sudo /usr/local/bin/k3s ctr images rm <image>

# Pull image with specific platform
sudo /usr/local/bin/k3s ctr images pull --platform linux/amd64 <image>

# List containers
sudo /usr/local/bin/k3s ctr containers ls

# Check content store
sudo /usr/local/bin/k3s ctr content ls
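
If these come up often, a small wrapper function in your shell profile saves typing the full path each time (kctr is an illustrative name, not a k3s-provided command):

```shell
# Forward all arguments to the bundled ctr via sudo
kctr() { sudo /usr/local/bin/k3s ctr "$@"; }

# Then, for example:
kctr images ls
```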

Helm Upgrade Fails with nil pointer

Symptom:

Error: UPGRADE FAILED: template: cilium/templates/xxx.yaml:1:14: nil pointer evaluating interface {}.enabled

Root cause: --reuse-values pulls old values missing new required fields in updated chart.

Fix: Export current values and upgrade with explicit values file:

# Export current values
helm get values cilium -n kube-system -o yaml > /tmp/cilium-values.yaml

# Review and add any missing required fields
cat /tmp/cilium-values.yaml

# Upgrade with explicit values
helm upgrade cilium cilium/cilium -n kube-system -f /tmp/cilium-values.yaml

Or use chart defaults with minimal overrides (see Phase 3.2.4 for production values file).
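
To see which fields your exported values are missing relative to the new chart, it can also help to diff them against the chart defaults for the target version (file paths here are illustrative):

```shell
# Dump the chart's default values for the version being upgraded to
helm show values cilium/cilium --version 1.16.5 > /tmp/cilium-defaults.yaml

# Diff against the values exported from the running release; lines present
# only in the defaults hint at newly required fields
diff -u /tmp/cilium-values.yaml /tmp/cilium-defaults.yaml | head -40
```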

Reset k3s (Nuclear Option)

This destroys ALL k3s data including:

  • All deployed applications

  • All persistent volumes

  • All secrets and configmaps

  • Cluster state

Only use when recovery is impossible.

/usr/local/bin/k3s-uninstall.sh

Then reinstall from Phase 3.

Next Steps

After successful deployment:

  1. Expand to 6-node HA cluster - Deploy k3s-master-02/03, k3s-worker-01/02/03

  2. Configure Cilium BGP - Advertise LoadBalancer IPs to VyOS router

  3. Deploy monitoring - Prometheus + Grafana on k3s

  4. Configure GitOps - ArgoCD for declarative deployments

  5. Setup Traefik IngressRoute - HTTPS with Vault certificates

Appendix A: Deployment Chronicle

2026-02-22: Vault Agent Injector Installation

LAST DEPLOYED: Sun Feb 22 10:35:29 2026
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing HashiCorp Vault!

Now that you have deployed Vault, you should look over the docs on using
Vault with Kubernetes available here:

https://developer.hashicorp.com/vault/docs

Your release is named vault. To learn more about the release, try:

  $ helm status vault
  $ helm get manifest vault

Helm install command used:

helm install vault hashicorp/vault \
  --set "injector.enabled=true" \
  --set "injector.externalVaultAddr=https://vault-01.inside.domusdigitalis.dev:8200" \
  --set "server.enabled=false"

2026-02-22: Vault Agent TLS Fix

Problem: Vault Agent sidecars failed with x509: certificate signed by unknown authority.

Root cause: injector.certs.caBundle only affects the injector webhook, NOT the vault-agent sidecars.

Solution:

  1. Create TLS secret with CA cert:

    kubectl create secret generic vault-tls --from-file=ca.crt=/tmp/DOMUS-CA-CHAIN.pem
  2. Add annotations to pods:

    vault.hashicorp.com/tls-secret: "vault-tls"
    vault.hashicorp.com/ca-cert: "/vault/tls/ca.crt"

Verification:

kubectl exec vault-test -c app -- cat /vault/secrets/credentials.txt
data: map[password:testpass123 username:testuser]
metadata: map[created_time:2026-02-22T10:38:41.572336046Z ...]