Kubernetes Architecture Deep Dive

A deep technical reference on Kubernetes architecture, written by a network engineer transitioning from traditional infrastructure. This document bridges the gap between CCNP-level networking and cloud-native container orchestration.

Executive Summary

Kubernetes is an API-driven orchestration platform that treats infrastructure as code. Every component—pods, services, storage, networking—is a declarative object managed through a unified API.

| Concept | Traditional Infrastructure Equivalent |
|---|---|
| Pod | Process running on a server |
| Service | Load balancer VIP + DNS entry |
| Ingress | Reverse proxy / HAProxy |
| ConfigMap | Configuration file |
| Secret | Encrypted credential store |
| PersistentVolume | SAN LUN / NFS mount |
| Namespace | VLAN / VRF segmentation |
| NetworkPolicy | Firewall ACL |

The API-Driven Model

Everything in Kubernetes is an API call. This is the fundamental concept that makes Kubernetes powerful.

Kubernetes API-Driven Architecture

How It Works

When you run kubectl apply -f deployment.yaml:

1. kubectl serializes YAML → JSON
2. kubectl sends HTTPS POST to kube-apiserver
3. API server validates against OpenAPI schema
4. API server authenticates (x509, token, OIDC)
5. API server authorizes (RBAC policies)
6. API server persists to etcd (distributed key-value store)
7. Controllers WATCH for changes
8. Controllers RECONCILE reality to match desired state
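The first two steps can be sketched in code. This is a toy illustration of the serialization and API-path construction only, not the real kubectl implementation; the crude pluralization and the default-namespace fallback are simplifications:

```python
import json

def api_path(manifest: dict) -> str:
    """Build the API server path a create request would target.

    Simplified sketch: naive pluralization (Deployment -> deployments)
    and namespaced resources only.
    """
    api_version = manifest["apiVersion"]           # e.g. "apps/v1" or "v1"
    kind = manifest["kind"].lower() + "s"          # naive pluralization
    ns = manifest["metadata"].get("namespace", "default")
    if "/" in api_version:                         # named API group
        group, version = api_version.split("/")
        return f"/apis/{group}/{version}/namespaces/{ns}/{kind}"
    return f"/api/{api_version}/namespaces/{ns}/{kind}"  # core ("legacy") group

manifest = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "nginx", "namespace": "default"},
}

print(api_path(manifest))    # /apis/apps/v1/namespaces/default/deployments
body = json.dumps(manifest)  # the JSON body kubectl would POST over HTTPS
```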

The Reconciliation Loop

This is the heart of Kubernetes—the control loop pattern:

Kubernetes Reconciliation Loop

Every controller follows this pattern:

  1. Observe - Watch API for changes

  2. Diff - Compare desired vs actual

  3. Act - Make changes to converge
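The three steps can be sketched as a toy control loop. This is illustrative Python, not client-go; desired and actual state are plain dicts standing in for etcd objects and running pods:

```python
def reconcile(desired: dict, actual: dict) -> list[str]:
    """One pass of the loop: converge actual toward desired, return actions taken."""
    actions = []
    for name, replicas in desired.items():    # Observe + Diff
        if actual.get(name) != replicas:
            actual[name] = replicas           # Act: create or scale
            actions.append(f"scale {name} to {replicas}")
    for name in list(actual):                 # garbage-collect anything not desired
        if name not in desired:
            del actual[name]
            actions.append(f"delete {name}")
    return actions

desired = {"nginx": 3, "prometheus": 1}
actual = {"nginx": 1}

while reconcile(desired, actual):             # loop until a pass takes no actions
    pass
print(actual)  # {'nginx': 3, 'prometheus': 1}
```

The key property, as in real controllers, is idempotence: running the loop again once converged does nothing.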

Control Plane Components

Kubernetes Control Plane Architecture

kube-apiserver

The central hub of Kubernetes. All communication flows through it.

| Function | Description | Network Equivalent |
|---|---|---|
| Authentication | Verify identity (x509, tokens, OIDC) | RADIUS/TACACS+ |
| Authorization | Check permissions (RBAC) | ACLs / privilege levels |
| Admission Control | Validate/mutate requests | Firewall inspection |
| Persistence | Store in etcd | Configuration database |
| Watch | Push changes to controllers | Syslog / SNMP traps |

etcd

Distributed key-value store. The single source of truth.

etcd cluster (Raft consensus)
├── /registry/pods/default/nginx-abc123
├── /registry/services/monitoring/prometheus
├── /registry/secrets/vault-system/vault-tls
└── /registry/configmaps/argocd/argocd-cm

Network analogy: This is like the NVRAM/startup-config that persists across reboots, but distributed and versioned.

Controllers

Specialized reconciliation loops for each resource type:

| Controller | Watches | Actions |
|---|---|---|
| Deployment Controller | Deployments | Creates/updates ReplicaSets |
| ReplicaSet Controller | ReplicaSets | Creates/deletes Pods |
| Service Controller | Services (type=LoadBalancer) | Provisions cloud LBs |
| Endpoint Controller | Services + Pods | Updates endpoint lists |
| Node Controller | Nodes | Marks unhealthy nodes |

Scheduler

Decides where pods run. Considers:

  • Resource requests (CPU, memory)

  • Node taints/tolerations

  • Affinity/anti-affinity rules

  • Pod topology spread

Network analogy: Like EIGRP/OSPF choosing the best path, but for workload placement.
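These scheduling inputs all live in the pod spec. A minimal illustration follows; the image, labels, and taint key are assumptions, not taken from a real cluster:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example              # illustrative name
  labels:
    app: example
spec:
  containers:
  - name: app
    image: nginx:1.27        # illustrative image
    resources:
      requests:              # what the scheduler bin-packs against
        cpu: 250m
        memory: 256Mi
  tolerations:               # permit scheduling onto tainted nodes
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
  affinity:
    podAntiAffinity:         # spread replicas across nodes
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: example
        topologyKey: kubernetes.io/hostname
```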

Node Components

Kubernetes Node Architecture

kubelet

The agent running on each node. Responsibilities:

  1. Register node with API server

  2. Watch for pod assignments

  3. Call container runtime (CRI)

  4. Report pod status back

  5. Execute liveness/readiness probes
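Probes (responsibility 5) are declared per container. A hedged sketch; the /healthz and /ready endpoints and port 8080 are illustrative, not a real application's API:

```yaml
containers:
- name: app
  image: nginx:1.27          # illustrative
  livenessProbe:             # kubelet restarts the container if this fails
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 10
    periodSeconds: 15
  readinessProbe:            # failing pods are removed from Service endpoints
    httpGet:
      path: /ready
      port: 8080
    periodSeconds: 5
```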

Container Runtime (containerd)

Actually runs containers. The stack:

Container Runtime Stack

Key insight: Docker is NOT required. containerd speaks the same image format (OCI) but without Docker’s overhead.

kube-proxy vs Cilium

Traditional kube-proxy uses iptables. Cilium replaces this with eBPF.

| Feature | kube-proxy (iptables) | Cilium (eBPF) |
|---|---|---|
| Implementation | iptables rules | eBPF programs in kernel |
| Performance | O(n) rule matching | O(1) hash lookups |
| Observability | Limited | Hubble (full flow visibility) |
| Network Policy | Basic | L3-L7 with identity |
| Service Mesh | Requires Istio/Linkerd | Built-in (optional) |

Pod Lifecycle

Pod Lifecycle State Machine

States

| State | Description | Common Causes |
|---|---|---|
| Pending | Accepted but not scheduled | No node capacity, image pull |
| Running | At least one container running | Normal operation |
| Succeeded | All containers exited 0 | Completed Jobs, one-shot pods |
| Failed | Containers exited non-zero | Application crash |
| Unknown | Node communication lost | Network partition |

Pod Phases in Detail

The pod lifecycle diagram above shows the complete state machine. Key transitions:

  • PENDING → Init - Pod scheduled to a node, init containers starting

  • Init → Initializing - Init containers complete, main containers starting

  • Initializing → RUNNING - Containers ready, probes passing

  • RUNNING → SUCCEEDED/FAILED - Containers exit (exit code determines state)
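The Init transitions above are driven by initContainers, which must each exit 0, in order, before the main containers start. An illustrative example; the db:5432 dependency is an assumed service, not part of the stack described here:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  initContainers:            # run to completion, in order, before containers start
  - name: wait-for-db
    image: busybox:1.36
    command: ["sh", "-c", "until nc -z db 5432; do sleep 2; done"]
  containers:
  - name: app
    image: nginx:1.27
```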

Networking Model

Kubernetes Networking Model

The Four Requirements

Kubernetes networking must satisfy:

  1. Pod-to-Pod: All pods can communicate without NAT

  2. Pod-to-Service: Services provide stable endpoints

  3. External-to-Service: Ingress exposes services

  4. Pod-to-External: Pods can reach internet

Network Identity

Every pod gets:

  • Unique IP (from CNI plugin)

  • DNS name (pod IP with dashes: 10-42-0-5.namespace.pod.cluster.local)

  • Namespace isolation (optional, via NetworkPolicy)
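Isolation via NetworkPolicy might look like the following sketch; the app labels are illustrative assumptions, using the monitoring namespace mentioned elsewhere in this document:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-grafana-to-prometheus   # illustrative name
  namespace: monitoring
spec:
  podSelector:
    matchLabels:
      app: prometheus                 # policy applies to these pods
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: grafana                # only grafana pods may connect
    ports:
    - protocol: TCP
      port: 9090
```

Like a firewall with a default-deny stance: once any Ingress policy selects a pod, all traffic not explicitly allowed is dropped.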

Service Types

| Type | Scope | Use Case | Network Equivalent |
|---|---|---|---|
| ClusterIP | Internal only | Inter-service communication | Private VLAN |
| NodePort | Node IP:Port | Development, testing | Static NAT |
| LoadBalancer | External IP | Production external access | VIP with health checks |
| ExternalName | DNS CNAME | External service reference | DNS alias |

Service Discovery

A pod wants to reach the "prometheus" service:

1. App calls: http://prometheus:9090
2. CoreDNS resolves: prometheus.monitoring.svc.cluster.local
3. Returns: ClusterIP (10.43.x.x)
4. Cilium/kube-proxy routes to healthy pod endpoint
5. Traffic arrives at prometheus pod
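The stable name in step 2 comes from a Service object like this sketch (ports and selector labels are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus           # becomes prometheus.monitoring.svc.cluster.local
  namespace: monitoring
spec:
  type: ClusterIP
  selector:
    app: prometheus          # endpoints = healthy pods with this label
  ports:
  - port: 9090               # port the ClusterIP listens on
    targetPort: 9090         # port on the pod
```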

Storage Architecture

Kubernetes Storage Architecture

The CSI Model

Container Storage Interface (CSI) abstracts storage providers:

PersistentVolumeClaim (PVC)     ← User request
        │
        ▼
StorageClass                    ← Provisioner config
        │
        ▼
CSI Driver                      ← Provider plugin
        │
        ▼
PersistentVolume (PV)           ← Actual storage
        │
        ▼
Backend (NFS, iSCSI, etc.)      ← Physical storage
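Tying the chain together, a StorageClass plus a PVC might look like the following sketch. The nfs.csi.k8s.io provisioner and the NAS hostname/share are assumptions for illustration, not the actual cluster configuration:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-prometheus
provisioner: nfs.csi.k8s.io    # assumed CSI NFS driver
parameters:
  server: nas-01               # illustrative NAS hostname
  share: /volume1/prometheus   # illustrative export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: monitoring
spec:
  storageClassName: nfs-prometheus
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 50Gi
```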

Storage Classes

| Class | Backend | Use Case |
|---|---|---|
| nfs-prometheus | NAS-01 NFS | Prometheus TSDB, Grafana |
| nfs-wazuh | NAS-01 NFS | Wazuh indexer data |
| local-path | Node local disk | Ephemeral, high-performance |
| longhorn | Distributed block | HA storage (requires multiple nodes) |

Security Model

Defense in Depth

Kubernetes Defense in Depth

RBAC Model

# Role: What actions on what resources
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: monitoring
  name: prometheus-reader
rules:
- apiGroups: [""]
  resources: ["pods", "services", "endpoints"]
  verbs: ["get", "list", "watch"]

---
# RoleBinding: Who gets the role
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: monitoring
  name: prometheus-reader-binding
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
roleRef:
  kind: Role
  name: prometheus-reader
  apiGroup: rbac.authorization.k8s.io

k3s: Lightweight Kubernetes

Kubernetes vs k3s Architecture Comparison

What k3s Simplifies

| Component | Full Kubernetes | k3s |
|---|---|---|
| Binary | Multiple (apiserver, scheduler, etc.) | Single binary (~60MB) |
| etcd | External cluster | Embedded SQLite or etcd |
| Container Runtime | Docker/containerd | containerd (built-in) |
| Networking | Manual CNI setup | Flannel included (or disable for Cilium) |
| Storage | Manual CSI setup | Local-path provisioner included |
| Load Balancer | Cloud provider or MetalLB | ServiceLB included |

k3s Architecture

The k8s vs k3s comparison diagram shows the architectural differences:

Kubernetes vs k3s Architecture

Workload Patterns

Current Domus Digitalis Workloads

| Workload | Type | Purpose | Status |
|---|---|---|---|
| Prometheus | StatefulSet | Metrics collection | Planned |
| Grafana | Deployment | Visualization | Planned |
| ArgoCD | Deployment | GitOps CD | Planned |
| Traefik | DaemonSet/Deployment | Ingress | Planned |
| Wazuh | StatefulSet | SIEM/XDR | Planned |
| MinIO | StatefulSet | S3 storage | Planned |

Additional Workloads to Consider

| Workload | Purpose | Why Consider | Complexity |
|---|---|---|---|
| Loki | Log aggregation | Complements Prometheus (metrics + logs) | Medium |
| Tempo | Distributed tracing | Complete observability stack | Medium |
| Cert-Manager | Certificate automation | Auto-renew certs from Vault PKI | Low |
| External-DNS | DNS automation | Auto-create DNS entries for ingress | Low |
| Velero | Backup/restore | Disaster recovery for k8s resources | Medium |
| Kyverno | Policy engine | Enforce security policies | Medium |
| Falco | Runtime security | Detect anomalous behavior | Medium |
| Harbor | Container registry | Private OCI registry with scanning | High |
| Keycloak | Identity provider | SSO for all services (could move from VM) | High |
| Gitea | Git server | Could move from NAS to k8s | Medium |
| Vault | Secrets management | Could run Vault IN k8s (HA easier) | High |
| Teleport | Access management | SSH/k8s/DB access gateway | High |
| Backstage | Developer portal | Service catalog, documentation | High |

The Observability Stack

Kubernetes Observability Stack

Mental Models

For Network Engineers

| Network Concept | Kubernetes Equivalent |
|---|---|
| VLAN | Namespace (logical isolation) |
| VRF | NetworkPolicy (routing isolation) |
| ACL | NetworkPolicy (allow/deny rules) |
| HSRP/VRRP | Service (stable VIP) |
| Load Balancer | Service type=LoadBalancer + Ingress |
| DNS | CoreDNS + Service discovery |
| SNMP/Syslog | Prometheus metrics + Loki logs |
| RADIUS | ServiceAccount + RBAC |
| 802.1X | Pod Security Admission |
| Spanning Tree | Pod anti-affinity (avoid single points) |
| BGP Peering | Cilium BGP (advertise service IPs) |

For Systems Administrators

| Sysadmin Concept | Kubernetes Equivalent |
|---|---|
| VM | Pod (but ephemeral) |
| systemd service | Deployment/StatefulSet |
| /etc/config | ConfigMap |
| /etc/secrets | Secret (or Vault) |
| cron job | CronJob |
| init scripts | Init containers |
| health check | Liveness/Readiness probes |
| log files | stdout/stderr → log aggregator |
| disk mount | PersistentVolumeClaim |
| firewall rules | NetworkPolicy |
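The cron mapping is nearly one-to-one: a CronJob uses the same five-field schedule syntax as crontab. An illustrative sketch (the name, image, and command are assumptions):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup        # illustrative name
spec:
  schedule: "0 2 * * *"       # same five-field syntax as crontab
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: busybox:1.36
            command: ["sh", "-c", "echo running backup"]
```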

Command Reference

Essential kubectl Commands

# Cluster info
kubectl cluster-info
kubectl get nodes -o wide

# Workloads
kubectl get pods -A                    # All namespaces
kubectl get pods -n monitoring -o wide # Specific namespace
kubectl describe pod <name>            # Detailed info
kubectl logs <pod> -f                  # Follow logs
kubectl logs <pod> -c <container>      # Specific container
kubectl exec -it <pod> -- /bin/sh      # Shell access

# Resources
kubectl top nodes                      # Node resource usage
kubectl top pods -A                    # Pod resource usage

# Debugging
kubectl get events --sort-by='.lastTimestamp'
kubectl describe node <node> | grep -A5 Conditions
kubectl auth can-i create pods --as=system:serviceaccount:default:mysa

# Advanced
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
kubectl get pods -o custom-columns='NAME:.metadata.name,STATUS:.status.phase,IP:.status.podIP'

Cilium Commands

# Status
cilium status
cilium connectivity test

# Network flows (Hubble)
hubble observe --namespace monitoring
hubble observe --protocol TCP --port 9090

# Policy
cilium policy get
cilium endpoint list

Helm Commands

# Repository management
helm repo add <name> <url>
helm repo update
helm search repo <keyword>

# Installation
helm install <release> <chart> -n <namespace> -f values.yaml
helm upgrade <release> <chart> -n <namespace> -f values.yaml
helm rollback <release> <revision>

# Inspection
helm list -A
helm get values <release> -n <namespace>
helm get manifest <release> -n <namespace>

Troubleshooting Framework

The Debugging Ladder

Level 1: Is the pod running?
├── kubectl get pods -n <namespace>
├── kubectl describe pod <pod>
└── kubectl logs <pod>

Level 2: Is the service routing?
├── kubectl get svc,endpoints -n <namespace>
├── kubectl exec <pod> -- curl <service>:<port>
└── hubble observe --namespace <namespace>

Level 3: Is networking working?
├── cilium connectivity test
├── kubectl exec <pod> -- nslookup <service>
└── kubectl exec <pod> -- ping <ip>

Level 4: Is storage attached?
├── kubectl get pv,pvc
├── kubectl describe pvc <pvc>
└── kubectl exec <pod> -- df -h

Level 5: Is the node healthy?
├── kubectl describe node <node>
├── kubectl top node
└── ssh <node> "journalctl -u k3s"

Common Issues and Fixes

| Symptom | Likely Cause | Fix |
|---|---|---|
| Pod stuck Pending | No resources / no PV | kubectl describe pod → check events |
| Pod CrashLoopBackOff | App crashing | kubectl logs → fix application |
| Service not resolving | CoreDNS issue | kubectl logs -n kube-system -l k8s-app=kube-dns |
| ImagePullBackOff | Registry auth / image not found | Check image name, registry credentials |
| Vault injection failed | TLS or auth issue | Check vault-agent logs in pod |
| NetworkPolicy blocking | Missing allow rule | hubble observe to see drops |

Architecture Decision Records

ADR-001: k3s over Full Kubernetes

Decision: Use k3s instead of kubeadm/RKE/EKS.

Rationale:

  • Single-node deployment (no HA requirement yet)

  • Reduced resource overhead (~512MB vs 2GB+)

  • Simpler operations (single binary)

  • Still 100% Kubernetes compatible

Trade-offs:

  • Less flexibility in component versions

  • Some enterprise features require additional setup

ADR-002: Cilium over Flannel

Decision: Replace default Flannel with Cilium.

Rationale:

  • eBPF performance (O(1) vs iptables O(n))

  • Hubble observability (network flow visibility)

  • L7 network policies (HTTP-aware rules)

  • Native integration with Prometheus

Trade-offs:

  • More complex initial setup

  • Higher memory usage (~200MB)

ADR-003: Vault External over In-Cluster

Decision: Keep Vault as external VM, not in k8s.

Rationale:

  • Vault manages secrets for k8s (chicken-and-egg problem)

  • HA easier with dedicated VMs

  • Existing investment in vault-01 infrastructure

Trade-offs:

  • External dependency for k8s workloads

  • Network latency for secret retrieval

Glossary

| Term | Definition |
|---|---|
| CNI | Container Network Interface - plugin API for pod networking |
| CRI | Container Runtime Interface - plugin API for container execution |
| CSI | Container Storage Interface - plugin API for storage |
| CRD | Custom Resource Definition - extend k8s API with custom types |
| DaemonSet | Run one pod per node (or per selected node), like agents |
| Deployment | Stateless workload with rolling updates |
| eBPF | Extended Berkeley Packet Filter - kernel-level programmability |
| Helm | Package manager for Kubernetes (charts = packages) |
| Ingress | L7 routing (HTTP host/path to service) |
| Kubelet | Node agent that runs pods |
| Namespace | Logical isolation boundary (like VLAN) |
| Operator | Custom controller managing complex applications |
| Pod | Smallest deployable unit (one or more containers) |
| RBAC | Role-Based Access Control |
| Service | Stable network endpoint for pods (like VIP) |
| StatefulSet | Stateful workload with stable identity and storage |