Kubernetes Architecture Deep Dive
A deep technical reference on Kubernetes architecture, written by a network engineer transitioning from traditional infrastructure. This document bridges the gap between CCNP-level networking and cloud-native container orchestration.
Executive Summary
Kubernetes is an API-driven orchestration platform that treats infrastructure as code. Every component—pods, services, storage, networking—is a declarative object managed through a unified API.
| Concept | Traditional Infrastructure Equivalent |
|---|---|
| Pod | Process running on a server |
| Service | Load balancer VIP + DNS entry |
| Ingress | Reverse proxy / HAProxy |
| ConfigMap | Configuration file |
| Secret | Encrypted credential store |
| PersistentVolume | SAN LUN / NFS mount |
| Namespace | VLAN / VRF segmentation |
| NetworkPolicy | Firewall ACL |
The API-Driven Model
Everything in Kubernetes is an API call. This is the fundamental concept that makes Kubernetes powerful.
How It Works
When you run kubectl apply -f deployment.yaml:
1. kubectl serializes YAML → JSON
2. kubectl sends the object to kube-apiserver over HTTPS (POST to create, PATCH to update)
3. API server authenticates the caller (x509, token, OIDC)
4. API server authorizes the request (RBAC policies)
5. API server validates against the OpenAPI schema and runs admission control
6. API server persists to etcd (distributed key-value store)
7. Controllers WATCH for changes
8. Controllers RECONCILE reality to match desired state
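The steps above operate on a declarative manifest. For reference, a minimal sketch of what a deployment.yaml might contain (the name, image tag, and replica count are illustrative, not from this environment):

```yaml
# Hypothetical deployment.yaml -- names and image are illustrative
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.27
        ports:
        - containerPort: 80
```

Once persisted to etcd, the Deployment controller reconciles it into a ReplicaSet, which reconciles into Pods — nothing imperative ever touches a node directly.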
Control Plane Components
kube-apiserver
The central hub of Kubernetes. All communication flows through it.
| Function | Description | Network Equivalent |
|---|---|---|
| Authentication | Verify identity (x509, tokens, OIDC) | RADIUS/TACACS+ |
| Authorization | Check permissions (RBAC) | ACLs / privilege levels |
| Admission Control | Validate/mutate requests | Firewall inspection |
| Persistence | Store in etcd | Configuration database |
| Watch | Push changes to controllers | Syslog / SNMP traps |
etcd
Distributed key-value store. The single source of truth.
```
etcd cluster (Raft consensus)
├── /registry/pods/default/nginx-abc123
├── /registry/services/monitoring/prometheus
├── /registry/secrets/vault-system/vault-tls
└── /registry/configmaps/argocd/argocd-cm
```
Network analogy: This is like the NVRAM/startup-config that persists across reboots, but distributed and versioned.
Controllers
Specialized reconciliation loops for each resource type:
| Controller | Watches | Actions |
|---|---|---|
| Deployment Controller | Deployments | Creates/updates ReplicaSets |
| ReplicaSet Controller | ReplicaSets | Creates/deletes Pods |
| Service Controller | Services (type=LoadBalancer) | Provisions cloud LBs |
| Endpoint Controller | Services + Pods | Updates endpoint lists |
| Node Controller | Nodes | Marks unhealthy nodes |
Node Components
kubelet
The agent running on each node. Responsibilities:
- Register node with API server
- Watch for pod assignments
- Call container runtime (CRI)
- Report pod status back
- Execute liveness/readiness probes
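Because the kubelet itself executes probes, they are declared per container in the pod spec. A hedged sketch of a container fragment (the endpoint paths and port are assumptions):

```yaml
# Pod-spec fragment -- probe paths and port are illustrative assumptions
containers:
- name: app
  image: example/app:latest   # hypothetical image
  livenessProbe:              # failure -> kubelet restarts the container
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 10
    periodSeconds: 15
  readinessProbe:             # failure -> pod removed from service endpoints
    httpGet:
      path: /ready
      port: 8080
    periodSeconds: 5
```

The distinction matters operationally: liveness failures cause restarts, readiness failures only pull the pod out of load-balancing rotation.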
Container Runtime (containerd)
The component that actually runs containers, sitting beneath the kubelet.
Key insight: Docker is NOT required. containerd speaks the same image format (OCI) but without Docker's overhead.
kube-proxy vs Cilium
Traditional kube-proxy uses iptables. Cilium replaces this with eBPF.
| Feature | kube-proxy (iptables) | Cilium (eBPF) |
|---|---|---|
| Implementation | iptables rules | eBPF programs in kernel |
| Performance | O(n) rule matching | O(1) hash lookups |
| Observability | Limited | Hubble (full flow visibility) |
| Network Policy | Basic | L3-L7 with identity |
| Service Mesh | Requires Istio/Linkerd | Built-in (optional) |
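The L3-L7 row is where the two diverge most: an identity-aware, HTTP-level rule has no iptables equivalent. A sketch using the CiliumNetworkPolicy CRD (labels, port, and path are assumptions, not from this environment):

```yaml
# Sketch: L7-aware policy via Cilium's CRD -- labels/port/path are assumptions
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-metrics-query
  namespace: monitoring
spec:
  endpointSelector:
    matchLabels:
      app: prometheus          # policy applies to these endpoints
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: grafana           # identity-based source, not an IP range
    toPorts:
    - ports:
      - port: "9090"
        protocol: TCP
      rules:
        http:                  # L7 filtering: only this method/path allowed
        - method: GET
          path: "/api/v1/query"
```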
Pod Lifecycle
States
| State | Description | Common Causes |
|---|---|---|
| Pending | Accepted but not scheduled | No node capacity, image pull |
| Running | At least one container running | Normal operation |
| Succeeded | All containers exited 0 | Jobs, init containers |
| Failed | Containers exited non-zero | Application crash |
| Unknown | Node communication lost | Network partition |
Pod Phases in Detail
Key transitions in the pod state machine:

- PENDING → Init: pod scheduled to a node, init containers starting
- Init → Initializing: init containers complete, main containers starting
- Initializing → RUNNING: containers ready, probes passing
- RUNNING → SUCCEEDED/FAILED: containers exit (exit code determines the state)
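The Init phase corresponds to initContainers in the pod spec, each of which must exit 0 before the main containers start. A sketch (the names, image, and db:5432 dependency are assumptions):

```yaml
# Sketch: a pod that passes through the Init phase before RUNNING
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init          # hypothetical name
spec:
  initContainers:
  - name: wait-for-db          # runs to completion before main containers
    image: busybox:1.36
    command: ["sh", "-c", "until nc -z db 5432; do sleep 2; done"]
  containers:
  - name: app
    image: example/app:latest  # hypothetical image
```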
Networking Model
The Four Requirements
Kubernetes networking must satisfy:
- Pod-to-Pod: all pods can communicate without NAT
- Pod-to-Service: services provide stable endpoints
- External-to-Service: Ingress exposes services
- Pod-to-External: pods can reach the internet
Network Identity
Every pod gets:
- Unique IP (from CNI plugin)
- DNS name (pod-ip.namespace.pod.cluster.local)
- Namespace isolation (optional, via NetworkPolicy)
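The optional isolation in the last bullet is expressed as a NetworkPolicy object — the firewall-ACL analogue from the summary table. A sketch (namespace, labels, and port are assumptions):

```yaml
# Sketch: allow only Grafana to reach Prometheus -- labels/port are assumptions
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-grafana-to-prometheus
  namespace: monitoring
spec:
  podSelector:
    matchLabels:
      app: prometheus      # policy applies to these pods
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: grafana     # only traffic from matching pods is allowed
    ports:
    - protocol: TCP
      port: 9090
```

As with ACLs, selecting a pod in any Ingress policy implicitly denies all other ingress to it.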
Service Types
| Type | Scope | Use Case | Network Equivalent |
|---|---|---|---|
| ClusterIP | Internal only | Inter-service communication | Private VLAN |
| NodePort | Node IP:Port | Development, testing | Static NAT |
| LoadBalancer | External IP | Production external access | VIP with health checks |
| ExternalName | DNS CNAME | External service reference | DNS alias |
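As a concrete instance of the ClusterIP row, a sketch (name, namespace, and port are borrowed from the Prometheus examples elsewhere in this document; treat them as assumptions):

```yaml
# Sketch of a ClusterIP service -- the "private VLAN" row above
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
spec:
  type: ClusterIP          # default; internal-only VIP
  selector:
    app: prometheus        # endpoints = pods carrying this label
  ports:
  - port: 9090             # the VIP port
    targetPort: 9090       # the container port behind it
```

CoreDNS then resolves prometheus.monitoring.svc.cluster.local to the stable VIP, regardless of pod churn.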
Storage Architecture
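The core abstraction here pairs a PersistentVolumeClaim with a provisioner that satisfies it. A minimal sketch, assuming k3s's bundled local-path StorageClass (name, namespace, and size are illustrative):

```yaml
# Sketch: claim storage via k3s's bundled local-path provisioner
# (name, size, and namespace are assumptions)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: monitoring
spec:
  accessModes: ["ReadWriteOnce"]   # single-node read/write, like a local LUN
  storageClassName: local-path
  resources:
    requests:
      storage: 10Gi
```

In SAN terms: the PVC is the LUN request, the StorageClass is the storage array profile, and the bound PersistentVolume is the provisioned LUN.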
Security Model
RBAC Model
```yaml
# Role: what actions on what resources
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: monitoring
  name: prometheus-reader
rules:
- apiGroups: [""]
  resources: ["pods", "services", "endpoints"]
  verbs: ["get", "list", "watch"]
---
# RoleBinding: who gets the role
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: monitoring
  name: prometheus-reader-binding
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
roleRef:
  kind: Role
  name: prometheus-reader
  apiGroup: rbac.authorization.k8s.io
```
k3s: Lightweight Kubernetes
What k3s Simplifies
| Component | Full Kubernetes | k3s |
|---|---|---|
| Binary | Multiple (apiserver, scheduler, etc.) | Single binary (~60MB) |
| etcd | External cluster | Embedded SQLite or etcd |
| Container Runtime | Docker/containerd | containerd (built-in) |
| Networking | Manual CNI setup | Flannel included (or disable for Cilium) |
| Storage | Manual CSI setup | Local-path provisioner included |
| Load Balancer | Cloud provider or MetalLB | ServiceLB included |
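The "disable for Cilium" note in the Networking row maps to a pair of server flags. In config-file form (the k3s docs document /etc/rancher/k3s/config.yaml as the flag file), a sketch might look like:

```yaml
# Sketch: k3s server config disabling the bundled CNI so Cilium can take over
# (flag names per the k3s documentation; verify against your k3s version)
flannel-backend: "none"
disable-network-policy: true
```

With Flannel and the built-in policy controller out of the way, Cilium is installed afterwards (typically via Helm) as the cluster's CNI.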
Workload Patterns
Current Domus Digitalis Workloads
| Workload | Type | Purpose | Status |
|---|---|---|---|
| Prometheus | StatefulSet | Metrics collection | Planned |
| Grafana | Deployment | Visualization | Planned |
| ArgoCD | Deployment | GitOps CD | Planned |
| Traefik | DaemonSet/Deployment | Ingress | Planned |
| Wazuh | StatefulSet | SIEM/XDR | Planned |
| MinIO | StatefulSet | S3 storage | Planned |
Additional Workloads to Consider
| Workload | Purpose | Why Consider | Complexity |
|---|---|---|---|
| Loki | Log aggregation | Complements Prometheus (metrics + logs) | Medium |
| Tempo | Distributed tracing | Complete observability stack | Medium |
| Cert-Manager | Certificate automation | Auto-renew certs from Vault PKI | Low |
| External-DNS | DNS automation | Auto-create DNS entries for ingress | Low |
| Velero | Backup/restore | Disaster recovery for k8s resources | Medium |
| Kyverno | Policy engine | Enforce security policies | Medium |
| Falco | Runtime security | Detect anomalous behavior | Medium |
| Harbor | Container registry | Private OCI registry with scanning | High |
| Keycloak | Identity provider | SSO for all services (could move from VM) | High |
| Gitea | Git server | Could move from NAS to k8s | Medium |
| Vault | Secrets management | Could run Vault IN k8s (HA easier) | High |
| Teleport | Access management | SSH/k8s/DB access gateway | High |
| Backstage | Developer portal | Service catalog, documentation | High |
Mental Models
For Network Engineers
| Network Concept | Kubernetes Equivalent |
|---|---|
| VLAN | Namespace (logical isolation) |
| VRF | NetworkPolicy (routing isolation) |
| ACL | NetworkPolicy (allow/deny rules) |
| HSRP/VRRP | Service (stable VIP) |
| Load Balancer | Service type=LoadBalancer + Ingress |
| DNS | CoreDNS + Service discovery |
| SNMP/Syslog | Prometheus metrics + Loki logs |
| RADIUS | ServiceAccount + RBAC |
| 802.1X | Pod Security Admission |
| Spanning Tree | Pod anti-affinity (avoid single points) |
| BGP Peering | Cilium BGP (advertise service IPs) |
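The Spanning Tree row has a concrete form: anti-affinity tells the scheduler to spread replicas across failure domains. A pod-spec fragment as a sketch (the label is an assumption):

```yaml
# Pod-spec fragment -- spread replicas across nodes; label is an assumption
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: grafana                   # "don't co-locate with pods like me"
      topologyKey: kubernetes.io/hostname  # failure domain = the node
```

Swapping required for preferredDuringSchedulingIgnoredDuringExecution makes it a soft preference instead of a hard scheduling constraint.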
For Systems Administrators
| Sysadmin Concept | Kubernetes Equivalent |
|---|---|
| VM | Pod (but ephemeral) |
| systemd service | Deployment/StatefulSet |
| /etc/config | ConfigMap |
| /etc/secrets | Secret (or Vault) |
| cron job | CronJob |
| init scripts | Init containers |
| health check | Liveness/Readiness probes |
| log files | stdout/stderr → log aggregator |
| disk mount | PersistentVolumeClaim |
| firewall rules | NetworkPolicy |
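The cron-job row translates almost one-to-one — even the schedule syntax is the same. A sketch (schedule, names, and image are assumptions):

```yaml
# Sketch: the k8s counterpart of a crontab entry -- names/image are assumptions
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup            # hypothetical name
spec:
  schedule: "0 2 * * *"           # standard crontab syntax: 02:00 daily
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: example/backup:latest   # hypothetical image
```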
Command Reference
Essential kubectl Commands
```bash
# Cluster info
kubectl cluster-info
kubectl get nodes -o wide

# Workloads
kubectl get pods -A                      # All namespaces
kubectl get pods -n monitoring -o wide   # Specific namespace
kubectl describe pod <name>              # Detailed info
kubectl logs <pod> -f                    # Follow logs
kubectl logs <pod> -c <container>        # Specific container
kubectl exec -it <pod> -- /bin/sh        # Shell access

# Resources
kubectl top nodes                        # Node resource usage
kubectl top pods -A                      # Pod resource usage

# Debugging
kubectl get events --sort-by='.lastTimestamp'
kubectl describe node <node> | grep -A5 Conditions
kubectl auth can-i create pods --as=system:serviceaccount:default:mysa

# Advanced
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
kubectl get pods -o custom-columns='NAME:.metadata.name,STATUS:.status.phase,IP:.status.podIP'
```
Cilium Commands
```bash
# Status
cilium status
cilium connectivity test

# Network flows (Hubble)
hubble observe --namespace monitoring
hubble observe --protocol TCP --port 9090

# Policy
cilium policy get
cilium endpoint list
```
Helm Commands
```bash
# Repository management
helm repo add <name> <url>
helm repo update
helm search repo <keyword>

# Installation
helm install <release> <chart> -n <namespace> -f values.yaml
helm upgrade <release> <chart> -n <namespace> -f values.yaml
helm rollback <release> <revision>

# Inspection
helm list -A
helm get values <release> -n <namespace>
helm get manifest <release> -n <namespace>
```
Troubleshooting Framework
The Debugging Ladder
```
Level 1: Is the pod running?
├── kubectl get pods -n <namespace>
├── kubectl describe pod <pod>
└── kubectl logs <pod>

Level 2: Is the service routing?
├── kubectl get svc,endpoints -n <namespace>
├── kubectl exec <pod> -- curl <service>:<port>
└── hubble observe --namespace <namespace>

Level 3: Is networking working?
├── cilium connectivity test
├── kubectl exec <pod> -- nslookup <service>
└── kubectl exec <pod> -- ping <ip>

Level 4: Is storage attached?
├── kubectl get pv,pvc
├── kubectl describe pvc <pvc>
└── kubectl exec <pod> -- df -h

Level 5: Is the node healthy?
├── kubectl describe node <node>
├── kubectl top node
└── ssh <node> "journalctl -u k3s"
```
Common Issues and Fixes
| Symptom | Likely Cause | Fix |
|---|---|---|
| Pod stuck Pending | No resources / no PV | kubectl describe pod and check Events |
| Pod CrashLoopBackOff | App crashing | kubectl logs <pod> --previous |
| Service not resolving | CoreDNS issue | Check CoreDNS pods in kube-system |
| ImagePullBackOff | Registry auth / image not found | Check image name, registry credentials |
| Vault injection failed | TLS or auth issue | Check vault-agent logs in pod |
| NetworkPolicy blocking | Missing allow rule | hubble observe to find dropped flows |
Architecture Decision Records
ADR-001: k3s over Full Kubernetes
Decision: Use k3s instead of kubeadm/RKE/EKS.
Rationale:
- Single-node deployment (no HA requirement yet)
- Reduced resource overhead (~512MB vs 2GB+)
- Simpler operations (single binary)
- Still 100% Kubernetes compatible

Trade-offs:
- Less flexibility in component versions
- Some enterprise features require additional setup
ADR-002: Cilium over Flannel
Decision: Replace default Flannel with Cilium.
Rationale:
- eBPF performance (O(1) vs iptables O(n))
- Hubble observability (network flow visibility)
- L7 network policies (HTTP-aware rules)
- Native integration with Prometheus

Trade-offs:
- More complex initial setup
- Higher memory usage (~200MB)
ADR-003: Vault External over In-Cluster
Decision: Keep Vault as external VM, not in k8s.
Rationale:
- Vault manages secrets for k8s (chicken-and-egg problem)
- HA easier with dedicated VMs
- Existing investment in vault-01 infrastructure

Trade-offs:
- External dependency for k8s workloads
- Network latency for secret retrieval
Glossary
| Term | Definition |
|---|---|
| CNI | Container Network Interface - plugin API for pod networking |
| CRI | Container Runtime Interface - plugin API for container execution |
| CSI | Container Storage Interface - plugin API for storage |
| CRD | Custom Resource Definition - extend k8s API with custom types |
| DaemonSet | Run exactly one pod per node (like agents) |
| Deployment | Stateless workload with rolling updates |
| eBPF | Extended Berkeley Packet Filter - kernel-level programmability |
| Helm | Package manager for Kubernetes (charts = packages) |
| Ingress | L7 routing (HTTP host/path to service) |
| Kubelet | Node agent that runs pods |
| Namespace | Logical isolation boundary (like VLAN) |
| Operator | Custom controller managing complex applications |
| Pod | Smallest deployable unit (one or more containers) |
| RBAC | Role-Based Access Control |
| Service | Stable network endpoint for pods (like VIP) |
| StatefulSet | Stateful workload with stable identity and storage |