Site Reliability Engineering

Body of Knowledge

Topic	Description	Relevance	Career Tracks
SRE Fundamentals	SRE principles, DevOps vs SRE, toil reduction, automation, reliability engineering culture.	Critical	SRE, Platform Engineer
SLIs/SLOs/SLAs	Service level indicators, objectives, agreements, defining reliability targets, measurement strategies.	Critical	SRE, Product Manager
Error Budgets	Error budget concept, budget-based decisions, feature velocity vs reliability tradeoff, burn rate.	Critical	SRE, Engineering Manager
Incident Management	On-call, paging, incident response, severity levels, incident commander, communication, postmortems.	Critical	SRE, DevOps
Chaos Engineering	Fault injection, Chaos Monkey, Litmus, steady-state hypothesis, blast radius, game days.	Medium	SRE, Platform Engineer
Capacity Planning	Load forecasting, resource planning, autoscaling, performance testing, cost optimization.	High	SRE, Platform Engineer
Runbooks and Documentation	Operational runbooks, escalation procedures, troubleshooting guides, knowledge management.	High	SRE, DevOps
Blameless Postmortems	Postmortem process, timeline reconstruction, root cause analysis, action items, learning culture.	Critical	SRE, Engineering Manager
Reliability Patterns	Circuit breakers, retries with backoff, bulkheads, timeouts, graceful degradation, load shedding.	High	SRE, Backend Developer
Production Excellence	Production readiness reviews, launch checklists, operational maturity, handoff processes.	High	SRE, Platform Engineer

Personal Status

Topic	Level	Evidence	Active Projects	Gaps
No personal status recorded	—	—	—	—
