Site Reliability Engineering
Body of Knowledge
| Topic | Description | Relevance | Career Tracks |
|---|---|---|---|
| SRE Fundamentals | SRE principles, DevOps vs SRE, toil reduction, automation, reliability engineering culture. | Critical | SRE, Platform Engineer |
| SLIs/SLOs/SLAs | Service level indicators, objectives, agreements, defining reliability targets, measurement strategies. | Critical | SRE, Product Manager |
| Error Budgets | Error budget concept, budget-based decisions, feature velocity vs reliability tradeoff, burn rate. | Critical | SRE, Engineering Manager |
| Incident Management | On-call, paging, incident response, severity levels, incident commander, communication, postmortems. | Critical | SRE, DevOps |
| Chaos Engineering | Fault injection, Chaos Monkey, Litmus, steady state hypothesis, blast radius, game days. | Medium | SRE, Platform Engineer |
| Capacity Planning | Load forecasting, resource planning, autoscaling, performance testing, cost optimization. | High | SRE, Platform Engineer |
| Runbooks and Documentation | Operational runbooks, escalation procedures, troubleshooting guides, knowledge management. | High | SRE, DevOps |
| Blameless Postmortems | Postmortem process, timeline reconstruction, root cause analysis, action items, learning culture. | Critical | SRE, Engineering Manager |
| Reliability Patterns | Circuit breakers, retries with backoff, bulkheads, timeouts, graceful degradation, load shedding. | High | SRE, Backend Developer |
| Production Excellence | Production readiness reviews, launch checklists, operational maturity, handoff processes. | High | SRE, Platform Engineer |
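The Error Budgets row lists budget-based decisions and burn rate. A minimal sketch of the arithmetic follows, assuming a request-based availability SLI (good requests divided by total requests) measured over one fixed window. `SloWindow`, `error_budget`, `budget_remaining`, `burn_rate` and `can_ship_features` are hypothetical names chosen for illustration, and freezing feature releases once the budget is exhausted is one common policy, not a prescribed rule.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts observed over one SLO window (e.g. 30 days)."""
    slo_target: float      # e.g. 0.999 for a 99.9% availability SLO
    total_requests: int
    failed_requests: int


def error_budget(window: SloWindow) -> float:
    """Number of failed requests the SLO tolerates over the whole window."""
    return (1.0 - window.slo_target) * window.total_requests


def budget_remaining(window: SloWindow) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = error_budget(window)
    if budget <= 0:
        raise ValueError("a 100% SLO leaves no error budget")
    return 1.0 - window.failed_requests / budget


def burn_rate(error_rate: float, slo_target: float) -> float:
    """Speed of budget consumption relative to the sustainable pace.

    A burn rate of 1.0 spends exactly the whole budget over the SLO window;
    a burn rate of 10.0 would exhaust it in a tenth of the window.
    """
    allowed_error_rate = 1.0 - slo_target
    if allowed_error_rate <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_rate / allowed_error_rate


def can_ship_features(window: SloWindow) -> bool:
    """Hypothetical release gate: allow feature launches only while budget remains."""
    return budget_remaining(window) > 0


if __name__ == "__main__":
    window = SloWindow(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
    print(f"budget: {error_budget(window):.0f} failed requests")
    print(f"remaining: {budget_remaining(window):.0%}")
    print(f"burn rate at 0.5% errors: {burn_rate(0.005, 0.999):.1f}")
    print(f"ship features: {can_ship_features(window)}")
```

With a 99.9% SLO over one million requests, the budget is roughly 1,000 failed requests; 250 failures leave about 75% of it, and a sustained 0.5% error rate burns it about five times faster than the window allows. Burn rate is what alerting usually keys on, because it flags a fast-draining budget long before the remaining fraction reaches zero.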
Personal Status
| Topic | Level | Evidence | Active Projects | Gaps |
|---|---|---|---|---|
| No personal status recorded | — | — | — | — |