Incident Response Template
Template
= [INCIDENT-YYYY-MM-DD] Brief Description
:description: Incident summary
:severity: P1 / P2 / P3 / P4
:status: investigating / identified / monitoring / resolved
:revdate: YYYY-MM-DD
== Incident Summary
[cols="1,2"]
|===
| Field | Value
| Detected
| YYYY-MM-DD HH:MM TZ
| Resolved
| YYYY-MM-DD HH:MM TZ (or "Ongoing")
| Duration
| X hours Y minutes
| Severity
| P1 (Critical) / P2 (High) / P3 (Medium) / P4 (Low)
| Impact
| Brief description of user/business impact
| Root Cause
| TBD / Brief description
|===
== Timeline
[cols="1,3"]
|===
| Time | Event
| HH:MM
| Issue first reported / detected
| HH:MM
| Initial investigation started
| HH:MM
| Root cause identified
| HH:MM
| Fix implemented
| HH:MM
| Monitoring confirmed resolution
|===
== Symptoms
What did users/systems experience?
* Symptom 1
* Symptom 2
* Symptom 3
== Investigation
=== Initial Triage
* [ ] Check monitoring dashboards
* [ ] Review recent changes (deploys, config changes)
* [ ] Check dependent services
* [ ] Review error logs
=== Diagnostic Commands
```bash
# Check service status
systemctl status <service>
# View recent logs
journalctl -u <service> --since "1 hour ago"
# Check connectivity (-c 4 stops ping after four packets instead of running until interrupted)
ping -c 4 <host>
curl -v <endpoint>
```
=== Findings
Document what you discovered during investigation.
== Root Cause
Explain the underlying cause of the incident.
== Resolution
=== Immediate Fix
What was done to restore service?
```bash
# Commands executed
```
=== Verification
* [ ] Service responding normally
* [ ] Monitoring shows green
* [ ] Users confirmed functionality restored
* [ ] No error spikes in logs
== Prevention
=== Short-term
* [ ] Action item 1
* [ ] Action item 2
=== Long-term
* [ ] Action item 1
* [ ] Action item 2
== Lessons Learned
* What went well?
* What could be improved?
* What will we do differently?
== Related
* Link to monitoring dashboard
* Link to runbook used
* Link to related incidents
Severity Levels
| Level | Description | Response Time |
|---|---|---|
| P1 - Critical | Complete outage, data loss risk, security breach | Immediate, all hands |
| P2 - High | Major feature broken, significant user impact | < 1 hour |
| P3 - Medium | Minor feature broken, workaround available | < 4 hours |
| P4 - Low | Cosmetic, minor inconvenience | Next business day |
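
For scripts that page or route alerts, the severity table can be encoded as a small lookup. The following is a minimal sketch, not part of the template itself; the function name `response_target` is hypothetical, and the strings simply mirror the Response Time column above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a severity level (P1..P4) to the response
# target listed in the Severity Levels table.
response_target() {
  case "$1" in
    P1) echo "Immediate, all hands" ;;
    P2) echo "< 1 hour" ;;
    P3) echo "< 4 hours" ;;
    P4) echo "Next business day" ;;
    *)
      # Anything outside P1..P4 is rejected rather than guessed at.
      echo "unknown severity: $1" >&2
      return 1
      ;;
  esac
}
```

Calling `response_target P2` prints the P2 target, while an unrecognized level returns a non-zero status so a calling script can stop early.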
Communication Templates
Initial Notification
Subject: [INCIDENT] Brief description - Investigating

We are aware of an issue affecting [service/feature].

Impact: [description]
Status: Investigating

We will provide updates every [30 minutes / 1 hour].
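
If notifications are sent from a script, the initial notification can be filled in from a few arguments. This is a sketch under assumptions: the function name `initial_notification` and its argument order are hypothetical, and delivery (email, chat, status page) is left out.

```bash
#!/usr/bin/env bash
# Hypothetical helper: render the initial notification text.
# Arguments: brief description, affected service, impact, update interval.
initial_notification() {
  local summary="$1" service="$2" impact="$3" interval="$4"
  cat <<EOF
Subject: [INCIDENT] ${summary} - Investigating

We are aware of an issue affecting ${service}.

Impact: ${impact}
Status: Investigating

We will provide updates every ${interval}.
EOF
}
```

For example, `initial_notification "Checkout errors" "the checkout API" "Payments are failing" "30 minutes"` writes the filled-in message to standard output, where it can be piped to whatever tool sends it.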
Resolution Notification
Subject: [RESOLVED] Brief description

The issue affecting [service/feature] has been resolved.

Duration: X hours Y minutes
Root Cause: [brief explanation]
Resolution: [what was done]

We apologize for any inconvenience.
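
The Duration value can be computed from the Detected and Resolved timestamps rather than worked out by hand. The sketch below assumes GNU `date` (its `-d` option is not available in BSD/macOS `date`); the function name `incident_duration` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "X hours Y minutes" between two timestamps.
# Assumes GNU date, which parses strings like "2024-03-01 10:15 UTC" via -d.
# If the second argument is omitted, the current time is used, which suits
# an incident that is still ongoing.
incident_duration() {
  local start end diff
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "${2:-now}" +%s) || return 1
  diff=$(( end - start ))
  if [ "$diff" -lt 0 ]; then
    echo "resolved time is earlier than detected time" >&2
    return 1
  fi
  # Whole hours, then the remaining whole minutes.
  printf '%d hours %d minutes\n' $(( diff / 3600 )) $(( (diff % 3600) / 60 ))
}
```

Including the time zone in both timestamps, as the template's `YYYY-MM-DD HH:MM TZ` format does, keeps the result correct when detection and resolution are recorded in different zones.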