Incident Report Template
Copy this template when creating a new incident report.
Filename convention: INC-YYYY-MM-DD-brief-description.adoc
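The filename convention above can be generated rather than typed by hand. The sketch below is one possible helper, not part of the template itself: the function name `incident_filename` and the slug rules (lowercase, runs of non-alphanumeric characters collapsed to a single hyphen) are assumptions.

```shell
# Hypothetical helper: build a report filename that follows
# INC-YYYY-MM-DD-brief-description.adoc
# Usage: incident_filename YYYY-MM-DD "brief description"
incident_filename() {
  local date slug
  date="$1"
  shift

  # Reject anything that is not shaped like YYYY-MM-DD.
  case "$date" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "date must be YYYY-MM-DD" >&2; return 1 ;;
  esac

  # Lowercase the description, collapse non-alphanumeric runs to "-",
  # then trim leading and trailing hyphens.
  slug=$(printf '%s' "$*" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//')

  if [ -z "$slug" ]; then
    echo "description must contain letters or digits" >&2
    return 1
  fi

  printf 'INC-%s-%s.adoc\n' "$date" "$slug"
}
```

For example, `incident_filename 2024-01-15 "DB Failover Loop"` prints `INC-2024-01-15-db-failover-loop.adoc`.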
Incident Summary
| Field | Value |
|---|---|
| Detected | YYYY-MM-DD HH:MM TZ (how detected) |
| Mitigated | YYYY-MM-DD HH:MM TZ (or N/A) |
| Resolved | YYYY-MM-DD HH:MM TZ (or ongoing) |
| Duration | X hours/days |
| Severity | P1 (Critical) / P2 (High) / P3 (Medium) / P4 (Low) |
| Impact | Brief description of what was affected |
| Root Cause | One-line root cause statement |
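The Duration field is the elapsed time between Detected and Resolved. As a rough sketch, it can be computed from the two timestamps instead of by hand. The function name `incident_duration` is hypothetical, and the sketch assumes both timestamps are given in UTC as `YYYY-MM-DD HH:MM` and that the local `date` command accepts `-d` (as GNU `date` does).

```shell
# Hypothetical helper: elapsed time between two UTC timestamps.
# Usage: incident_duration "YYYY-MM-DD HH:MM" "YYYY-MM-DD HH:MM"
# Assumes a date(1) that supports -d (for example GNU date).
incident_duration() {
  local start end diff
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1

  diff=$((end - start))
  if [ "$diff" -lt 0 ]; then
    echo "resolved time is before detected time" >&2
    return 1
  fi

  # Report whole hours and remaining minutes.
  printf '%dh %02dm\n' $((diff / 3600)) $((diff % 3600 / 60))
}
```

For example, `incident_duration "2024-01-15 14:05" "2024-01-15 16:50"` prints `2h 45m`.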
Timeline
| Time (TZ) | Event |
|---|---|
| HH:MM | Initial symptom observed |
| HH:MM | Alert triggered / user reported |
| HH:MM | Investigation started |
| HH:MM | Root cause identified |
| HH:MM | Mitigation applied |
| HH:MM | Verified resolved |
Symptoms
- What was observed?
- Error messages?
- Failed services or processes?
- User reports?
Investigation
Initial Triage
# First diagnostic commands run
# Example: systemctl status <service>
Log Analysis
# Log queries
# Example: journalctl -u <service> --since "today"
Findings
- Finding 1 - what was discovered
- Finding 2 - what led to root cause
- Finding 3 - contributing factors
Root Cause
Technical explanation: One paragraph explaining why the incident occurred.
Why it happened:
- Immediate cause: [what failed]
- Contributing factors: [what made it worse or allowed it to happen]
- Systemic issues: [underlying problems]
Resolution
Immediate Fix
# Commands used to resolve the incident
Verification
# Commands used to verify the fix
- Service restored
- Monitoring shows healthy
- No new errors in logs
- Users confirmed resolution
Impact Assessment
Systems Affected
| System | Status | Impact Duration |
|---|---|---|
| System 1 | Restored | X hours |
| System 2 | N/A | - |
Business Impact
- Users affected: [count or percentage]
- Data loss: Yes / No - details
- Compliance implications: [if any]
- External visibility: [customer-facing?]
Prevention
Short-term (This Week)
- Action 1 - Owner
- Action 2 - Owner
Long-term (This Quarter)
- Systemic improvement 1 - Owner
- Process change 1 - Owner
- Monitoring enhancement - Owner
Lessons Learned
What Went Well
- Item 1
- Item 2
What Could Be Improved
- Item 1
- Item 2
Key Takeaways
Communication Log
| Time | Audience | Message |
|---|---|---|
| HH:MM | Team/Management | Initial notification |
| HH:MM | Stakeholders | Status update |
| HH:MM | All | Resolution notification |
Related
- Change Request: CR-YYYY-MM-DD-description.adoc (link to related CR)
- RCA: RCA-YYYY-MM-DD-NNN.adoc (link to detailed RCA if P1/P2)
- Runbook: Link to relevant runbook
- Monitoring: Link to dashboard/alert
Metadata
| Field | Value |
|---|---|
| Incident ID | INC-YYYY-MM-DD-NNN |
| Author | Name |
| Created | YYYY-MM-DD |
| Last Updated | YYYY-MM-DD |
| Status | Draft / In Review / Final |
| Post-Incident Review | YYYY-MM-DD (within 5 business days for P1/P2) |
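The Post-Incident Review deadline for P1/P2 incidents can be worked out mechanically. The sketch below shows one way to find the latest acceptable date. It is an illustration under stated assumptions: the function name `review_due_date` is hypothetical, business days are taken to mean Monday through Friday with no holiday calendar, counting starts on the day after the incident date, and `date` must accept `-d` (as GNU `date` does).

```shell
# Hypothetical helper: latest review date, N business days after the incident.
# Usage: review_due_date YYYY-MM-DD [business_days]   (default: 5)
# Assumes Mon-Fri business days, no holidays, and a date(1) that supports -d.
review_due_date() {
  local start days secs dow
  start="$1"
  days="${2:-5}"

  secs=$(date -u -d "$start 00:00" +%s) || return 1

  # Step forward one calendar day at a time, counting only weekdays.
  while [ "$days" -gt 0 ]; do
    secs=$((secs + 86400))
    dow=$(date -u -d "@$secs" +%u)   # 1 = Monday ... 7 = Sunday
    if [ "$dow" -le 5 ]; then
      days=$((days - 1))
    fi
  done

  date -u -d "@$secs" +%Y-%m-%d
}
```

For an incident on Monday 2024-01-15, `review_due_date 2024-01-15` prints `2024-01-22`. Teams that observe public holidays would need to extend the weekday check.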
Severity Definitions
| Severity | Criteria | Response Time |
|---|---|---|
| P1 - Critical | Production down, data loss, security breach | Immediate, all hands |
| P2 - High | Major functionality impaired, workaround difficult | Within 1 hour, dedicated team |
| P3 - Medium | Functionality degraded, workaround available | Within 4 hours, normal priority |
| P4 - Low | Minor issue, cosmetic, no user impact | Next business day |
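If reports are checked by a script, the severity table can be mirrored as a small lookup. This is a minimal sketch: the function name `response_target` is hypothetical, and the strings are copied from the Response Time column above.

```shell
# Hypothetical helper: map a severity code to its response target,
# mirroring the Severity Definitions table.
# Usage: response_target P1|P2|P3|P4
response_target() {
  case "$1" in
    P1) echo "Immediate, all hands" ;;
    P2) echo "Within 1 hour, dedicated team" ;;
    P3) echo "Within 4 hours, normal priority" ;;
    P4) echo "Next business day" ;;
    *)  echo "unknown severity: $1" >&2; return 1 ;;
  esac
}
```

An unrecognized value returns a non-zero status, so a validation script can reject a report whose Severity field is not one of P1 through P4.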