STD-012: Runbook Structure
Runbooks document repeatable operational procedures with explicit commands, validation at every step, and rollback paths. A runbook is not a description of what to do — it is the exact sequence of commands an operator executes. If a step cannot be copy-pasted into a terminal, it is not a runbook step.
Principles
- Explicit commands, not descriptions. Every step contains the exact command to run. "Configure the firewall" is not a step. `firewall-cmd --add-port=443/tcp --permanent` is.
- Validation at every step. Every command that changes state MUST be followed by a verification command with documented expected output. This is the verify-change-verify pattern from STD-005.
- Expected output documented. Operators MUST know what success looks like before they execute. Every verification includes a `.Expected Output` block.
- Reversible with rollback. Every procedure MUST have a Rollback section that restores the system to its pre-execution state. If a procedure is not reversible, the Rollback section MUST document why and what compensating controls exist.
- Variables for reuse. Environment-specific values (hostnames, IPs, paths) are defined once in a Variables section and referenced throughout. Runbooks are portable across environments by changing only the variables.
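As an illustration of the verify-change-verify and variables principles, the sketch below applies a deliberately harmless change: it appends a setting to a scratch config file. The config file, the `listen_port` key, and the port value are hypothetical, and a `mktemp -d` scratch directory stands in for a real target so the sketch runs anywhere.

```bash
#!/usr/bin/env bash
# Sketch of verify-change-verify driven by a Variables block.
# HYPOTHETICAL: the config file, the listen_port key, and the port value are
# illustrative; a scratch directory stands in for a real target host.
set -euo pipefail

# Variables: the only place environment-specific values appear.
CONFIG_DIR="${CONFIG_DIR:-$(mktemp -d)}"
CONFIG_FILE="${CONFIG_DIR}/app.conf"
LISTEN_PORT="${LISTEN_PORT:-8443}"

touch "$CONFIG_FILE"

# Verify (before): the setting must not be present yet.
if grep -q '^listen_port=' "$CONFIG_FILE"; then
  echo "pre-check failed: listen_port already set in $CONFIG_FILE" >&2
  exit 1
fi

# Change: apply the setting, taking environment values only from variables.
printf 'listen_port=%s\n' "$LISTEN_PORT" >> "$CONFIG_FILE"

# Verify (after): this output is what the step's .Expected Output block documents.
grep '^listen_port=' "$CONFIG_FILE"
```

With the defaults, the final command prints `listen_port=8443`. Running the same steps against another environment means changing only the three variables.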
Naming Convention
| Prefix | Purpose | Example |
|---|---|---|
| | Standard operational procedure | |
| | Incident response procedure | |
| | Deployment procedure | |
| | Disaster recovery procedure | |
Runbook files are stored in pages/runbooks/ and named using the appropriate prefix, date, and kebab-case slug.
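A filename check like the one below could run in CI. It is a sketch under stated assumptions: the convention names the components (prefix, date, slug) but not the separator, date format, or extension, so the hyphen separator, `YYYY-MM-DD` date, and `.adoc` extension are assumptions. The registered prefixes are passed as arguments rather than hardcoded, and `check_runbook_name` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch of a filename check for the runbook naming convention.
# ASSUMPTIONS (not specified by STD-012): components are joined with hyphens in
# the order PREFIX-YYYY-MM-DD-slug, the date is ISO 8601, and the extension is
# .adoc. Registered prefixes are supplied by the caller.

# Usage: check_runbook_name FILE PREFIX [PREFIX...]
# Returns 0 if FILE's basename matches the assumed pattern, 1 if not,
# 2 if no prefixes were given.
check_runbook_name() {
  local file="$1"; shift
  (( $# > 0 )) || return 2
  local name prefix_alt re
  name=${file##*/}
  # Join the allowed prefixes into a regex alternation such as "AAA|BBB".
  prefix_alt=$(IFS='|'; printf '%s' "$*")
  # Date as YYYY-MM-DD, then a kebab-case slug of lowercase words and digits.
  re="^(${prefix_alt})-[0-9]{4}-[0-9]{2}-[0-9]{2}-[a-z0-9]+(-[a-z0-9]+)*\.adoc$"
  [[ $name =~ $re ]]
}
```

For example, `check_runbook_name pages/runbooks/AAA-2024-05-01-rotate-tls-certs.adoc AAA BBB` succeeds, where `AAA` and `BBB` are placeholders for the registered prefixes.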
Required Sections
Every runbook MUST contain the following sections in this order:
| Section | Content |
|---|---|
| Overview | Brief description of the procedure, when to use it, and what it accomplishes |
| Prerequisites | Checklist of required access, permissions, tools, and pre-conditions, verified before execution begins |
| Scope | Explicit in-scope and out-of-scope boundaries so operators know where this runbook ends |
| Variables | All environment-specific values defined as shell variables at the top, with concrete examples |
| Procedure (phased) | Numbered phases, each containing numbered steps. Every step has a command block. Every mutating step has a verification with expected output |
| Rollback | Step-by-step reversal of the procedure, restoring the system to its pre-execution state |
| Troubleshooting | Common failure modes with symptom, cause, and resolution command for each |
| Post-Execution Checklist | Verification checklist confirming the procedure achieved its goals: service running, functionality tested, documentation updated, stakeholders notified |
| Related | Cross-references to related runbooks, standards, incident reports, and external documentation |
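The section-order requirement can be checked mechanically. This sketch assumes each section is an AsciiDoc level-1 title (`== Title`, the form used in the Rollback template in this standard) whose text matches the table exactly, with the procedure titled `Procedure`. It also assumes extra sections may be interleaved, which the standard does not state; `check_sections` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: confirm a runbook file contains every required section, in order.
# ASSUMPTIONS: sections are AsciiDoc level-1 titles ("== Title") whose text
# matches the Required Sections table exactly; the procedure heading is
# "Procedure"; additional sections may appear between required ones.
REQUIRED_SECTIONS=(
  "Overview" "Prerequisites" "Scope" "Variables" "Procedure"
  "Rollback" "Troubleshooting" "Post-Execution Checklist" "Related"
)

# Usage: check_sections FILE
# Returns 0 when all required sections appear in order; otherwise prints the
# first missing or misplaced section and returns 1.
check_sections() {
  local file="$1" title i=0
  local n=${#REQUIRED_SECTIONS[@]}
  # Read "== " headings in document order; advance through the required
  # list each time the next expected section is seen.
  while IFS= read -r title; do
    if (( i < n )) && [[ $title == "${REQUIRED_SECTIONS[i]}" ]]; then
      i=$((i + 1))
    fi
  done < <(sed -n 's/^== //p' "$file")
  if (( i < n )); then
    echo "missing or out of order: ${REQUIRED_SECTIONS[i]}" >&2
    return 1
  fi
}
```

Matching the required list as an ordered subsequence means a misplaced section is reported the same way as a missing one, which is usually what a reviewer needs to fix.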
Procedure Phase Structure
Each phase within the Procedure section follows this structure:
```asciidoc
=== Phase N: Phase Name

==== N.1 Step Name

[source,bash]
----
# Command to execute
command --flag value
----

.Expected Output
----
output confirming success
----
```
Steps are numbered by phase and by position within the phase: 1.1 and 1.2 are the first two steps of Phase 1, and 2.1 and 2.2 are the first two steps of Phase 2. Phases represent logical groupings: Preparation, Execution, Verification, and Cleanup.
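Because every verification has documented expected output, the comparison itself can be scripted. The helper below is hypothetical, not part of any existing tooling: it runs a verification command and passes only if the command succeeds and its combined stdout and stderr exactly equal the text from the step's `.Expected Output` block.

```bash
#!/usr/bin/env bash
# Sketch: run a verification command and compare its output with the text a
# step documents in its .Expected Output block.
# HYPOTHETICAL helper: verify_expected is not part of any existing tooling.

# Usage: verify_expected "EXPECTED" command [args...]
verify_expected() {
  local expected="$1"; shift
  local actual status
  actual=$("$@" 2>&1)   # capture stdout and stderr together
  status=$?
  # Pass only if the command succeeded AND its output matches exactly.
  if (( status == 0 )) && [[ $actual == "$expected" ]]; then
    echo "PASS: $*"
  else
    printf 'FAIL: %s (exit %d)\n  expected: %s\n  actual:   %s\n' \
      "$*" "$status" "$expected" "$actual" >&2
    return 1
  fi
}
```

For example, `verify_expected "hello" echo hello` prints `PASS: echo hello`. Exact matching is deliberately strict; output containing timestamps or PIDs would need a glob comparison instead (an unquoted right-hand side in `[[ $actual == $pattern ]]`).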
Requirements
- Every step that modifies system state MUST have a verification command immediately following it.
- Expected output MUST be shown in a `.Expected Output` block after every verification command.
- Rollback MUST be documented for every procedure. If the procedure is irreversible, the Rollback section MUST state this explicitly and document compensating controls using this format:

  ```asciidoc
  == Rollback

  WARNING: This procedure is irreversible. The following compensating controls apply:

  * *Pre-execution backup:* [backup command and location]
  * *Recovery path:* [how to restore from backup if needed]
  * *Acceptance criteria:* [how to verify the change is acceptable if rollback is impossible]
  ```

- Variables MUST be used for all environment-specific values: hostnames, IPs, ports, paths, usernames. No hardcoded values in procedure steps.
- The Prerequisites section MUST be a checklist (using `* [ ]`) so operators can verify readiness before starting.
- Phases MUST be numbered sequentially and named descriptively.
- NOTE, WARNING, and CAUTION admonitions MUST be used for steps with elevated risk or non-obvious behavior.
- Runbooks SHOULD be tested in non-production before use against production systems.
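The checklist requirement for Prerequisites can be linted. This sketch assumes `== Title` section headings and AsciiDoc checklist syntax (`* [ ]`, `* [x]`, or `* [*]`). It treats any other non-blank line in the section as prose, which is stricter than the standard may intend; `check_prerequisites` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: check that a runbook's Prerequisites section is a checklist.
# ASSUMPTIONS: sections are AsciiDoc "== Title" headings, and checklist items
# use AsciiDoc checklist syntax ("* [ ]", "* [x]" or "* [*]").

# Usage: check_prerequisites FILE
# Returns 0 if the section has at least one checklist item and no other
# non-blank lines; prints each offending line to stderr otherwise.
check_prerequisites() {
  awk '
    /^== / { in_prereq = ($0 == "== Prerequisites"); next }
    in_prereq && NF > 0 {
      if ($0 ~ /^\* \[[ xX*]] /) { items++ }
      else { print "not a checklist item: " $0 > "/dev/stderr"; bad = 1 }
    }
    END { exit (bad || items == 0) }
  ' "$1"
}
```

An empty Prerequisites section also fails, since a checklist with no items gives the operator nothing to confirm.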
Compliance
| Check | Method | Pass Criterion |
|---|---|---|
| All required sections present | Runbook contains Overview, Prerequisites, Scope, Variables, Procedure, Rollback, Troubleshooting, Post-Execution Checklist, Related | No missing sections |
| Verification at every step | Every mutating command is followed by a verification command with documented expected output | No blind changes |
| Rollback complete | Rollback section contains explicit commands that reverse the procedure, or documents why reversal is impossible | System can be restored |
| Variables used | No hardcoded IPs, hostnames, ports, or paths in procedure steps; all reference variables from the Variables section | Environment-portable |
| Naming correct | Filename uses a registered prefix (see Naming Convention) | Matches naming convention |
| Prerequisites checkable | Prerequisites are a checklist with verifiable items, not prose | Operator can confirm readiness |
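Part of the "Variables used" check can be automated with a heuristic scan. This sketch assumes command blocks are `[source,...]` listings delimited by `----`. It skips other listings such as `.Expected Output`, which may legitimately show addresses, and it only catches dotted-quad IPv4 literals, not hostnames, ports, or paths. `find_hardcoded_ips` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: flag dotted-quad IPv4 literals inside command blocks.
# ASSUMPTIONS: command blocks are "[source,...]" listings delimited by "----";
# other listings (e.g. .Expected Output) are skipped.
# Heuristic only: four-part version numbers such as 1.2.3.4 are also flagged,
# and hardcoded hostnames, ports, and paths are not detected.

# Usage: find_hardcoded_ips FILE
# Prints file:line: text for each hit; returns 1 if any were found.
find_hardcoded_ips() {
  awk '
    /^\[source,/ { pending = 1; next }
    /^----$/ {
      if (in_src)       { in_src = 0 }
      else if (pending) { in_src = 1; pending = 0 }
      next
    }
    in_src && /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/ {
      printf "%s:%d: %s\n", FILENAME, FNR, $0
      found = 1
    }
    END { exit (found ? 1 : 0) }
  ' "$1"
}
```

A hit is a prompt for review rather than an automatic failure. The fix is usually to move the value into the Variables section and reference it in the step.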
Related
- STD-005: Change Control (verify-change-verify applies to every runbook step)
- STD-011: Incident Response (incident runbooks follow this structure plus incident-specific requirements)