# Disaster Recovery & Downtime Procedures
## Project Summary
| Field | Value |
|---|---|
| Project | Disaster Recovery & Downtime Procedures |
| Priority | P0 (management-initiated) |
| Status | Active (scoping) |
| Owner | Evan Rosado |
| Stakeholders | InfoSec Management, Network Engineering, IT Operations |
| Objective | Document and test DR procedures for all critical infrastructure systems, starting with ISE |
| Risk | ISE runs dot1x closed mode with dynamic VLAN, dACL, and posture assignment. If ISE is unavailable, endpoints cannot authenticate and are denied network access. |
## System Priority
| Priority | System | Impact if Down | Status |
|---|---|---|---|
| 1 | Cisco ISE (PAN, MnT, PSN) | All dot1x and MAB authentication fails. In closed mode, no endpoint gets network access. Dynamic VLAN, dACL, and posture assignments stop. | ❌ Not started |
| 2 | Firewalls (FTD/FMC, ASA) | If FMC is lost, FTDs keep enforcing the last deployed policy. If an FTD fails, the traffic path through it breaks unless HA failover is configured. | ❌ Not started |
| 3 | Core/Distribution switches | HSRP fails over when the active switch is lost. Failover on uplink loss depends on HSRP tracking being configured (see the Murus Portae HSRP finding). STP reconverges, and broadcast storms are possible. | ❌ Not started |
| 4 | Wireless LAN Controllers | Clients on centrally switched WLANs disconnect. APs in FlexConnect mode with local switching, if configured, continue forwarding traffic for existing clients. | ❌ Not started |
| 5 | DNS/DHCP | Name resolution fails. New devices cannot get IP addresses. Existing leases remain valid until they expire. | ❌ Not started |
| 6 | SIEM (QRadar → Sentinel) | No security visibility or alerting. Whether events are buffered until the SIEM recovers depends on the log source and transport; UDP syslog, for example, is typically not buffered. | ❌ Not started |
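The DNS/DHCP row's "existing leases remain valid until they expire" can be turned into a time budget for the DR plan. The sketch below rests on two assumptions. First, it assumes clients renew at T1 = 50% of the lease, which is the RFC 2131 default, and that they succeed while the server is up. Second, it assumes nobody shortens leases during the outage. Under those conditions, every active client holds at least half a lease when DHCP fails. The function name `dhcp_outage_budget` and the example 8-day lease are hypothetical.

```python
from datetime import timedelta

# RFC 2131 default renewal time (T1) as a fraction of the lease duration.
T1_FRACTION = 0.5


def dhcp_outage_budget(lease: timedelta) -> dict[str, timedelta]:
    """Time after DHCP fails until existing clients start, and finish, losing addresses.

    Assumes each client last renewed no later than T1 into its lease, so its
    remaining lease at outage start is between (1 - T1) * lease and the full lease.
    """
    return {
        "first_clients_lose_address": lease * (1 - T1_FRACTION),
        "last_clients_lose_address": lease,
    }


if __name__ == "__main__":
    # Hypothetical 8-day lease; substitute the scopes' real lease durations.
    for name, value in dhcp_outage_budget(timedelta(days=8)).items():
        print(f"{name}: {value}")
```

With these assumptions, the recovery target for DHCP is half the shortest lease in use. Clients that join or reboot during the outage get no address at all.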
## ISE DR: Key Considerations
### Failure Scenarios
| Scenario | Impact | Mitigation |
|---|---|---|
| All PSNs down | No RADIUS; closed mode denies all endpoints | Critical-auth VLAN on switches (AAA dead-server detection) |
| PAN down (single PAN) | No policy changes, GUI, or API. PSNs keep enforcing the existing policy. | In a distributed deployment, promote the secondary PAN to primary (manually, or automatically if PAN auto-failover is configured) |
| MnT down | No logging, session history, or reporting. Authentication continues. | Secondary MnT, or standalone recovery of the failed node |
| Database corruption | Full outage; ISE non-functional | Scheduled configuration and operational backups with a tested restore procedure |
| Certificate expiry | EAP-TLS fails for all endpoints using that certificate chain | Certificate monitoring and a renewal calendar |
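The certificate-monitoring mitigation can start as a small expiry check run on a schedule. This is a minimal Python sketch. It assumes the input is a `notAfter` string in the OpenSSL format printed by `openssl x509 -enddate -noout`, with the `notAfter=` prefix stripped. The 30- and 60-day thresholds, the function names, and the inventory entry are hypothetical. Set the thresholds to match the team's real renewal lead time.

```python
import ssl
from datetime import datetime, timezone

# Hypothetical renewal thresholds, in days.
WARN_DAYS = 60
CRITICAL_DAYS = 30


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days from `now` until a notAfter string such as 'Jun  1 12:00:00 2026 GMT'."""
    # ssl.cert_time_to_seconds parses the OpenSSL/GMT format into epoch seconds.
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def classify(days: int) -> str:
    """Map remaining days to an alert level for the renewal calendar."""
    if days < 0:
        return "EXPIRED"
    if days <= CRITICAL_DAYS:
        return "CRITICAL"
    if days <= WARN_DAYS:
        return "WARN"
    return "OK"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical inventory; list every certificate in the EAP-TLS chain.
    for name, not_after in [("ise-eap-server", "Jun  1 12:00:00 2026 GMT")]:
        days = days_until_expiry(not_after, now)
        print(f"{name}: {days} days ({classify(days)})")
```

Run the check against the ISE EAP server certificate and each issuing and root CA in the chain. Expiry of any one of them breaks EAP-TLS.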
### Switch-Side Failsafe
When ISE is unreachable, switches must have fallback configured:
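```
! RADIUS Change of Authorization (CoA) from ISE
aaa server radius dynamic-author
 client <ISE_PSN_IP> server-key <key>
!
! RADIUS dead-server detection (global)
! dead-criteria time is in seconds; deadtime is in minutes
radius-server dead-criteria time 10 tries 3
radius-server deadtime 15
!
! Critical-auth VLAN: fallback when all RADIUS servers are dead
! (interface commands; apply to each dot1x access port)
interface <ACCESS_INTERFACE>
 authentication event server dead action authorize vlan <CRITICAL_VLAN>
 authentication event server dead action authorize voice
 authentication event server alive action reinitialize
```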
> Without this configuration, dot1x closed mode means a total network blackout whenever ISE is down.
## Metadata
| Field | Value |
|---|---|
| PRJ ID | PRJ-2026-04-disaster-recovery |
| Author | Evan Rosado |
| Created | 2026-04-17 |
| Updated | 2026-04-17 |
| Status | Active (scoping) |
| Category | Business Continuity / Infrastructure |
| Priority | P0 |
| Scope | ISE, Firewalls, Network, WLC, DNS/DHCP, SIEM |
| Related | PRJ-2026-04-firewall-audit, PRJ-ise-34-migration, PRJ-ise-hardware-refresh |
## Related
- Firewall Audit: FTD/FMC architecture and gaps
- Murus Portae: HSRP tracking gap finding
- ISE 3.4 Migration: upgrade path affects DR
- ISE Hardware Refresh: node capacity