Disaster Recovery & Downtime Procedures

Project Summary

Project | Disaster Recovery & Downtime Procedures
Priority | P0 (management initiated)
Status | Active (scoping)
Owner | Evan Rosado
Stakeholders | InfoSec Management, Network Engineering, IT Operations
Objective | Document and test DR procedures for all critical infrastructure systems, starting with ISE
Risk | ISE operates in dot1x closed mode with dynamic VLAN, dACL, and posture assignment. If ISE is unavailable, endpoints cannot authenticate and are denied network access.

System Priority

Priority | System | Impact if Down | Status
1 | Cisco ISE (PAN, MnT, PSN) | All dot1x/MAB authentication fails. In closed mode, no endpoint gets network access. Dynamic VLAN, dACL, and posture assignments stop. | ❌ Not started
2 | Firewalls (FTD/FMC, ASA) | FTDs continue enforcing the last deployed policy if FMC is lost. If an FTD fails, the traffic path breaks unless HA failover is configured. | ❌ Not started
3 | Core/Distribution switches | HSRP failover (only if tracking is configured; see murus-portae HSRP finding). STP reconvergence. Potential broadcast storms. | ❌ Not started
4 | Wireless LAN Controllers | Wireless clients disconnect unless APs fall back to FlexConnect local switching (if configured). | ❌ Not started
5 | DNS/DHCP | Name resolution fails. New devices cannot get IP addresses. Existing leases continue until expiry. | ❌ Not started
6 | SIEM (QRadar → Sentinel) | No visibility. Events buffer on log sources (where supported) until the SIEM recovers. | ❌ Not started
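
The HSRP failover noted for the core/distribution layer depends on interface tracking. A minimal sketch of what that tracking could look like on IOS/IOS-XE follows, assuming a single tracked uplink. The track object number, priority values, and all angle-bracket placeholders are illustrative and are not taken from the murus-portae finding.

```
! Track the uplink toward the core (object number 10 is an arbitrary example)
track 10 interface <UPLINK_INTERFACE> line-protocol
!
! Gateway SVI on the router that is normally HSRP active
interface Vlan<VLAN_ID>
 standby version 2
 standby <GROUP> ip <VIRTUAL_IP>
 standby <GROUP> priority 110
 standby <GROUP> preempt
 ! Drop priority to 90 when the tracked uplink goes down
 standby <GROUP> track 10 decrement 20
```

The decrement has to push the active router's priority below the peer's (default 100 in this sketch), and the peer needs preemption enabled as well. Otherwise a tracked-link failure will not trigger a failover.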

ISE DR — Key Considerations

Failure Scenarios

Scenario | Impact | Mitigation
All PSNs down | No RADIUS; closed mode denies all endpoints | Critical-auth VLAN on switches (AAA dead-server detection)
PAN down (single PAN) | No policy changes, no GUI, no API. Existing policy continues to function on PSNs. | Promote the secondary PAN (may be co-located with MnT) in a distributed deployment
MnT down | No logging, no session history, no reporting. Authentication continues. | Secondary MnT or standalone recovery
Database corruption | Full outage; ISE non-functional | Scheduled configuration and operational backups, tested restore procedure
Certificate expiry | EAP-TLS fails for all endpoints using that certificate chain | Certificate monitoring, renewal calendar
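
The backup mitigation for database corruption can be sketched from the ISE CLI as follows. This is an illustrative outline, not the documented procedure for this environment. The repository name `dr-repo`, the backup names, and every angle-bracket placeholder are hypothetical, and the syntax should be checked against the CLI reference for the deployed ISE version. Lines starting with `!` are annotations, not commands to paste.

```
! Define a remote repository (hypothetical name and location)
configure terminal
 repository dr-repo
  url sftp://<BACKUP_HOST>/<PATH>
  user <REPO_USER> password plain <REPO_PASSWORD>
  exit
 exit
!
! On-demand configuration backup and operational (monitoring) data backup
backup ise-dr-cfg repository dr-repo ise-config encryption-key plain <ENCRYPTION_KEY>
backup ise-dr-ops repository dr-repo ise-operational encryption-key plain <ENCRYPTION_KEY>
!
! Restore test on a lab or rebuilt node; file name as listed by 'show repository dr-repo'
restore <BACKUP_FILE> repository dr-repo encryption-key plain <ENCRYPTION_KEY>
```

A backup only counts as a mitigation once a restore has been exercised, so the restore step belongs in the DR test plan rather than being left for an actual outage.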

Switch-Side Failsafe

Switches must have a fallback configured for when ISE is unreachable:

! RADIUS CoA (dynamic authorization) from ISE
aaa server radius dynamic-author
 client <ISE_PSN_IP> server-key <key>
!
! AAA dead-server detection (global)
radius-server dead-criteria time 10 tries 3
radius-server deadtime 15
!
! Critical-auth VLAN: fallback when all RADIUS servers are marked dead
! (interface-level commands, applied on each dot1x access port)
interface <ACCESS_INTERFACE>
 authentication event server dead action authorize vlan <CRITICAL_VLAN>
 authentication event server dead action authorize voice
 authentication event server alive action reinitialize

Without this configuration, dot1x closed mode means a total network blackout when ISE is down.
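
Dead-server detection only helps if the switch can also notice when a PSN comes back. One way to sketch this on IOS-XE is an `automate-tester` probe on each RADIUS server definition, followed by a few verification commands. The server name `ISE-PSN1`, group name `ISE-GROUP`, and probe username `radius-probe` are assumptions for illustration. The probe account does not need to exist in ISE, since an Access-Reject still shows that the server is responding.

```
! Named RADIUS server with a periodic test probe (names are hypothetical)
radius server ISE-PSN1
 address ipv4 <ISE_PSN_IP> auth-port 1812 acct-port 1813
 automate-tester username radius-probe
 key <key>
!
aaa group server radius ISE-GROUP
 server name ISE-PSN1
!
! Verification from exec mode during a DR test
show aaa servers
show authentication sessions interface <ACCESS_INTERFACE> details
```

During a test with the PSNs unreachable, `show aaa servers` is expected to report the server state as DEAD once the dead-criteria are met, and affected ports should show authorization into the critical VLAN rather than an unauthorized state. On newer IOS-XE releases the session command may appear as `show access-session` instead.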

Metadata

Field | Value
PRJ ID | PRJ-2026-04-disaster-recovery
Author | Evan Rosado
Created | 2026-04-17
Updated | 2026-04-17
Status | Active (scoping)
Category | Business Continuity / Infrastructure
Priority | P0
Scope | ISE, Firewalls, Network, WLC, DNS/DHCP, SIEM
Related | PRJ-2026-04-firewall-audit, PRJ-ise-34-migration, PRJ-ise-hardware-refresh