TAC Case: 802.1X Authentication Failures (~500 Endpoints)

Case Summary

SR Number: pending

Severity: S1 (production network down - medical facility)

Product: Cisco ISE 3.2 Patch 6

Contract: add SmartNet contract ID

Opened: 2026-03-12

Request: Live engineer with distributed ISE/MNT experience

Problem Statement

Approximately 500 endpoints are failing 802.1X authentication across the wired and wireless networks. Affected devices include domain-joined Windows machines (EAP-TEAP, MSCHAPv2, EAP-TLS) and Jamf-managed Macs (EAP-TLS with SCEP-issued certificates).

CRITICAL: This is a medical facility. Patient care systems may be impacted.

SECONDARY SYMPTOM: Live Logs show authentication entries, but clicking details returns:

No data available for this record. Either the data is purged or authentication for this session record happened a week ago. Or if this is a 'PassiveID' or 'PassiveID Visibility' session, it will not have authentication details on ISE.

PassiveID/Visibility services are NOT enabled, so the PassiveID explanation in the message does not apply. This suggests an MNT database or replication issue.

Environment

ISE Deployment

Node                  Role             Status
ppan.ise.chla.org     Primary PAN      check
span.ise.chla.org     Secondary PAN    check
pmnt.ise.chla.org     Primary MNT      check
smnt.ise.chla.org     Secondary MNT    check
psn-1.ise.chla.org    PSN              check
psn-2.ise.chla.org    PSN              check
psn-3.ise.chla.org    PSN              check
psn-4.ise.chla.org    PSN              check

ISE Version: 3.2 Patch 6

Deployment Type: Distributed (8 nodes)

Affected Networks

  • Wired (LAN) - 802.1X

  • Wireless (WLAN) - 802.1X

Affected Endpoints

Device Type                      Auth Method                    ~Count
Windows 10/11 (Domain Joined)    EAP-TEAP, MSCHAPv2, EAP-TLS    estimate
macOS (Jamf Managed)             EAP-TLS (SCEP certs)           estimate
WOWs (Wyse on Wheels)            confirm auth method            estimate
Chromebooks                      confirm auth method            estimate

WOWs and Chromebooks are critical to patient care. These devices are used at bedside for clinical workflows.

RADIUS Architecture (NetScaler Load Balancing)

VIP                       Backend PSNs                              Usage
VIP-1 (NetScaler SNIP)    psn-1.ise.chla.org, psn-2.ise.chla.org    Primary for all NADs (except ASA)
VIP-2 (NetScaler SNIP)    psn-3.ise.chla.org, psn-4.ise.chla.org    Secondary for all NADs (except ASA)

  • NADs configured: Primary = VIP-1, Secondary = VIP-2

  • ASA uses direct PSN addressing (not behind VIP)

Timeline

Date/Time      Event
~2026-03-05    Noticed lack of logs / logging anomalies (1 week ago)
2026-03-11     Authentication failures reported (~500 endpoints)
2026-03-12     TAC case opened
investigate    Any changes in the 2 weeks before 03-05? (patch, cert renewal, DB maintenance, replication changes)

Key observation: Logging issues preceded auth failures by ~6 days. These are likely related.

Symptoms

User Experience

  • Endpoints fail to connect to network

  • Previously working devices now failing

  • add specific error messages users see

ISE Live Logs

Failure Reason(s): check Operations > RADIUS > Live Logs

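# Common failure reasons to look for (confirm wording against the ISE message catalog):
# 12514 - EAP-TLS failed SSL/TLS handshake because of an unknown CA in the client certificate chain
# 12308 - Client sent Result TLV indicating failure
# 22056 - Subject not found in the applicable identity store(s)
# 24408 - AD authentication failed: wrong password
# 24415 - AD authentication failed: account locked out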

Sample Failed Authentications:

Timestamp Username/MAC Auth Method PSN Failure Reason

sample 1

sample 2

sample 3

Pattern Analysis

  • Failures on ALL PSNs or specific PSN?

  • Failures started suddenly or gradual increase?

  • Specific SSID/switch affected?

  • Time-based pattern?
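The questions above can be answered quickly from a CSV export of the failed entries in Live Logs. The following is a minimal sketch, not a tested tool: the column names ("Logged At", "Server", "Failure Reason") and the timestamp layout are assumptions and must be matched to the header row of the actual export. The VIP-to-PSN mapping is copied from the RADIUS Architecture section.

```python
import csv
import io
from collections import Counter

# VIP-to-PSN mapping taken from the RADIUS Architecture section.
VIP_BACKENDS = {
    "VIP-1": ["psn-1.ise.chla.org", "psn-2.ise.chla.org"],
    "VIP-2": ["psn-3.ise.chla.org", "psn-4.ise.chla.org"],
}
PSN_TO_VIP = {psn: vip for vip, psns in VIP_BACKENDS.items() for psn in psns}


def summarize_failures(csv_text, psn_col="Server",
                       reason_col="Failure Reason", time_col="Logged At"):
    """Count failed authentications per PSN, VIP, failure reason, and hour.

    Column names are assumptions; adjust them to the export's header row.
    Timestamps are assumed to begin with "YYYY-MM-DD HH".
    """
    counts = {"psn": Counter(), "vip": Counter(),
              "reason": Counter(), "hour": Counter()}
    for row in csv.DictReader(io.StringIO(csv_text)):
        psn = row[psn_col].strip()
        counts["psn"][psn] += 1
        # PSNs outside the two VIP pools (or short hostnames) land in "unmapped".
        counts["vip"][PSN_TO_VIP.get(psn, "unmapped")] += 1
        counts["reason"][row[reason_col].strip()] += 1
        counts["hour"][row[time_col].strip()[:13]] += 1
    return counts


def print_summary(counts):
    """Print each breakdown, most frequent value first."""
    for name, counter in counts.items():
        print(f"--- failures by {name} ---")
        for value, n in counter.most_common():
            print(f"{n:6d}  {value}")
```

Failures concentrated on one PSN or one VIP point at a node-level problem; an even spread across all four PSNs points at something shared (MNT, AD, certificates). The hourly breakdown shows whether the failures started abruptly or ramped up.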

Current Workaround

Affected devices are being added to the General-Device-Onboard identity group by importing their MAC addresses from a CSV file.

This identity group is referenced in an authorization policy rule placed ahead of the default guest rule, so it acts as a safety net when 802.1X fails.

Impact: Manual process, not scalable for ~500 devices.
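To reduce manual effort while the workaround is in place, the import file can be generated from a raw list of MAC addresses. This is a sketch under stated assumptions: the column headers ("MACAddress", "IdentityGroup") are placeholders and should be replaced with the headers from the endpoint import template downloaded from ISE. Only the group name comes from this case.

```python
import csv
import io
import re

# Assumed column headers; replace with those in the ISE import template.
COLUMNS = ["MACAddress", "IdentityGroup"]
GROUP = "General-Device-Onboard"


def normalize_mac(raw):
    """Return the MAC as AA:BB:CC:DD:EE:FF, or raise ValueError if malformed."""
    digits = re.sub(r"[\s:.\-]", "", raw)  # drop common separators only
    if not re.fullmatch(r"[0-9A-Fa-f]{12}", digits):
        raise ValueError(f"not a MAC address: {raw!r}")
    digits = digits.upper()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def build_import_csv(raw_macs):
    """Build CSV text with one row per unique MAC, preserving input order."""
    unique = []
    for raw in raw_macs:
        mac = normalize_mac(raw)
        if mac not in unique:
            unique.append(mac)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLUMNS)
    for mac in unique:
        writer.writerow([mac, GROUP])
    return buf.getvalue()
```

Normalizing and de-duplicating before import avoids rejected rows when the same device is reported in different MAC notations (dashes, dots, or colons).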

Working Theory

The "No data available" error on recent sessions, combined with the authentication failures, points to one of the following:

  1. MNT database issue - session data not being written or replicated

  2. Disk space exhaustion - DB partition full, can’t write new records

  3. Replication failure - PSNs can’t sync session data to MNT

  4. Database corruption - requires TAC intervention

The logging issue appearing ~6 days before auth failures suggests a DB/storage problem that gradually worsened until it began affecting live authentications.

What TAC Will Ask

Be ready with:

  1. [ ] SmartNet contract number

  2. [ ] ISE version: 3.2 Patch 6

  3. [ ] When logging issues started: ~2026-03-05

  4. [ ] When auth failures started: 2026-03-11

  5. [ ] Any changes in past 2 weeks? (patches, certs, AD changes, VM snapshots)

  6. [ ] Output of show disks from each MNT node

  7. [ ] Output of show application status ise from each node

  8. [ ] DB replication status (option 24 from application configure ise)

  9. [ ] Support bundle from Primary PAN
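Items 6 and 7 involve running CLI commands on several nodes, so a per-node collection list helps ensure nothing is missed before the TAC call. The sketch below only prints a checklist; it does not connect to any node. The node inventory is copied from the Environment section, and the helper names are illustrative.

```python
# Node inventory copied from the ISE Deployment table.
NODES = {
    "ppan.ise.chla.org": "Primary PAN",
    "span.ise.chla.org": "Secondary PAN",
    "pmnt.ise.chla.org": "Primary MNT",
    "smnt.ise.chla.org": "Secondary MNT",
    "psn-1.ise.chla.org": "PSN",
    "psn-2.ise.chla.org": "PSN",
    "psn-3.ise.chla.org": "PSN",
    "psn-4.ise.chla.org": "PSN",
}


def collection_plan(nodes):
    """Map each node to the CLI commands whose output TAC will want."""
    plan = {}
    for host, role in nodes.items():
        commands = ["show application status ise"]  # every node
        if role.endswith("MNT"):
            commands.append("show disks")  # disk usage, MNT nodes only
        plan[host] = commands
    return plan


def render(plan):
    """Format the plan as a printable checklist, one command per line."""
    return "\n".join(
        f"[ ] {host}: {command}"
        for host, commands in plan.items()
        for command in commands
    )


if __name__ == "__main__":
    print(render(collection_plan(NODES)))
```

The replication status check and the support bundle are interactive steps and stay as manual items on the list above.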