TAC Case: 802.1X Authentication Failures (~500 Endpoints)
Case Summary
| Field | Value |
|---|---|
| SR Number | pending |
| Severity | S1 (production network down - medical facility) |
| Product | Cisco ISE 3.2 Patch 6 |
| Contract | add SmartNet contract ID |
| Opened | 2026-03-12 |
| Request | Live engineer with distributed ISE/MNT experience |
Problem Statement
Approximately 500 endpoints are failing 802.1X authentication across wired and wireless networks. Affected endpoints include domain-joined Windows machines (EAP-TEAP, MSCHAPv2, EAP-TLS) and Jamf-managed Macs (EAP-TLS with SCEP-issued certificates).
CRITICAL: This is a medical facility. Patient care systems may be impacted.
SECONDARY SYMPTOM: Live Logs show authentication entries, but clicking details returns:
No data available for this record. Either the data is purged or authentication for this session record happened a week ago. Or if this is a 'PassiveID' or 'PassiveID Visibility' session, it will not have authentication details on ISE.
PassiveID/Visibility services are NOT enabled, so the PassiveID explanation in that message does not apply. This suggests an MNT database or replication issue.
Environment
ISE Deployment
| Node | Role | Status |
|---|---|---|
| ppan.ise.chla.org | Primary PAN | check |
| span.ise.chla.org | Secondary PAN | check |
| pmnt.ise.chla.org | Primary MNT | check |
| smnt.ise.chla.org | Secondary MNT | check |
| psn-1.ise.chla.org | PSN | check |
| psn-2.ise.chla.org | PSN | check |
| psn-3.ise.chla.org | PSN | check |
| psn-4.ise.chla.org | PSN | check |
ISE Version: 3.2 Patch 6
Deployment Type: Distributed (8 nodes)
Affected Networks
- Wired (LAN) - 802.1X
- Wireless (WLAN) - 802.1X
Affected Endpoints
| Device Type | Auth Method | ~Count |
|---|---|---|
| Windows 10/11 (Domain Joined) | EAP-TEAP, MSCHAPv2, EAP-TLS | estimate |
| macOS (Jamf Managed) | EAP-TLS (SCEP certs) | estimate |
| WOWs (Wyse on Wheels) | confirm auth method | estimate |
| Chromebooks | confirm auth method | estimate |

Note: WOWs and Chromebooks are critical to patient care. These devices are used at bedside for clinical workflows.
RADIUS Architecture (NetScaler Load Balancing)
| VIP | Backend PSNs | Usage |
|---|---|---|
| VIP-1 (NetScaler SNIP) | psn-1.ise.chla.org, psn-2.ise.chla.org | Primary for all NADs (except ASA) |
| VIP-2 (NetScaler SNIP) | psn-3.ise.chla.org, psn-4.ise.chla.org | Secondary for all NADs (except ASA) |

- NADs configured: Primary = VIP-1, Secondary = VIP-2
- ASA uses direct PSN addressing (not behind VIP)
Timeline
| Date/Time | Event |
|---|---|
| ~2026-03-05 | Noticed lack of logs / logging anomalies (1 week ago) |
| 2026-03-11 | Authentication failures reported (~500 endpoints) |
| 2026-03-12 | TAC case opened |
| investigate | Any changes in the 2 weeks before 03-05? (patch, cert renewal, DB maintenance, replication changes) |
Key observation: Logging issues preceded auth failures by ~6 days. These are likely related.
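The ~6 day gap can be checked with a small date calculation. This is a minimal sketch that assumes GNU `date` (the `-d` option); the function name `days_between` is made up for illustration. It works in UTC on purpose, because a daylight-saving change inside the interval would otherwise shorten one of the days and skew the integer result.

```shell
# Whole days between two calendar dates (YYYY-MM-DD), computed in UTC.
# Assumes GNU date; BSD/macOS date uses different flags.
days_between() {
  _db_start=$(date -u -d "$1" +%s) || return 1
  _db_end=$(date -u -d "$2" +%s) || return 1
  echo $(( (_db_end - _db_start) / 86400 ))
}

# First logging anomaly vs. first reported auth failures
days_between 2026-03-05 2026-03-11   # prints 6
```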
Symptoms
User Experience
- Endpoints fail to connect to network
- Previously working devices now failing
- add specific error messages users see
ISE Live Logs
Failure Reason(s): check Operations > Live Logs
# Common failure reasons to look for:
# 12514 - EAP-TLS failed SSL/TLS handshake
# 12308 - Client certificate chain not trusted
# 22056 - Subject not found in identity store
# 24408 - User/machine not found in AD
# 24415 - Could not locate AD domain
Sample Failed Authentications:
| Timestamp | Username/MAC | Auth Method | PSN | Failure Reason |
|---|---|---|---|---|
| sample 1 | | | | |
| sample 2 | | | | |
| sample 3 | | | | |
Pattern Analysis
- Failures on ALL PSNs or specific PSN?
- Failures started suddenly or gradual increase?
- Specific SSID/switch affected?
- Time-based pattern?
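The first question (all PSNs or one?) can be answered quickly once failures are exported. The sketch below assumes the failures have been saved as tab-separated rows with the same five columns as the sample table above (timestamp, username/MAC, auth method, PSN, failure reason). The function name and the sample rows are made up for illustration and are not real log data.

```shell
# Count failures per PSN from tab-separated rows:
#   timestamp, username/MAC, auth method, PSN, failure reason
# Reads the named files, or stdin when no file is given.
failures_by_psn() {
  awk -F '\t' '
    NF >= 5 { count[$4]++ }
    END { for (psn in count) printf "%d\t%s\n", count[psn], psn }
  ' "$@" | sort -k1,1nr -k2,2
}

# Example with placeholder rows (not real data)
printf '%s\t%s\t%s\t%s\t%s\n' \
  t1 host-a EAP-TLS psn-1 reason-x \
  t2 host-b EAP-TLS psn-1 reason-x \
  t3 host-c EAP-TEAP psn-3 reason-y |
  failures_by_psn
```

If every PSN shows a similar count, the cause is more likely shared (identity store, certificates, policy) than a single unhealthy node.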
Current Workaround
Adding devices by MAC address CSV import to General-Device-Onboard identity group.
This identity group is referenced in an authorization policy positioned before the default guest rule as a safety net.
Impact: Manual process, not scalable for 500+ devices.
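Because the MAC list for the CSV import tends to come from several sources (switch output, Jamf, spreadsheets), it helps to normalize the addresses before building the file. This is a minimal sketch: `normalize_mac` is a hypothetical helper, and upper-case colon-separated output is an assumption. The actual CSV column layout should come from the import template ISE provides and is not reproduced here.

```shell
# Normalize a MAC address to upper-case, colon-separated form.
# Accepts aa-bb-cc-dd-ee-ff, aabb.ccdd.eeff, aabbccddeeff, and similar.
# Prints nothing and returns 1 if the input is not exactly 12 hex digits.
normalize_mac() {
  _mac_hex=$(printf '%s' "$1" | tr -d ':.-' | tr 'a-f' 'A-F')
  case $_mac_hex in
    *[!0-9A-F]*) return 1 ;;
  esac
  [ "${#_mac_hex}" -eq 12 ] || return 1
  # Insert a colon after every two digits, then drop the trailing one
  printf '%s\n' "$_mac_hex" | sed 's/../&:/g; s/:$//'
}

normalize_mac 'aabb.ccdd.eeff'   # prints AA:BB:CC:DD:EE:FF
```

Rejecting malformed entries up front avoids a partial import that silently leaves some devices out of the identity group.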
Diagnostic Data
Support Bundle
# Set debug levels BEFORE reproducing (Primary PAN GUI):
#   Administration > System > Logging > Debug Log Configuration
# Then generate the support bundle:
#   Operations > Troubleshoot > Download Logs
Support bundle generated: [ ] Yes [ ] No
Bundle filename: ise-support-bundle-YYYY-MM-DD.tar.gz
Debug Logs to Enable
Before reproducing the issue, enable these debugs on the failing PSN:
| Component | Level |
|---|---|
| runtime-AAA | DEBUG |
| eap | DEBUG |
| eap-tls | DEBUG |
| ad-connector | DEBUG |
| identity-store-AD | DEBUG |
Show Commands (from ISE CLI)
# Run on each PSN
show application status ise
show logging application ise-psc.log tail count 100
# AD connectivity test
aaa group <AD-join-point> <test-user> <password>
MNT Replication Health (CRITICAL - check this first)
From Primary PAN GUI:
- Administration > System > Deployment - check node sync status
- Administration > System > Settings > Logging > Log Collector - verify pmnt/smnt status
From MNT CLI (pmnt.ise.chla.org):
# Database status
show application status ise
# Check replication
application configure ise
# Select option 24: View DB Replication Status
# Disk space (if DB is full, session data won't write)
show disk
If replication is broken or DB is full, that explains both symptoms.
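Once the `show disk` output from each MNT node has been saved to a text file, a quick filter can flag nearly full filesystems. This is a rough sketch: it assumes each filesystem line contains the text `<N>% used`, and the function name, the threshold, and the sample lines are illustrative rather than captured output.

```shell
# Print lines from saved `show disk` output whose usage is at or above
# a threshold. Usage: disk_over_threshold PERCENT [FILE...]
# Returns 0 if at least one line matched, 1 otherwise (like grep).
# Assumes usage appears as "<N>% used" on each filesystem line.
disk_over_threshold() {
  _disk_limit=$1
  shift
  awk -v limit="$_disk_limit" '
    match($0, /[0-9]+% used/) {
      pct = substr($0, RSTART, RLENGTH)
      sub(/% used/, "", pct)
      if (pct + 0 >= limit + 0) { print; found = 1 }
    }
    END { exit (found ? 0 : 1) }
  ' "$@"
}

# Example with illustrative lines (not captured from the MNT)
printf '%s\n' \
  'Internal filesystems:' \
  '  / : 12% used ( 100 of 800)' \
  '  /opt : 97% used ( 970 of 1000)' |
  disk_over_threshold 90
```

Running it against the saved output from both pmnt and smnt gives TAC a direct answer to the disk-exhaustion question.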
Session Data Check
# From MNT CLI - check if session database is responding
application configure ise
# Select option 14: Purge Runtime Sessions
# (DO NOT purge - just see if it responds)
Working Theory
The "no data available" error for recent sessions + auth failures suggests:
- MNT database issue - session data not being written or replicated
- Disk space exhaustion - DB partition full, can’t write new records
- Replication failure - PSNs can’t sync session data to MNT
- Database corruption - requires TAC intervention
The logging issue appearing ~6 days before auth failures suggests a DB/storage problem that gradually worsened until it began affecting live authentications.
What TAC Will Ask
Be ready with:
- [ ] SmartNet contract number
- [ ] ISE version: 3.2 Patch 6
- [ ] When logging issues started: ~2026-03-05
- [ ] When auth failures started: 2026-03-11
- [ ] Any changes in past 2 weeks? (patches, certs, AD changes, VM snapshots)
- [ ] Output of `show disk` from each MNT node
- [ ] Output of `show application status ise` from each node
- [ ] DB replication status (option 24 from `application configure ise`)
- [ ] Support bundle from Primary PAN
TAC Communication Log
| Date | Who | Notes |
|---|---|---|
| 2026-03-12 | your name | Case opened |
References
- ISE 802.1X Troubleshooting Guide: www.cisco.com/c/en/us/support/docs/security/identity-services-engine/215239-ise-wireless-802-1x-eap-tls-deployment-a.html
- EAP-TLS Failure Reasons: www.cisco.com/c/en/us/support/docs/security/identity-services-engine/116528-configure-product-00.html
Notes
TAC Engagement — 2026‑03‑12 15:08
Subject: ISE 3.2P6 - 802.1X Auth Failures ~500 Endpoints - Medical Facility - MNT Session Data Unavailable
Description:
ENVIRONMENT
- ISE 3.2 Patch 6, 8-node distributed deployment
- PAN: ppan.ise.chla.org, span.ise.chla.org
- MNT: pmnt.ise.chla.org, smnt.ise.chla.org
- PSN: psn-1 through psn-4 behind NetScaler VIPs (SNIP)
- NADs point to VIP-1 (psn-1/2) primary, VIP-2 (psn-3/4) secondary

PROBLEM
~500 endpoints failing 802.1X authentication across wired and wireless networks.

Affected devices:
- Windows 10/11 domain-joined (EAP-TEAP, MSCHAPv2, EAP-TLS)
- macOS Jamf-managed (EAP-TLS via SCEP)
- WOWs (Wyse on Wheels) - CRITICAL FOR PATIENT CARE
- Chromebooks - CRITICAL FOR PATIENT CARE

TIMELINE
- ~Mar 5: Noticed logging anomalies / lack of logs
- Mar 11: Authentication failures reported (~500 endpoints)
- Mar 12: TAC case opened

SECONDARY SYMPTOM
Live Logs show authentication entries, but clicking details returns: "No data available for this record. Either the data is purged or authentication for this session record happened a week ago."

PassiveID is NOT enabled. This appears to be an MNT database or replication issue.

CURRENT WORKAROUND
Adding devices by MAC CSV import to General-Device-Onboard identity group (not scalable).

REQUEST
Live engineer with distributed ISE/MNT experience. This is a medical facility with patient care impact.
scratch space for case notes
API Diagnostic Commands (netapi)
Run these before/during TAC call to have data ready.
Deployment Status
# Node overview
netapi ise api info
# All nodes with roles/services
netapi ise -f json api-call openapi GET "/api/v1/deployment/node" | jq -r '["HOSTNAME","IP","ROLES","SERVICES","STATUS"], (.response[] | [.hostname, .ipAddress, (.roles|join("/")), (.services|join(",")), .nodeStatus]) | @tsv' | column -t
MNT Health Check
# Check MNT node status specifically
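# Matches role strings containing "MNT" or "Monitoring"; any() avoids duplicate output per node
netapi ise -f json api-call openapi GET "/api/v1/deployment/node" | jq '.response[] | select(any(.roles[]; test("MNT|Monitoring"))) | {hostname, ipAddress, status: .nodeStatus, services}'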
Recent Auth Failures (Live Logs via API)
# Last 24 hours failed authentications
netapi ise -f json mnt failures --hours 24 | jq -r '.[] | [.timestamp, .username, .nas_ip, .failure_reason] | @tsv' | head -100
# Group failures by reason code
netapi ise -f json mnt failures --hours 24 | jq -r '.[].failure_reason' | sort | uniq -c | sort -rn
# Group failures by PSN (which PSN is seeing failures?)
netapi ise -f json mnt failures --hours 24 | jq -r '.[].psn' | sort | uniq -c | sort -rn
Active Sessions
# Current session count per PSN
netapi ise -f json mnt sessions | jq -r 'group_by(.psn) | .[] | {psn: .[0].psn, count: length}'
# Total active sessions
netapi ise -f json mnt sessions | jq 'length'
Policy Sets
# List authentication policy sets
netapi ise policy-sets
# Check the General-Device-Onboard identity group (workaround)
netapi ise -f json identity-groups | jq '.[] | select(.name | contains("General-Device-Onboard"))'
AD Connectivity
# AD join point status
netapi ise -f json api-call openapi GET "/api/v1/active-directory" | jq '.response[] | {name, domain, status: .adJoinPointStatus}'
Export Full Config (for TAC upload)
# Dump deployment info to JSON
netapi ise export > /tmp/ise-config-$(date +%Y%m%d).json
1. TAC Initial Observations
1.1 Disabled ISE Messaging Services
TAC observed that the ISE Messaging Service for UDP syslog delivery to MNT is disabled.
Location: ppan.ise.chla.org/admin/#administration/administration_system/administration_system_logging/local_log
Setting: ISE Messaging Settings > Use "ISE Messaging Service" for UDP Syslogs delivery to MnT
Impact: If disabled, PSNs may fail to send session/auth records to MNT, contributing to “No data available for this record” errors.
2. TAC Recommendations Tracking
| Recommendation | Description | Owner | Status | Notes / Next Steps |
|---|---|---|---|---|
| Enable ISE Messaging Services | Turn on “Use ISE Messaging Service for UDP syslogs delivery to MnT”. | InfoSec Engineering | Pending | Must be enabled during maintenance window; confirm PSN → MNT log ingestion resumes. |
| Resolve MNT Replication Failure | PAN dashboard shows alarms: Replication Failed from PMnT. Deregister/re‑register affected nodes. | InfoSec Engineering + TAC | In Progress | Perform on both PMnT and SMnT. Validate DB state & cluster hashing before re-registration. |
| Promote SMnT to Primary MNT | TAC recommends promoting secondary MNT to primary role temporarily. | InfoSec Engineering | Pending Decision | Requires validation of replication health and disk space. Ensure no corruption on SMnT. |
| Upgrade to ISE 3.2 Patch 9 | TAC recommends installing latest patch to address known replication and logging issues. | InfoSec Engineering | Pending | Download link: software.cisco.com/download/home/283801620/type/283802505/release/3.2%20Patch%209 |
| Review Disk Space on PMnT + SMnT | Verify DB/log partitions; full partitions can break logging and replication. | InfoSec Engineering | In Progress | Capture from CLI: show disk |
| Validate ISE Node Sync Status | Ensure deployment sync and configuration database replication are functioning. | InfoSec Engineering | Pending | GUI: Admin → System → Deployment |
3. Supporting TAC Documentation
Links provided by TAC:
4. Scratch Space (Working Notes)
(Keep this for live call notes, timestamps, commands run, replication output, disk output, etc.)
Primary MNT: RabbitMQ service CPU is over 100%.
Management Summary — Primary MNT CPU / RabbitMQ Issue
We identified a critical performance issue on the Primary Monitoring Node (MNT) in our Cisco ISE deployment. The RabbitMQ messaging service, which processes authentication and session logs, is running at over 100% CPU. The MNT cannot keep up with incoming messages, which causes a backlog and destabilizes logging and monitoring.

Recommended Action

Cisco TAC has advised us to reboot the Primary MNT to clear the overloaded messaging service. During this reboot:

- The Secondary MNT will automatically take over all monitoring/logging responsibilities.
- There is no impact to user authentication or network access. All authentication is handled by the four Policy Service Nodes (PSNs), which remain fully operational.

Why This Matters

The overloaded Primary MNT is contributing to the issues we are seeing with missing log data and failed session lookups. Addressing this is part of stabilizing the overall environment and restoring full visibility into authentication events.

Next Steps

- After reboot, validate replication, queue processing, and log ingestion.
- Continue working with Cisco TAC to assess whether additional corrective actions are required.
- [ ] app ise stop
- [ ] reboot
- [ ] saved ade-os
- [ ] acknowledged reboot
- [ ] ssh'd into server at 2026-03-12 16:21
- [ ] pmnt/admin#show uptime
    16:21:26 up 3 min, 1 user, load average: 2.77, 1.35, 0.53
- [ ] 2026-03-12 16:29 running show application status ise
    ISE PROCESS NAME                       STATE            PROCESS ID
    --------------------------------------------------------------------
    Database Listener                      running          8851
    Database Server                        running          300 PROCESSES
    Application Server                     running          28749
    Profiler Database                      running          17555
    ISE Indexing Engine                    disabled
    AD Connector                           running          29781
    M&T Session Database                   running          24955
    M&T Log Processor                      running          29017
    Certificate Authority Service          disabled
    EST Service                            running          157854
    SXP Engine Service                     disabled
    TC-NAC Service                         disabled
    PassiveID WMI Service                  disabled
    PassiveID Syslog Service               disabled
    PassiveID API Service                  disabled
    PassiveID Agent Service                disabled
    PassiveID Endpoint Service             disabled
    PassiveID SPAN Service                 disabled
    DHCP Server (dhcpd)                    disabled
    DNS Server (named)                     disabled
    ISE Messaging Service                  running          12480
    ISE API Gateway Database Service       running          16233
    ISE API Gateway Service                running          23247
    ISE pxGrid Direct Service              disabled
    Segmentation Policy Service            disabled
    REST Auth Service                      running          145209
    SSE Connector                          disabled
    Hermes (pxGrid Cloud Agent)            disabled
    McTrust (Meraki Sync Service)          disabled
    ISE Node Exporter                      running          48340
    ISE Prometheus Service                 disabled
    ISE Grafana Service                    disabled
    ISE MNT LogAnalytics Elasticsearch     running          57535
    ISE Logstash Service                   running          75054
    ISE Kibana Service                     running          92228
- [ ] [scratch area for case log / CLI findings]
Appendix
- show logging application rabbitmq.log tail count 50
ECCRB Submission (Emergency Change Control Review Board)
Summary
ISE Primary MNT RabbitMQ Message Queue Optimization - TAC-Guided Intervention
Description
During routine monitoring of the ISE distributed deployment, elevated CPU utilization (109%) was identified on the Primary Monitoring Node (pmnt.ise.chla.org) RabbitMQ messaging service. RabbitMQ handles inter-node communication for session logging and replication.
Cisco TAC was engaged proactively (S1 - Healthcare Environment). TAC analysis confirmed message queue saturation was degrading logging pipeline performance and would progressively impact operational visibility if not addressed.
TAC recommended a controlled restart to clear the accumulated queue backlog and restore normal message processing throughput.
Business Justification
- ISE MNT provides visibility into 802.1X authentication events for ~26,000+ endpoints
- Degraded logging impacts security incident response capability
- Proactive intervention prevents escalation to authentication service impact
Service Impact
None
- Secondary MNT (smnt.ise.chla.org) provides continuous monitoring during restart
- RADIUS authentication handled by four independent PSNs - unaffected by MNT operations
- No endpoint connectivity impact
Detailed Implementation Plan
1. SSH to Primary MNT
- ssh pmnt.ise.chla.org
2. Initiate reload
- reload
- Prompt: "Save ade-os [y/n]" → y
3. Wait for reboot (~5 minutes)
4. Reconnect and verify
- ssh pmnt.ise.chla.org
- Wait ~10 minutes for services to initialize
- show application status ise
- Confirm critical services running:
* Database Listener: running
* Database Server: running
* Application Server: running
* M&T Session Database: running
* M&T Log Processor: running
* ISE Messaging Service: running
5. Validate logging restored
- Log into Primary PAN (ppan.ise.chla.org)
- Navigate to Operations > RADIUS Live Logs
- Click endpoint details to confirm "No data available" error is resolved
Detailed Backout Plan
Change cannot be backed out (server reboot is atomic).
Mitigation: Secondary MNT (smnt.ise.chla.org) automatically assumes primary monitoring role if pmnt fails to recover. No authentication impact regardless of outcome.
Testing Plan
Pre-Change Validation
- Confirm Secondary MNT (smnt.ise.chla.org) is healthy and synchronized
- Verify PSNs (psn-1 through psn-4) are processing RADIUS authentications
- Document current RabbitMQ CPU utilization on pmnt
Post-Change Validation
- SSH to pmnt.ise.chla.org - confirm node is accessible
- Run `show application status ise` - all critical services running
- Run `show logging application rabbitmq.log tail count 50` - no errors
- Log into Primary PAN (ppan.ise.chla.org)
- Navigate to Operations > RADIUS Live Logs
- Click endpoint details on 3+ recent authentications - confirm data is available (no "No data available" error)
- Verify replication status: Administration > System > Deployment
- Confirm no alarms on dashboard
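The service check in the post-change validation can be done against saved `show application status ise` output instead of by eye. This is a minimal sketch: the six service names are the ones the implementation plan lists as critical, while the function name and the file name in the usage comment are hypothetical. It assumes each line starts with the process name followed by its state, as in the output captured in the scratch notes.

```shell
# Check saved `show application status ise` output for the services the
# implementation plan lists as critical. Prints each one that is missing
# or not in the "running" state; returns non-zero if any are.
check_critical_services() {
  awk '
    BEGIN {
      list = "Database Listener|Database Server|Application Server|"
      list = list "M&T Session Database|M&T Log Processor|ISE Messaging Service"
      n = split(list, want, "|")
    }
    {
      line = $0
      sub(/^[ \t]+/, "", line)          # tolerate indented captures
      for (i = 1; i <= n; i++)
        if (index(line, want[i]) == 1) {
          rest = substr(line, length(want[i]) + 1)
          if (rest ~ /^[ \t]+running/) ok[want[i]] = 1
        }
    }
    END {
      bad = 0
      for (i = 1; i <= n; i++)
        if (!(want[i] in ok)) { print "NOT RUNNING: " want[i]; bad = 1 }
      exit bad
    }
  ' "$@"
}

# Usage (hypothetical file name holding the saved CLI output):
#   check_critical_services pmnt-status.txt && echo "critical services OK"
```

Services that are intentionally disabled in this deployment (PassiveID, SXP, and so on) are ignored, so the check only fails on the six the plan cares about.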
Success Criteria
- Primary MNT services running
- RabbitMQ CPU < 50%
- Live Logs session details displaying correctly
- No replication alarms
Benefits of Change
- Restore RabbitMQ message processing to nominal throughput
- Ensure continuous security event visibility for compliance and incident response
- Prevent progressive degradation that could impact authentication diagnostics
- Align with Cisco TAC best practices for ISE MNT health maintenance
Risk Analysis / Mitigation Plan
| Risk | Probability | Mitigation |
|---|---|---|
| Temporary loss of primary logging during restart | Expected (by design) | Secondary MNT assumes logging automatically; no data loss |
| Primary MNT fails to recover | Low | Secondary MNT continues operation; TAC on standby for escalation |
| Authentication impact | None | MNT is monitoring-only; PSNs operate independently |
Evidence / Justification
Observed Condition - Primary MNT CLI (pmnt.ise.chla.org)
RabbitMQ messaging service at 109% CPU utilization. Node unable to process authentication/session logs, causing backlog.
- Command: `top` / `show application status ise`
- Finding: RabbitMQ process consuming >100% CPU
Observed Condition - Primary PAN Dashboard (ppan.ise.chla.org)
Alarms displayed:
- "Replication Failed from PMnT"
- MNT sync status showing degraded state
Location: Administration > System > Deployment
Finding: Primary MNT replication failing to secondary
Observed Condition - RADIUS Live Logs
Live Logs show authentication entries, but clicking session details returns:
No data available for this record. Either the data is purged or authentication for this session record happened a week ago.
Root Cause: RabbitMQ overload preventing session data from being written to MNT database.
TAC Engagement
- SR Number: add your case number
- Severity: S1 (Medical Facility - Patient Care Impact)
- TAC Engineer: add name if available
- Recommendation: Immediate reboot of Primary MNT to clear RabbitMQ queue
Authorization
Change authorized by Cisco TAC guidance via live WebEx session.