802.1X Auth Failures - Resolution
Appendix
- show logging application rabbitmq.log tail count 50
ECCRB Submission (Emergency Change Control Review Board)
Summary
ISE Primary MNT RabbitMQ Message Queue Optimization - TAC-Guided Intervention
Description
During routine monitoring of the ISE distributed deployment, the RabbitMQ messaging service on the Primary Monitoring Node (pmnt.ise.chla.org) was observed running at 109% CPU utilization. RabbitMQ handles inter-node communication for session logging and replication.
Cisco TAC was engaged proactively (S1 - Healthcare Environment). TAC analysis confirmed that message queue saturation was degrading logging pipeline performance and would progressively impact operational visibility if not addressed.
TAC recommended a controlled reload of the Primary MNT to clear the accumulated queue backlog and restore normal message processing throughput.
Business Justification
- ISE MNT provides visibility into 802.1X authentication events for 26,000+ endpoints
- Degraded logging impairs security incident response capability
- Proactive intervention prevents escalation to an authentication service impact
Service Impact
None
- Secondary MNT (smnt.ise.chla.org) provides continuous monitoring during the restart
- RADIUS authentication is handled by four independent PSNs and is unaffected by MNT operations
- No endpoint connectivity impact
Detailed Implementation Plan
1. SSH to Primary MNT
- ssh pmnt.ise.chla.org
2. Initiate reload
- reload
- Prompt: "Save ade-os [y/n]" -> y
3. Wait for reboot (~5 minutes)
4. Reconnect and verify
- ssh pmnt.ise.chla.org
- Wait ~10 minutes for services to initialize
- show application status ise
- Confirm critical services running:
* Database Listener: running
* Database Server: running
* Application Server: running
* M&T Session Database: running
* M&T Log Processor: running
* ISE Messaging Service: running
5. Validate logging restored
- Log into Primary PAN (ppan.ise.chla.org)
- Navigate to Operations > RADIUS Live Logs
- Click endpoint details to confirm "No data available" error is resolved
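The service check in step 4 can also be run against a captured copy of the CLI output. The following is a minimal Python sketch, assuming `show application status ise` prints one process per line with the process name, STATE, and PROCESS ID columns separated by two or more spaces. The function names and the sample output (including the PIDs) are hypothetical and for illustration only.

```python
import re

# Critical services listed in step 4 of the implementation plan.
CRITICAL_SERVICES = [
    "Database Listener",
    "Database Server",
    "Application Server",
    "M&T Session Database",
    "M&T Log Processor",
    "ISE Messaging Service",
]


def parse_status(output: str) -> dict[str, str]:
    """Map each process name to its STATE column.

    Assumption: columns are separated by two or more spaces, so
    multi-word states such as "not running" stay intact.
    """
    states: dict[str, str] = {}
    for line in output.splitlines():
        columns = re.split(r"\s{2,}", line.strip())
        if len(columns) >= 2:  # skips blank lines and the dashed separator
            states[columns[0]] = columns[1]
    return states


def failing_services(output: str) -> list[str]:
    """Return critical services that are missing or not in the "running" state."""
    states = parse_status(output)
    return [name for name in CRITICAL_SERVICES if states.get(name) != "running"]


# Made-up sample output for illustration (hypothetical values).
sample = """\
ISE PROCESS NAME                       STATE            PROCESS ID
--------------------------------------------------------------------
Database Listener                      running          3688
Database Server                        running          41 PROCESSES
Application Server                     initializing
M&T Session Database                   running          6234
M&T Log Processor                      running          7012
ISE Messaging Service                  not running
"""

print(failing_services(sample))  # ['Application Server', 'ISE Messaging Service']
```

An empty list means every critical service reported `running`. A service that is absent from the output is also reported, so a truncated capture is not mistaken for success.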
Detailed Backout Plan
This change cannot be backed out: once issued, a reload cannot be reversed.
Mitigation: Secondary MNT (smnt.ise.chla.org) automatically assumes the primary monitoring role if pmnt fails to recover. There is no authentication impact regardless of outcome.
Testing Plan
Pre-Change Validation
- Confirm Secondary MNT (smnt.ise.chla.org) is healthy and synchronized
- Verify PSNs (psn-1 through psn-4) are processing RADIUS authentications
- Document current RabbitMQ CPU utilization on pmnt
Post-Change Validation
- SSH to pmnt.ise.chla.org and confirm the node is accessible
- Run show application status ise and confirm all critical services are running
- Run show logging application rabbitmq.log tail count 50 and confirm there are no errors
- Re-check RabbitMQ CPU utilization on pmnt using the same method as the pre-change baseline and confirm it is below 50%
- Log into Primary PAN (ppan.ise.chla.org)
- Navigate to Operations > RADIUS Live Logs
- Click endpoint details on 3+ recent authentications and confirm data is available (no "No data available" error)
- Verify replication status: Administration > System > Deployment
- Confirm no alarms on the dashboard
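The "no errors" check on the rabbitmq.log tail can likewise be applied to captured text. This is a minimal sketch, assuming log lines mark problems with the words `error`, `critical`, or `crash`; the actual ISE rabbitmq.log format has not been verified here, and the sample lines are made up.

```python
import re

# Assumption: problem lines contain "error", "critical", or "crash" as whole
# words (for example a "[error]" level tag). Not verified against ISE's log format.
PROBLEM_PATTERN = re.compile(r"\b(error|critical|crash)\b", re.IGNORECASE)


def problem_lines(log_tail: str) -> list[str]:
    """Return non-empty lines from the captured log tail that match PROBLEM_PATTERN."""
    return [
        line
        for line in log_tail.splitlines()
        if line.strip() and PROBLEM_PATTERN.search(line)
    ]


# Made-up sample lines for illustration only.
tail = "\n".join([
    "[info] accepting AMQP connection",
    "[error] closing AMQP connection (example message)",
    "[info] connection established",
])

print(problem_lines(tail))  # ['[error] closing AMQP connection (example message)']
```

Whole-word matching avoids flagging identifiers such as `error_logger`, at the cost of missing variants like `errors`; adjust the pattern once the real log format is confirmed.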
Success Criteria
- All critical Primary MNT services running
- RabbitMQ CPU < 50%
- Live Logs session details displaying correctly
- No replication alarms
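The success criteria can be expressed as a single pass/fail gate over the values gathered during post-change validation. A minimal sketch, assuming the observations are recorded by hand; the `Observations` class and its field names are hypothetical, and the CPU value is assumed to be the same per-process figure used for the 109% baseline.

```python
from dataclasses import dataclass

CPU_LIMIT_PERCENT = 50.0  # "RabbitMQ CPU < 50%" success criterion


@dataclass
class Observations:
    """Values recorded during post-change validation (hypothetical field names)."""
    critical_services_running: bool  # all step-4 services report "running"
    rabbitmq_cpu_percent: float      # per-process RabbitMQ CPU on pmnt
    live_log_details_ok: bool        # session details load for 3+ recent records
    replication_alarm_count: int     # replication alarms shown on the PAN dashboard


def unmet_criteria(obs: Observations) -> list[str]:
    """Return the success criteria that were not met; an empty list means success."""
    unmet = []
    if not obs.critical_services_running:
        unmet.append("Primary MNT services running")
    if obs.rabbitmq_cpu_percent >= CPU_LIMIT_PERCENT:
        unmet.append("RabbitMQ CPU < 50%")
    if not obs.live_log_details_ok:
        unmet.append("Live Logs session details displaying correctly")
    if obs.replication_alarm_count > 0:
        unmet.append("No replication alarms")
    return unmet


# Example: services up and Live Logs fixed, but CPU still at the pre-change level.
print(unmet_criteria(Observations(True, 109.0, True, 0)))  # ['RabbitMQ CPU < 50%']
```

Returning the list of unmet criteria rather than a single boolean makes it clear which check blocks closure of the change.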
Benefits of Change
- Restore RabbitMQ message processing to normal throughput
- Ensure continuous security event visibility for compliance and incident response
- Prevent progressive degradation that could impair authentication diagnostics
- Align with Cisco TAC guidance for ISE MNT health maintenance
Risk Analysis / Mitigation Plan
| Risk | Probability | Mitigation |
|---|---|---|
| Temporary loss of primary logging during restart | Expected (by design) | Secondary MNT assumes logging automatically; no data loss |
| Primary MNT fails to recover | Low | Secondary MNT continues operation; TAC on standby for escalation |
| Authentication impact | None | MNT is monitoring-only; PSNs operate independently |
Evidence / Justification
Observed Condition - Primary MNT CLI (pmnt.ise.chla.org)
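The RabbitMQ messaging service was running at 109% CPU utilization (top reports per-process CPU relative to a single core, so a value above 100% means the process is consuming more than one full core). The node was unable to keep up with authentication/session logs, causing a backlog.
- Command: top / show application status ise
- Finding: RabbitMQ process consuming >100% CPU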
Observed Condition - Primary PAN Dashboard (ppan.ise.chla.org)
Alarms displayed:
- "Replication Failed from PMnT"
- MNT sync status showing degraded state
Location: Administration > System > Deployment
Finding: Primary MNT replication failing to secondary
Observed Condition - RADIUS Live Logs
Live Logs show authentication entries, but clicking session details returns:
No data available for this record. Either the data is purged or authentication for this session record happened a week ago.
Root Cause: RabbitMQ overload is preventing session data from being written to the MNT database.
TAC Engagement
- SR Number: add your case number
- Severity: S1 (Medical Facility - Patient Care Impact)
- TAC Engineer: add name if available
- Recommendation: Immediate reboot of Primary MNT to clear RabbitMQ queue
Authorization
Change authorized based on Cisco TAC guidance provided during a live WebEx session.
References
- ISE 802.1X Troubleshooting Guide: www.cisco.com/c/en/us/support/docs/security/identity-services-engine/215239-ise-wireless-802-1x-eap-tls-deployment-a.html
- EAP-TLS Failure Reasons: www.cisco.com/c/en/us/support/docs/security/identity-services-engine/116528-configure-product-00.html
Notes
scratch space for case notes