MSCHAPv2 Migration: Data Shape Analysis
Complete data shape analysis of 7 data files across two ISE environments (CHLA production and personal lab). This document establishes the data foundation for migration planning, analytics dashboards, and statistical analysis.
1. Context
The MSCHAPv2 to EAP-TLS/EAP-TEAP migration affects 6,084 unique endpoints in CHLA production. Before building analytics tooling, migration reports, or dashboards, we need a thorough understanding of the available data: schemas, data types, value distributions, nullability patterns, and cross-dataset relationships.
Two ISE environments provide data:
-
CHLA Production — 2484-sourced DataConnect export of all MSCHAPv2-authenticated endpoints over a 30-day window
-
Personal ISE Lab (ise-01.inside.domusdigitalis.dev) — Endpoint inventory from ISE DataConnect, used for tooling development and profiling experimentation
2. Objectives
-
Count total rows and identify all columns for every data file
-
Determine data types, nullability, and sample values per column
-
Count unique values for categorical fields
-
Find min/max for numeric fields
-
Identify date ranges for timestamp fields
-
Map relationships between datasets
-
Define a unified schema for cross-environment analytics
-
Identify migration readiness signals in the CHLA data
3. Data Inventory
Seven data files across two ISE environments were analyzed. No files were modified during this assessment.
3.1. Environment A: CHLA Production (MSCHAPv2 Migration)
| File | Rows | Cols | Delimiter | Notes |
|---|---|---|---|---|
|
6,084 |
12 |
Pipe ( |
Primary migration dataset. 30-day authentication window. |
Doctrina/02_Assets/PRJ-ISE-CHLA/authentication/
MSCHAPv2-to-Certificate-Migration/data/mschapv2_unique_devices.csv
3.2. Environment B: Personal ISE Lab (netapi-tui)
| File | Rows | Cols | Delimiter | Notes |
|---|---|---|---|---|
|
52 |
25 |
Comma |
Base endpoint inventory |
|
52 |
30 |
Comma |
25 base + 5 network cols (all |
|
12 |
3 |
Comma |
Pre-aggregated profile counts (derivative) |
|
29 |
25 |
Comma |
Filtered subset (>30 days inactive) |
|
25 |
25 |
Comma |
Filtered subset (profile == |
|
58 |
25 |
JSON |
Newer snapshot (6 additional endpoints vs CSV) |
netapi-tui/02_AUTOMATA/cisco/ise/python/dataconnect/
reports/endpoints.csv
reports/endpoints-with-network.csv
reports/profile-distribution.csv
reports/analysis_output/stale_endpoints.csv
reports/analysis_output/unknown_endpoints.csv
endpoints.json
|
No |
4. CHLA Production Schema
4.1. File Structure
The file is not standard CSV. It uses pipe (|) delimiters with fixed-width padding:
-
Line 1: Header row (pipe-delimited, padded)
-
Line 2: Separator (
---…---) -
Lines 3-6,086: Data rows (6,084 records)
-
Line 6,087: Blank
-
Line 6,088: Summary line (
"6084 rows")
Any parser must skip the separator line and the trailing summary.
4.2. Column Inventory
| # | Column | Data Type | Nullability |
|---|---|---|---|
1 |
|
String (XX:XX:XX:XX:XX:XX) |
Never null |
2 |
|
String |
Never null (varies: email, host/FQDN, service accounts) |
3 |
|
String (constant: |
Never null |
4 |
|
String (constant: |
Never null |
5 |
|
String (categorical) |
Contains |
6 |
|
String (categorical) |
Contains |
7 |
|
Integer |
Never null |
8 |
|
Timestamp (YYYY-MM-DD HH:MM:SS.ffffff) |
Never null |
9 |
|
Timestamp (YYYY-MM-DD HH:MM:SS.ffffff) |
Never null |
10 |
|
String (semicolon-separated IPs) |
Never null |
11 |
|
Integer |
Never null |
12 |
|
Integer |
Never null |
4.3. Constant Columns (Zero Analytic Value)
AUTH_METHOD and AUTH_PROTOCOL are identical across all 6,084 rows:
-
AUTH_METHOD=dot1x(100%) -
AUTH_PROTOCOL=PEAP (EAP-MSCHAPv2)(100%)
These exist because the dataset was pre-filtered from ISE DataConnect. They confirm the extraction criteria but provide no differentiation for analysis.
4.4. DEVICE_TYPE Distribution (29 Distinct Values)
| Value | Count | Category |
|---|---|---|
|
1,304 |
IoMT — iPhone |
|
957 |
IoMT — Chromebook |
|
857 |
IoMT — Dell thin client |
|
797 |
ISE native — Chromebook |
|
460 |
IoMT — Apple clinical |
|
400 |
ISE native — Linux |
|
331 |
ISE native — macOS |
|
275 |
Unclassified by ISE profiler |
|
270 |
ISE native — Windows 10 domain |
(empty) |
142 |
No device type assigned |
|
98 |
ISE native — Windows generic |
|
45 |
ISE native — Windows 8 |
|
38 |
ISE native — Android generic |
|
30 |
ISE native — Windows 11 domain |
|
14 |
ISE native — Windows 7 domain (legacy) |
|
9 |
ISE native — Windows 7 |
|
6 |
ISE native — Samsung Android |
|
6 |
ISE native — macOS El Capitan (legacy) |
|
5 |
ISE native — WYSE thin client |
|
5 |
ISE native — Intel |
|
5 |
IoMT — Natus EEG (biomedical) |
|
5 |
ISE native — macOS Mojave |
|
5 |
ISE native — Windows XP (critical legacy) |
|
3 |
IoMT — Swisslog med dispenser |
|
3 |
ISE native — Nortel device |
|
3 |
ISE native — iPhone |
|
3 |
ISE native — Apple generic |
|
2 |
IoMT — Cisco access point |
|
2 |
ISE native — Samsung tablet |
6 additional types (1 each) |
6 |
Single-instance device types |
|
Three categories of device types exist:
The Medigate-profiled devices represent 59% of all endpoints — more than half the migration population is classified by the clinical IoMT platform, not by ISE’s native profiler. This has significant implications for migration wave planning. |
4.5. IDENTITY_GROUP Distribution
| Value | Observation |
|---|---|
|
Unclassified (correlates with DEVICE_TYPE= |
|
General workstation classification |
|
Windows domain machines |
|
WYSE thin client group |
|
Android devices |
|
Medigate Chromebook group |
|
Medigate iPhone group |
|
Medigate thin client group |
|
Medigate Apple clinical group |
4.6. Numeric Field Ranges
| Field | Min | Max | Notes |
|---|---|---|---|
|
1 |
7,394 |
Data sorted descending by this column |
|
0 |
7,394 |
Devices with 0 successes are pure failures |
|
0 |
~high |
Devices with DEVICE_TYPE= |
4.7. Timestamp Ranges
| Field | Earliest | Latest |
|---|---|---|
|
~2025-10-17 |
~2025-11-16 |
|
~2025-10-16 |
~2025-11-16 |
The dataset captures approximately 30 days of authentication activity.
4.8. SWITCHES Column (Multi-Valued)
The SWITCHES column contains network access device IPs. Some devices authenticate through multiple switches, represented as semicolon-separated values:
10.192.145.20 # Single switch
10.192.145.20; 10.217.120.5 # Multi-switch
This indicates device mobility between network segments during the observation window.
4.9. LAST_USERNAME Patterns
Three distinct username patterns exist:
| Pattern | Example | Interpretation |
|---|---|---|
Email (user@domain) |
|
Human user authentication |
Machine account (host/FQDN) |
|
Machine authentication (domain-joined) |
Service account |
|
Shared service credentials |
4.10. Sample Data (First 5 Rows)
| MAC | USERNAME | DEVICE_TYPE | IDENTITY_GROUP | AUTH | SUCCESS | FAIL |
|---|---|---|---|---|---|---|
F4:B3:01:A3:AF:39 |
svc_dellzeroclient |
WYSE-Device |
CHLA_THIN_CLIENT |
7,394 |
7,394 |
0 |
14:E0:1D:07:52:58 |
Linux-Workstation |
Workstation |
4,612 |
4,588 |
24 |
|
DC:46:28:50:CA:17 |
svc_chromeosclient |
ChromeBook-Workstation |
Workstation |
3,865 |
3,865 |
0 |
DC:46:28:50:CA:62 |
svc_chromeosclient |
Linux-Workstation |
Workstation |
3,859 |
3,859 |
0 |
28:A0:6B:2C:21:44 |
svc_chromeosclient |
medigate-…chromebook |
medigate-…chromebook |
3,732 |
3,732 |
0 |
5. ISE Lab Schema (Personal Lab)
5.1. endpoints.csv (Primary)
52 endpoints, 25 columns. Standard comma-delimited CSV.
| # | Column | Data Type | Nullability |
|---|---|---|---|
1 |
|
String (XX:XX:XX:XX:XX:XX) |
Never null |
2 |
|
String |
Mostly |
3 |
|
String |
Mostly |
4 |
|
String (categorical) |
Never null |
5 |
|
Integer (0-70) |
Never null |
6 |
|
UUID |
Never null |
7 |
|
String (categorical) |
Never null |
8 |
|
String |
Always |
9 |
|
String |
Always |
10 |
|
String |
Always |
11 |
|
String |
Always |
12 |
|
String |
Always |
13 |
|
String |
Always |
14 |
|
String |
Always |
15 |
|
String |
Always |
16 |
|
Boolean |
Always |
17 |
|
Boolean |
Always |
18 |
|
String |
Always |
19 |
|
String |
Always |
20 |
|
String |
Always |
21 |
|
String |
Always |
22 |
|
Timestamp (YYYY-MM-DD HH:MM:SS) |
Never null |
23 |
|
Timestamp (YYYY-MM-DD HH:MM:SS) |
Never null |
24 |
|
UUID |
Never null |
25 |
|
String (epid:NNN) |
Never null |
5.2. Columns That Are Always None
Eight columns are structurally present but entirely unpopulated:
-
manufacturer,model,os_version,firmware,hardware_model,vendor_type,mdm_server,mdm_guid
These represent ISE profiling capabilities that are not exercised in the lab environment (no DHCP/HTTP/NMAP probes configured, no MDM integration).
5.3. endpoint_profile Distribution
| Profile | Count | Certainty |
|---|---|---|
Unknown |
25 |
0 |
Linux-Workstation |
7 |
20 |
Android-Samsung |
4 |
70 |
Apple-Device |
3 |
10 |
Intel-Device |
3 |
10 |
Synology-Device |
2 |
10-70 |
Apple-MacOS |
2 |
20 |
Android |
2 |
10 |
CyberPower-System-Device |
1 |
70 |
Raspberry-Pi-Endpoint |
1 |
70 |
Samsung-Device |
1 |
10 |
Epson-Device |
1 |
10 |
5.4. identity_group Distribution
| Group | Count |
|---|---|
Unknown |
25 |
Workstation |
8 |
Android |
7 |
Profiled |
6 |
Apple |
5 |
STORAGE |
1 |
5.5. Timestamp Ranges
| Field | Earliest | Latest |
|---|---|---|
|
2025-10-06 |
2025-12-03 |
|
2025-10-06 |
2025-12-03 |
Approximately 59 days of profiling activity.
5.6. endpoints-with-network.csv
Same 52 endpoints with 5 additional network columns appended:
| # | Column | Status |
|---|---|---|
26 |
|
Always |
27 |
|
Always |
28 |
|
Always |
29 |
|
Always |
30 |
|
Always |
All 5 network columns are entirely unpopulated. The RADIUS accounting data that populates these fields is not flowing in the lab environment.
5.7. endpoints.json (Newer Snapshot)
58 endpoints (6 more than CSV), captured through 2025-12-23. Same 25-field schema.
Key differences from CSV:
-
Uses JSON
nullinstead of string"None" -
Contains 6 additional endpoints not in CSV, including profiles:
Synology-Device,Cyber-Power-System-Device -
New identity group:
STORAGE -
first_seen/last_seenrange extends to 2025-12-23
5.8. Derivative Files
5.8.1. profile-distribution.csv
Pre-aggregated 12-row summary:
| Column | Type | Description |
|---|---|---|
|
String |
Profile name |
|
Integer |
Number of endpoints |
|
Float |
Percentage of total |
This is computed from endpoints.csv — not a raw data source.
5.8.2. stale_endpoints.csv
29 endpoints filtered by inactivity threshold. Same 25-column schema as endpoints.csv but exported by Python with empty strings ("") instead of None.
5.8.3. unknown_endpoints.csv
25 endpoints where endpoint_profile == Unknown and certainty_factor == 0. Same 25-column schema.
6. Cross-Dataset Architecture
6.1. Schema Comparison
The two ISE environments serve different operational purposes, reflected in their schemas:
| Dimension | CHLA Production | ISE Lab |
|---|---|---|
Primary purpose |
Authentication tracking |
Endpoint profiling |
Record count |
6,084 |
52-58 |
Time window |
~30 days |
~59 days |
Key identifier |
|
|
Authentication data |
AUTH_COUNT, SUCCESS_COUNT, FAIL_COUNT, AUTH_METHOD, AUTH_PROTOCOL |
(not captured) |
Profiling data |
DEVICE_TYPE, IDENTITY_GROUP |
endpoint_profile, certainty_factor, identity_group, manufacturer, model, os_version, firmware, hardware_model |
Network context |
SWITCHES (populated, multi-valued) |
switch_ip, switch_name, switch_port (all |
User identity |
LAST_USERNAME (email, host/, svc_) |
(not captured) |
Endpoint metadata |
(not captured) |
registration_status, byod_status, static_profile, static_group, posture_applicable, endpoint_guid, endpoint_id |
MDM integration |
(not captured) |
mdm_server, mdm_guid (both |
6.2. Theoretical Join Key
Both datasets use MAC address as the primary identifier:
-
CHLA:
MAC_ADDRESS(uppercase, colon-separated:F4:B3:01:A3:AF:39) -
Lab:
mac_address(uppercase, colon-separated:90:32:4B:B8:FC:FE)
Format is identical (no normalization needed for join). However, zero overlap is expected — these are physically separate ISE deployments monitoring different networks.
6.3. Unified Schema for Analytics
A normalized schema that accommodates both datasets:
| # | Unified Column | CHLA Source | Lab Source |
|---|---|---|---|
1 |
|
|
|
2 |
|
|
|
3 |
|
|
|
4 |
|
|
|
5 |
|
|
(null) |
6 |
|
|
(null) |
7 |
|
|
(null) |
8 |
|
|
(null) |
9 |
|
|
(null) |
10 |
|
|
(null) |
11 |
|
(null) |
|
12 |
|
|
|
13 |
|
|
|
14 |
|
|
(null) |
15 |
|
(null) |
|
16 |
|
(null) |
|
17 |
|
(null) |
|
18 |
|
(null) |
|
19 |
|
(null) |
|
20 |
|
(null) |
|
6.4. Key Analytical Insights
| Signal | Implication |
|---|---|
DEVICE_TYPE = |
These are failed authentication attempts from unclassified devices. Not migration candidates — they need ISE profiling remediation first. |
Devices with SUCCESS_COUNT > 0 and FAIL_COUNT = 0 |
Healthy authenticators. These are the easiest migration candidates (known-good, stable authentication). |
Service accounts ( |
Shared credentials across many devices. Migration requires per-device certificate issuance to replace shared passwords. |
Multi-switch devices (semicolons in SWITCHES) |
Mobile endpoints. Need to verify certificate works across switch boundaries. |
|
Domain-joined machines. EAP-TEAP migration path (machine + user auth in single tunnel). |
| Signal | Implication |
|---|---|
48% Unknown profile (certainty_factor = 0) |
ISE profiling probes are not configured (no DHCP, HTTP, NMAP). Not a data quality issue — reflects lab configuration. |
8 columns always |
Same as above. These fields require profiling probe data that the lab does not collect. |
5 network columns always |
RADIUS accounting not flowing. The lab authenticates but does not track network session details. |
7. Next Steps
| Phase | Action | Status |
|---|---|---|
Pipeline |
Build Python normalization script ( |
|
Statistical Analysis |
Drill into None/None devices, multi-switch endpoints, fail-heavy records, and time-based authentication patterns |
Planned |
Migration Reports |
Generate per-wave readiness scores, stale device identification, and actionable AsciiDoc reports per migration wave |
Planned |
Dashboards |
Visualize device type distributions, auth success/fail ratios, migration progress, and trends |
Planned |
8. Metadata
| Field | Value |
|---|---|
Document ID |
DOC-2026-04-08-data-shape-analysis |
Author |
Evan Rosado |
Status |
Final |
Analysis Date |
2026-04-08 |
Last Updated |
2026-04-08 |
Environments |
CHLA Production, ISE Lab (ise-01.inside.domusdigitalis.dev) |