MSCHAPv2 Migration: Data Shape Analysis

Complete data shape analysis of 7 data files across two ISE environments (CHLA production and personal lab). This document establishes the data foundation for migration planning, analytics dashboards, and statistical analysis.

1. Context

The MSCHAPv2 to EAP-TLS/EAP-TEAP migration affects 6,084 unique endpoints in CHLA production. Before building analytics tooling, migration reports, or dashboards, we need a thorough understanding of the available data: schemas, data types, value distributions, nullability patterns, and cross-dataset relationships.

Two ISE environments provide data:

  1. CHLA Production — 2484-sourced DataConnect export of all MSCHAPv2-authenticated endpoints over a 30-day window

  2. Personal ISE Lab (ise-01.inside.domusdigitalis.dev) — Endpoint inventory from ISE DataConnect, used for tooling development and profiling experimentation

2. Objectives

  • Count total rows and identify all columns for every data file

  • Determine data types, nullability, and sample values per column

  • Count unique values for categorical fields

  • Find min/max for numeric fields

  • Identify date ranges for timestamp fields

  • Map relationships between datasets

  • Define a unified schema for cross-environment analytics

  • Identify migration readiness signals in the CHLA data

3. Data Inventory

Seven data files across two ISE environments were analyzed. No files were modified during this assessment.

3.1. Environment A: CHLA Production (MSCHAPv2 Migration)

File Rows Cols Delimiter Notes

mschapv2_unique_devices.csv

6,084

12

Pipe (|) with fixed-width padding

Primary migration dataset. 30-day authentication window.

Source location
Doctrina/02_Assets/PRJ-ISE-CHLA/authentication/
  MSCHAPv2-to-Certificate-Migration/data/mschapv2_unique_devices.csv

3.2. Environment B: Personal ISE Lab (netapi-tui)

File Rows Cols Delimiter Notes

endpoints.csv

52

25

Comma

Base endpoint inventory

endpoints-with-network.csv

52

30

Comma

25 base + 5 network cols (all None)

profile-distribution.csv

12

3

Comma

Pre-aggregated profile counts (derivative)

stale_endpoints.csv

29

25

Comma

Filtered subset (>30 days inactive)

unknown_endpoints.csv

25

25

Comma

Filtered subset (profile == Unknown)

endpoints.json

58

25

JSON

Newer snapshot (6 additional endpoints vs CSV)

Source location
netapi-tui/02_AUTOMATA/cisco/ise/python/dataconnect/
  reports/endpoints.csv
  reports/endpoints-with-network.csv
  reports/profile-distribution.csv
  reports/analysis_output/stale_endpoints.csv
  reports/analysis_output/unknown_endpoints.csv
  endpoints.json

No .xlsx files were found in either environment. The recursive search covered all subdirectories of both project trees.

4. CHLA Production Schema

4.1. File Structure

The file is not standard CSV. It uses pipe (|) delimiters with fixed-width padding:

  • Line 1: Header row (pipe-delimited, padded)

  • Line 2: Separator (---…​---)

  • Lines 3-6,086: Data rows (6,084 records)

  • Line 6,087: Blank

  • Line 6,088: Summary line ("6084 rows")

Any parser must skip the separator line and the trailing summary.

4.2. Column Inventory

# Column Data Type Nullability

1

MAC_ADDRESS

String (XX:XX:XX:XX:XX:XX)

Never null

2

LAST_USERNAME

String

Never null (varies: email, host/FQDN, service accounts)

3

AUTH_METHOD

String (constant: dot1x)

Never null

4

AUTH_PROTOCOL

String (constant: PEAP (EAP-MSCHAPv2))

Never null

5

DEVICE_TYPE

String (categorical)

Contains None values

6

IDENTITY_GROUP

String (categorical)

Contains None values

7

AUTH_COUNT

Integer

Never null

8

FIRST_SEEN

Timestamp (YYYY-MM-DD HH:MM:SS.ffffff)

Never null

9

LAST_SEEN

Timestamp (YYYY-MM-DD HH:MM:SS.ffffff)

Never null

10

SWITCHES

String (semicolon-separated IPs)

Never null

11

SUCCESS_COUNT

Integer

Never null

12

FAIL_COUNT

Integer

Never null

4.3. Constant Columns (Zero Analytic Value)

AUTH_METHOD and AUTH_PROTOCOL are identical across all 6,084 rows:

  • AUTH_METHOD = dot1x (100%)

  • AUTH_PROTOCOL = PEAP (EAP-MSCHAPv2) (100%)

These exist because the dataset was pre-filtered from ISE DataConnect. They confirm the extraction criteria but provide no differentiation for analysis.

4.4. DEVICE_TYPE Distribution (29 Distinct Values)

Value Count Category

medigate-smartphone-apple-iphone-policy

1,304

IoMT — iPhone

medigate-laptop-google-chromebook-policy

957

IoMT — Chromebook

medigate-thinclient-dell-na-policy

857

IoMT — Dell thin client

ChromeBook-Workstation

797

ISE native — Chromebook

medigate_Apple_Clinical_Mobile_Device

460

IoMT — Apple clinical

Linux-Workstation

400

ISE native — Linux

OS_X-Workstation

331

ISE native — macOS

Unknown

275

Unclassified by ISE profiler

Win10_Domain_Joined

270

ISE native — Windows 10 domain

(empty)

142

No device type assigned

Microsoft-Workstation

98

ISE native — Windows generic

Windows8-Workstation

45

ISE native — Windows 8

Android

38

ISE native — Android generic

Win11_Domain_Joined

30

ISE native — Windows 11 domain

Win7_Domain_Joined

14

ISE native — Windows 7 domain (legacy)

Windows7-Workstation

9

ISE native — Windows 7

Android-Samsung-Galaxy-Phone

6

ISE native — Samsung Android

OS_X_El_Capitan-Workstation

6

ISE native — macOS El Capitan (legacy)

WYSE-Device

5

ISE native — WYSE thin client

Intel-Device

5

ISE native — Intel

medigate_Xltek_Neuroworks_Workstation_Natus_EEG

5

IoMT — Natus EEG (biomedical)

OS_X_Mojave-Workstation

5

ISE native — macOS Mojave

WindowsXP-Workstation

5

ISE native — Windows XP (critical legacy)

medigate_AutoCarousel_Swisslog_Medication_Dispensing_System

3

IoMT — Swisslog med dispenser

Nortel-Device

3

ISE native — Nortel device

Apple-iPhone

3

ISE native — iPhone

Apple-Device

3

ISE native — Apple generic

medigate-accesspoint-cisco-na-policy

2

IoMT — Cisco access point

Android-Samsung-Galaxy-Tablet

2

ISE native — Samsung tablet

6 additional types (1 each)

6

Single-instance device types

Three categories of device types exist:

  1. ISE native profiling (17 types, ~2,397 endpoints) — Standard Cisco ISE profiler classifications

  2. Medigate IoMT policies (8 types, ~3,589 endpoints) — Internet of Medical Things profiles from Medigate clinical platform integration

  3. Unclassified (2 types: Unknown + empty, ~417 endpoints) — Devices ISE could not profile

The Medigate-profiled devices represent 59% of all endpoints — more than half the migration population is classified by the clinical IoMT platform, not by ISE’s native profiler. This has significant implications for migration wave planning.

4.5. IDENTITY_GROUP Distribution

Value Observation

None

Unclassified (correlates with DEVICE_TYPE=None)

Workstation

General workstation classification

Win10_Domain_Joined

Windows domain machines

CHLA_THIN_CLIENT

WYSE thin client group

Android

Android devices

medigate-laptop-google-chromebook-policy

Medigate Chromebook group

medigate-phone-apple-iphone-policy

Medigate iPhone group

medigate-thin-client-dell-thinclient-policy

Medigate thin client group

medigate-laptop-apple-clinical-policy

Medigate Apple clinical group

4.6. Numeric Field Ranges

Field Min Max Notes

AUTH_COUNT

1

7,394

Data sorted descending by this column

SUCCESS_COUNT

0

7,394

Devices with 0 successes are pure failures

FAIL_COUNT

0

~high

Devices with DEVICE_TYPE=None tend to have high fail counts

4.7. Timestamp Ranges

Field Earliest Latest

FIRST_SEEN

~2025-10-17

~2025-11-16

LAST_SEEN

~2025-10-16

~2025-11-16

The dataset captures approximately 30 days of authentication activity.

4.8. SWITCHES Column (Multi-Valued)

The SWITCHES column contains network access device IPs. Some devices authenticate through multiple switches, represented as semicolon-separated values:

10.192.145.20                          # Single switch
10.192.145.20; 10.217.120.5           # Multi-switch

This indicates device mobility between network segments during the observation window.

4.9. LAST_USERNAME Patterns

Three distinct username patterns exist:

Pattern Example Interpretation

Email (user@domain)

momendez@chla.usc.edu

Human user authentication

Machine account (host/FQDN)

host/D-MJ08AY3A.LA.AD.CHLA.ORG

Machine authentication (domain-joined)

Service account

svc_chromeosclient, svc_dellzeroclient

Shared service credentials

4.10. Sample Data (First 5 Rows)

MAC USERNAME DEVICE_TYPE IDENTITY_GROUP AUTH SUCCESS FAIL

F4:B3:01:A3:AF:39

svc_dellzeroclient

WYSE-Device

CHLA_THIN_CLIENT

7,394

7,394

0

14:E0:1D:07:52:58

momendez@chla.usc.edu

Linux-Workstation

Workstation

4,612

4,588

24

DC:46:28:50:CA:17

svc_chromeosclient

ChromeBook-Workstation

Workstation

3,865

3,865

0

DC:46:28:50:CA:62

svc_chromeosclient

Linux-Workstation

Workstation

3,859

3,859

0

28:A0:6B:2C:21:44

svc_chromeosclient

medigate-…​chromebook

medigate-…​chromebook

3,732

3,732

0

5. ISE Lab Schema (Personal Lab)

5.1. endpoints.csv (Primary)

52 endpoints, 25 columns. Standard comma-delimited CSV.

# Column Data Type Nullability

1

mac_address

String (XX:XX:XX:XX:XX:XX)

Never null

2

hostname

String

Mostly None

3

ip_address

String

Mostly None

4

endpoint_profile

String (categorical)

Never null

5

certainty_factor

Integer (0-70)

Never null

6

identity_group_id

UUID

Never null

7

identity_group

String (categorical)

Never null

8

manufacturer

String

Always None

9

model

String

Always None

10

os_version

String

Always None

11

firmware

String

Always None

12

hardware_model

String

Always None

13

vendor_type

String

Always None

14

registration_status

String

Always Not Registered

15

byod_status

String

Always Unknown

16

static_profile

Boolean

Always false

17

static_group

Boolean

Always false

18

posture_applicable

String

Always Yes

19

mdm_server

String

Always None

20

mdm_guid

String

Always None

21

profiler_server

String

Always ise-01.inside.domusdigitalis.dev

22

first_seen

Timestamp (YYYY-MM-DD HH:MM:SS)

Never null

23

last_seen

Timestamp (YYYY-MM-DD HH:MM:SS)

Never null

24

endpoint_guid

UUID

Never null

25

endpoint_id

String (epid:NNN)

Never null

5.2. Columns That Are Always None

Eight columns are structurally present but entirely unpopulated:

  • manufacturer, model, os_version, firmware, hardware_model, vendor_type, mdm_server, mdm_guid

These represent ISE profiling capabilities that are not exercised in the lab environment (no DHCP/HTTP/NMAP probes configured, no MDM integration).

5.3. endpoint_profile Distribution

Profile Count Certainty

Unknown

25

0

Linux-Workstation

7

20

Android-Samsung

4

70

Apple-Device

3

10

Intel-Device

3

10

Synology-Device

2

10-70

Apple-MacOS

2

20

Android

2

10

CyberPower-System-Device

1

70

Raspberry-Pi-Endpoint

1

70

Samsung-Device

1

10

Epson-Device

1

10

5.4. identity_group Distribution

Group Count

Unknown

25

Workstation

8

Android

7

Profiled

6

Apple

5

STORAGE

1

5.5. Timestamp Ranges

Field Earliest Latest

first_seen

2025-10-06

2025-12-03

last_seen

2025-10-06

2025-12-03

Approximately 59 days of profiling activity.

5.6. endpoints-with-network.csv

Same 52 endpoints with 5 additional network columns appended:

# Column Status

26

switch_ip

Always None

27

switch_name

Always None

28

switch_port

Always None

29

network_location

Always None

30

last_auth_time

Always None

All 5 network columns are entirely unpopulated. The RADIUS accounting data that populates these fields is not flowing in the lab environment.

5.7. endpoints.json (Newer Snapshot)

58 endpoints (6 more than CSV), captured through 2025-12-23. Same 25-field schema.

Key differences from CSV:

  • Uses JSON null instead of string "None"

  • Contains 6 additional endpoints not in CSV, including profiles: Synology-Device, Cyber-Power-System-Device

  • New identity group: STORAGE

  • first_seen/last_seen range extends to 2025-12-23

5.8. Derivative Files

5.8.1. profile-distribution.csv

Pre-aggregated 12-row summary:

Column Type Description

endpoint_profile

String

Profile name

count

Integer

Number of endpoints

percentage

Float

Percentage of total

This is computed from endpoints.csv — not a raw data source.

5.8.2. stale_endpoints.csv

29 endpoints filtered by inactivity threshold. Same 25-column schema as endpoints.csv but exported by Python with empty strings ("") instead of None.

5.8.3. unknown_endpoints.csv

25 endpoints where endpoint_profile == Unknown and certainty_factor == 0. Same 25-column schema.

6. Cross-Dataset Architecture

6.1. Schema Comparison

The two ISE environments serve different operational purposes, reflected in their schemas:

Dimension CHLA Production ISE Lab

Primary purpose

Authentication tracking

Endpoint profiling

Record count

6,084

52-58

Time window

~30 days

~59 days

Key identifier

MAC_ADDRESS

mac_address

Authentication data

AUTH_COUNT, SUCCESS_COUNT, FAIL_COUNT, AUTH_METHOD, AUTH_PROTOCOL

(not captured)

Profiling data

DEVICE_TYPE, IDENTITY_GROUP

endpoint_profile, certainty_factor, identity_group, manufacturer, model, os_version, firmware, hardware_model

Network context

SWITCHES (populated, multi-valued)

switch_ip, switch_name, switch_port (all None)

User identity

LAST_USERNAME (email, host/, svc_)

(not captured)

Endpoint metadata

(not captured)

registration_status, byod_status, static_profile, static_group, posture_applicable, endpoint_guid, endpoint_id

MDM integration

(not captured)

mdm_server, mdm_guid (both None)

6.2. Theoretical Join Key

Both datasets use MAC address as the primary identifier:

  • CHLA: MAC_ADDRESS (uppercase, colon-separated: F4:B3:01:A3:AF:39)

  • Lab: mac_address (uppercase, colon-separated: 90:32:4B:B8:FC:FE)

Format is identical (no normalization needed for join). However, zero overlap is expected — these are physically separate ISE deployments monitoring different networks.

6.3. Unified Schema for Analytics

A normalized schema that accommodates both datasets:

# Unified Column CHLA Source Lab Source

1

mac_address

MAC_ADDRESS

mac_address

2

environment

"chla_production" (injected)

"ise_lab" (injected)

3

device_type

DEVICE_TYPE

endpoint_profile

4

identity_group

IDENTITY_GROUP

identity_group

5

username

LAST_USERNAME

(null)

6

auth_count

AUTH_COUNT

(null)

7

success_count

SUCCESS_COUNT

(null)

8

fail_count

FAIL_COUNT

(null)

9

auth_method

AUTH_METHOD

(null)

10

auth_protocol

AUTH_PROTOCOL

(null)

11

certainty_factor

(null)

certainty_factor

12

first_seen

FIRST_SEEN

first_seen

13

last_seen

LAST_SEEN

last_seen

14

switches

SWITCHES

(null)

15

hostname

(null)

hostname

16

ip_address

(null)

ip_address

17

profiler_server

(null)

profiler_server

18

registration_status

(null)

registration_status

19

byod_status

(null)

byod_status

20

endpoint_guid

(null)

endpoint_guid

6.4. Key Analytical Insights

Table 1. CHLA Production — Migration Readiness Signals
Signal Implication

DEVICE_TYPE = None with high FAIL_COUNT

These are failed authentication attempts from unclassified devices. Not migration candidates — they need ISE profiling remediation first.

Devices with SUCCESS_COUNT > 0 and FAIL_COUNT = 0

Healthy authenticators. These are the easiest migration candidates (known-good, stable authentication).

Service accounts (svc_chromeosclient, svc_dellzeroclient)

Shared credentials across many devices. Migration requires per-device certificate issuance to replace shared passwords.

Multi-switch devices (semicolons in SWITCHES)

Mobile endpoints. Need to verify certificate works across switch boundaries.

host/ machine accounts

Domain-joined machines. EAP-TEAP migration path (machine + user auth in single tunnel).

Table 2. ISE Lab — Profiling Gaps
Signal Implication

48% Unknown profile (certainty_factor = 0)

ISE profiling probes are not configured (no DHCP, HTTP, NMAP). Not a data quality issue — reflects lab configuration.

8 columns always None

Same as above. These fields require profiling probe data that the lab does not collect.

5 network columns always None

RADIUS accounting not flowing. The lab authenticates but does not track network session details.

7. Next Steps

Phase Action Status

Pipeline

Build Python normalization script (scripts/ise-data-normalize.py) to produce unified CSV output from both environments

Statistical Analysis

Drill into None/None devices, multi-switch endpoints, fail-heavy records, and time-based authentication patterns

Planned

Migration Reports

Generate per-wave readiness scores, stale device identification, and actionable AsciiDoc reports per migration wave

Planned

Dashboards

Visualize device type distributions, auth success/fail ratios, migration progress, and trends

Planned

8. Metadata

Field Value

Document ID

DOC-2026-04-08-data-shape-analysis

Author

Evan Rosado

Status

Final

Analysis Date

2026-04-08

Last Updated

2026-04-08

Environments

CHLA Production, ISE Lab (ise-01.inside.domusdigitalis.dev)