XML Extraction
Many enterprise APIs still return XML — SOAP services, Cisco ISE MnT, older VMware APIs, SAML metadata endpoints. Three tools handle the common cases: xmlstarlet for XPath queries, yq for XML-to-JSON conversion, and grep for quick-and-dirty extraction when you need a single value fast.
xmlstarlet: XPath queries
xmlstarlet is the XML equivalent of jq.
The sel (select) subcommand with -t -v (template, value) extracts text by XPath.
# Extract all values at an XPath
curl -s "$API_URL/resource" \
| xmlstarlet sel -t -v "//name" -n
# Extract an attribute
curl -s "$API_URL/devices" \
| xmlstarlet sel -t -v "//device/@id" -n
# Multiple values per record (tab-separated)
curl -s "$API_URL/endpoints" \
| xmlstarlet sel -t -m "//endpoint" \
-v "mac" -o $'\t' -v "ip" -o $'\t' -v "profile" -n
The flags:
| Flag | Meaning |
|---|---|
| -t | Start a template |
| -v "xpath" | Output the value at that XPath |
| -m "xpath" | Match (loop over) elements |
| -o "text" | Output literal text (delimiters) |
| -n | Output a newline |
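To try these flags together without a live API, here is a minimal sketch that feeds a made-up document through the same template. The element names, values, and the endpoints_to_tsv helper are illustrative assumptions, not part of any real API.

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for an API response.
sample_xml='<endpoints>
  <endpoint><mac>AA:BB:CC:00:00:01</mac><ip>10.0.0.11</ip><profile>printer</profile></endpoint>
  <endpoint><mac>AA:BB:CC:00:00:02</mac><ip>10.0.0.12</ip><profile>phone</profile></endpoint>
</endpoints>'

# -m loops over each <endpoint>; the -v paths inside the loop are
# evaluated relative to the element currently matched.
endpoints_to_tsv() {
  xmlstarlet sel -t -m "//endpoint" \
    -v "mac" -o $'\t' -v "ip" -o $'\t' -v "profile" -n
}

if command -v xmlstarlet >/dev/null 2>&1; then
  printf '%s\n' "$sample_xml" | endpoints_to_tsv
else
  echo "xmlstarlet not installed; skipping" >&2
fi
```

Each endpoint should come out as one tab-separated row, which is the shape that column -t or cut expects downstream.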
Namespace handling
Many enterprise XML responses declare namespaces. Without registering them, XPath queries silently return nothing.
# Register a namespace, then query
curl -s "$API_URL/soap-endpoint" \
| xmlstarlet sel \
-N ns="http://schemas.example.com/api/v1" \
-t -v "//ns:result/ns:status" -n
The -N prefix=uri flag binds a prefix you choose to the namespace URI from the response.
Inspect the root element’s xmlns attribute to find the URI.
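As a sketch of that workflow, the snippet below asks XPath itself for the root element's namespace URI, shows the unprefixed query coming back empty, then binds the URI and queries again. The sample document and its URI are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical response with a default namespace (the URI is invented).
sample_xml='<result xmlns="http://schemas.example.com/api/v1"><status>ok</status></result>'

if command -v xmlstarlet >/dev/null 2>&1; then
  # 1. namespace-uri() is a standard XPath 1.0 function; /* is the root element.
  ns_uri=$(printf '%s' "$sample_xml" | xmlstarlet sel -t -v "namespace-uri(/*)")
  echo "namespace: $ns_uri"

  # 2. Unprefixed query: <status> lives in the namespace, so nothing matches.
  plain=$(printf '%s' "$sample_xml" | xmlstarlet sel -t -v "//status")
  echo "unprefixed query returned: '$plain'"

  # 3. Bind the discovered URI to a prefix of our choosing and query again.
  prefixed=$(printf '%s' "$sample_xml" \
    | xmlstarlet sel -N ns="$ns_uri" -t -v "//ns:result/ns:status")
  echo "prefixed query returned: '$prefixed'"
fi
```

The prefix name (ns here) is arbitrary; only the URI has to match the one declared in the response.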
ISE MnT example
The ISE Monitoring (MnT) API returns XML. Typical queries against the active session endpoints:
# Get active session count from ISE MnT
curl -sk -u "$ISE_USER:$ISE_PASS" \
"https://$ISE_HOST/admin/API/mnt/Session/ActiveCount" \
| xmlstarlet sel -t -v "//count" -n
# Extract session details
curl -sk -u "$ISE_USER:$ISE_PASS" \
"https://$ISE_HOST/admin/API/mnt/Session/ActiveList" \
| xmlstarlet sel -t -m "//activeSession" \
-v "user_name" -o $'\t' \
-v "nas_ip_address" -o $'\t' \
-v "calling_station_id" -n \
| column -t -s $'\t'
yq: XML-to-JSON conversion
yq (the Go version by Mike Farah) converts XML to JSON, letting you use jq for the actual extraction. This is often simpler than learning XPath for one-off tasks.
# Convert XML response to JSON, then use jq
curl -s "$API_URL/resource.xml" \
| yq -p xml -o json \
| jq '.root.items[].name'
# Inline pipeline: XML API -> JSON -> extract
curl -sk -u "$USER:$PASS" "https://$HOST/api/data" \
| yq -p xml -o json \
| jq -r '.response.records[] | [.id, .name, .status] | @tsv' \
| column -t -s $'\t'
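Note: There are two tools named yq. The examples here need the Go version by Mike Farah, which provides -p xml. The Python yq (a jq wrapper by Andrey Kislyuk) takes different options and ships a separate xq command for XML input.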
Conversion caveats
XML-to-JSON conversion is lossy. Watch for these:
- XML attributes become +@attribute_name keys in JSON
- Repeated elements become an array only when there are two or more; a single occurrence becomes a plain object or scalar
- Mixed content (text + child elements) maps unpredictably
- Namespace prefixes appear in key names
For critical automation, use xmlstarlet with explicit XPath. For exploration and ad-hoc queries, yq-to-jq is faster to iterate.
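The repeated-element caveat is the one most likely to break a jq filter, because .items[] fails when the converter emitted a single object instead of an array. A minimal sketch of a defensive filter follows; the root and item key names and the items_as_array helper are illustrative, and the two JSON inputs are the shapes assumed for one and for two <item> elements.

```shell
#!/usr/bin/env bash
# Normalize a key that may hold nothing, one value, or an array of values.
items_as_array() {
  jq -c '.root.item
         | if . == null then []
           elif type == "array" then .
           else [.] end'
}

if command -v jq >/dev/null 2>&1; then
  # Shape assumed for a single <item> element
  echo '{"root":{"item":"a"}}' | items_as_array          # prints ["a"]
  # Shape assumed for two <item> elements
  echo '{"root":{"item":["a","b"]}}' | items_as_array    # prints ["a","b"]
fi
```

Placing a normalization like this right after the conversion step keeps the rest of the pipeline independent of how many records the API happened to return.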
grep -oP: quick extraction
When you need a single value from a known XML structure and do not want to install xmlstarlet, PCRE grep works. The -P flag is a GNU grep feature; the BSD grep shipped with macOS does not support it.
# Extract text between tags
curl -s "$API_URL/status" \
| grep -oP '(?<=<version>)[^<]+'
# Extract an attribute value
curl -s "$API_URL/info" \
| grep -oP 'id="\K[^"]+'
(?<=<version>) is a lookbehind: it matches the position after <version> without including it.
\K resets the match start, achieving the same effect inside -oP.
This is fragile. It breaks on multiline tags, attributes in unexpected order, or CDATA sections. Use it for quick checks, not production scripts.
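The multiline failure is easy to reproduce. The sketch below runs the same pattern against two made-up inputs that are equivalent XML but formatted differently; it assumes GNU grep with PCRE support, and extract_version is an illustrative helper name.

```shell
#!/usr/bin/env bash
# Same data, two layouts (both samples are invented).
one_line='<status><version>3.2.1</version></status>'
multi_line='<status><version>
3.2.1
</version></status>'

extract_version() {
  grep -oP '(?<=<version>)[^<]+'
}

# Tag and value share a line, so the lookbehind finds the value.
printf '%s\n' "$one_line" | extract_version        # prints 3.2.1

# grep works line by line: nothing follows <version> on its own line,
# and the value line has no opening tag before it, so nothing matches.
printf '%s\n' "$multi_line" | extract_version \
  || echo "no match: value is on a separate line from its tag"
```

An XML parser treats both inputs as the same element (apart from whitespace), which is why a pretty-printing change on the server side can silently break a grep-based script.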
SOAP envelope handling
SOAP APIs wrap responses in an envelope.
The actual data lives inside <soap:Body>.
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Header/>
<soap:Body>
<GetDeviceResponse xmlns="http://example.com/api">
<device>
<name>switch-core-01</name>
<ip>10.50.1.10</ip>
</device>
</GetDeviceResponse>
</soap:Body>
</soap:Envelope>
# Strip the SOAP envelope and extract data
curl -s -X POST "$SOAP_ENDPOINT" \
-H "Content-Type: text/xml" \
-d @request.xml \
| xmlstarlet sel \
-N soap="http://schemas.xmlsoap.org/soap/envelope/" \
-N api="http://example.com/api" \
-t -v "//api:device/api:name" -n
# Or convert the whole body to JSON and use jq
curl -s -X POST "$SOAP_ENDPOINT" \
-H "Content-Type: text/xml" \
-d @request.xml \
| xmlstarlet sel \
-N soap="http://schemas.xmlsoap.org/soap/envelope/" \
-t -c "//soap:Body/*" \
| yq -p xml -o json \
| jq '.'
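To check the namespace bindings offline before pointing them at a real service, the sample envelope above can be fed in directly. This is a sketch under that assumption; device_rows is an illustrative helper, and the path is anchored at the envelope so that only devices inside the SOAP body are considered.

```shell
#!/usr/bin/env bash
# The sample envelope from this section, held in a variable.
envelope='<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header/>
  <soap:Body>
    <GetDeviceResponse xmlns="http://example.com/api">
      <device><name>switch-core-01</name><ip>10.50.1.10</ip></device>
    </GetDeviceResponse>
  </soap:Body>
</soap:Envelope>'

# Both namespaces are bound: soap for the envelope, api for the payload.
# Child elements inherit the payload's default namespace, so name and ip
# need the api prefix as well.
device_rows() {
  xmlstarlet sel \
    -N soap="http://schemas.xmlsoap.org/soap/envelope/" \
    -N api="http://example.com/api" \
    -t -m "/soap:Envelope/soap:Body//api:device" \
    -v "api:name" -o $'\t' -v "api:ip" -n
}

if command -v xmlstarlet >/dev/null 2>&1; then
  printf '%s\n' "$envelope" | device_rows
fi
```

Once the offline run prints one tab-separated row per device, the same function can take the curl output on stdin unchanged.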
Tool decision matrix
| Tool | Best for | Install |
|---|---|---|
| xmlstarlet | Production scripts, namespace-heavy XML, precise XPath | |
| yq -p xml | Quick exploration, converting to JSON for jq pipelines | |
| grep -oP | One-off value extraction from simple XML | Built-in |
When the response is XML and your downstream tool expects JSON, convert early. When precision matters and the XML is complex, stay in XPath.