JSON Extraction with jq

jq is the standard command-line JSON processor. Every API that returns JSON — GitHub, Cloudflare, Cisco ISE ERS, AWS, Kubernetes — can be parsed with the same jq patterns.

Common API response structures

APIs wrap data in different envelopes. Recognizing the structure tells you what jq expression to start with.

Structure Example jq entry point

{"data": […​]}

GitHub GraphQL, many REST APIs

.data[]

{"results": […​]}

Django REST Framework, Meraki

.results[]

{"items": […​]}

Kubernetes, Google APIs

.items[]

{"SearchResult": {"resources": […​]}}

Cisco ISE ERS

.SearchResult.resources[]

Bare array […​]

Simple endpoints, some microservices

.[]

Basic extraction

Extract a single field from each object in a collection:

# List all repository names from GitHub
curl -s https://api.github.com/users/octocat/repos \
  | jq -r '.[].name'

# Get specific fields from a nested response
curl -s https://api.github.com/repos/jqlang/jq \
  | jq '{name: .name, stars: .stargazers_count, language: .language}'

The -r flag outputs raw strings (no quotes). Without it, jq wraps string values in double quotes.

Nested extraction and array flattening

APIs frequently nest arrays inside arrays. jq handles this without temporary variables.

# Flatten nested tags from a list of resources
curl -s "$API_URL/resources" \
  | jq '[.data[].tags[]] | unique'

# Extract from deeply nested structures
curl -s "$API_URL/network/devices" \
  | jq '.data[].interfaces[].ipv4.addresses[].address'

# Collect nested fields into flat objects
curl -s "$API_URL/users" \
  | jq '.data[] | {name: .name, city: .address.city, company: .company.name}'

Conditional extraction with select()

select() filters objects based on a condition. This is the jq equivalent of WHERE in SQL.

# Only active users
curl -s "$API_URL/users" \
  | jq '.data[] | select(.status == "active")'

# Endpoints with a specific OS
curl -s "$API_URL/endpoints" \
  | jq '.items[] | select(.os | startswith("Windows"))'

# Numeric comparisons
curl -s "$API_URL/servers" \
  | jq '.results[] | select(.cpu_usage > 80) | {name, cpu_usage}'

# Null-safe filtering (avoid errors on missing fields)
curl -s "$API_URL/devices" \
  | jq '.data[] | select(.firmware != null) | select(.firmware | test("^2\\."))'

test() takes a regex. Escape backslashes once for jq, once for the shell — or use single-quoted jq expressions to avoid the shell layer.

Transforming with map()

map() applies an expression to every element in an array. It is the functional equivalent of a for loop.

# Transform each object, keeping only what you need
curl -s "$API_URL/endpoints" \
  | jq '[.data | map({id: .id, mac: .mac, profile: .profileName})]'

# Add computed fields
curl -s "$API_URL/disks" \
  | jq '.items | map(. + {usage_pct: ((.used / .total) * 100 | round)})'

# Combine select and map
curl -s "$API_URL/alerts" \
  | jq '[.data | map(select(.severity == "critical")) | map({id, message, timestamp})]'

Output formatting

Tab-separated (for column, awk, cut)

# TSV output -- pipe to column for aligned display
curl -s "$API_URL/users" \
  | jq -r '.data[] | [.name, .email, .role] | @tsv' \
  | column -t -s $'\t'

CSV output

# CSV with header
curl -s "$API_URL/endpoints" \
  | jq -r '["MAC","IP","Profile"], (.data[] | [.mac, .ip, .profileName]) | @csv'

Custom format strings

# Formatted output with string interpolation
curl -s "$API_URL/nodes" \
  | jq -r '.items[] | "\(.name)\t\(.status)\t\(.ip)"'

Chaining with pipes

jq expressions chain with | the same way shell commands do. Each pipe passes its output to the next filter.

# Multi-stage pipeline inside jq
curl -s "$API_URL/events" \
  | jq '
    .data
    | map(select(.level == "error"))
    | sort_by(.timestamp)
    | reverse
    | .[0:10]
    | map({timestamp, source, message})
  '

This reads as: take the data array, keep only errors, sort by time, reverse for newest-first, take the top 10, and extract three fields.

Combining jq with shell pipelines

jq handles the JSON layer. Shell tools handle everything else.

# Count unique values
curl -s "$API_URL/endpoints" \
  | jq -r '.data[].profileName' \
  | sort | uniq -c | sort -rn

# Feed jq output into a loop
curl -s "$API_URL/devices" \
  | jq -r '.data[].id' \
  | while read -r device_id; do
      curl -s "$API_URL/devices/${device_id}/config" > "config-${device_id}.json"
    done

# Diff two API responses
diff <(curl -s "$URL1" | jq -S .) <(curl -s "$URL2" | jq -S .)

jq -S sorts keys, making diffs meaningful. Process substitution (<(…​)) avoids temporary files.

Handling errors in jq

# Provide defaults for missing keys
curl -s "$API_URL/items" \
  | jq '.data[] | {name: .name, version: (.version // "unknown")}'

# The try-catch equivalent
curl -s "$API_URL/items" \
  | jq '.data[] | {name, count: (try (.items | length) catch 0)}'

# Check if a response is an error before processing
response=$(curl -s "$API_URL/resource")
if echo "$response" | jq -e '.error' > /dev/null 2>&1; then
  echo "API error: $(echo "$response" | jq -r '.error.message')" >&2
  exit 1
fi

The -e flag sets the exit code based on the jq output. null and false produce exit code 1; everything else produces exit code 0.