jq Pipelines: Data Transformation

Why jq?

APIs return JSON. jq transforms it:

# Without jq: wall of text
curl -s https://api.github.com/users/torvalds

# With jq: pretty-printed and reduced to the fields you need
curl -s https://api.github.com/users/torvalds | jq '{name, company, location}'

Core Syntax

Syntax    Meaning                  Example
.         Identity (whole input)   jq '.'
.key      Object key               jq '.name'
.[]       Array iterator           jq '.users[]'
.[0]      Array index              jq '.[0]'
.[0:3]    Array slice              jq '.[0:3]'
|         Pipe to next filter      jq '.users[] | .name'
,         Multiple outputs         jq '.name, .email'
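The filters in the table can be tried without any network access. The following is a minimal sketch that runs a few of them against a small inline document; the `data` variable and its contents are made up for illustration.

```shell
# Hypothetical sample document, not taken from any real API.
data='{"users":[{"name":"Ada","email":"ada@example.com"},{"name":"Linus","email":"linus@example.com"}]}'

# .key followed by .[0]: first element of the users array (-c = compact output)
first_user=$(printf '%s' "$data" | jq -c '.users[0]')

# .[] with a pipe: one name per line (-r strips the JSON quotes)
names=$(printf '%s' "$data" | jq -r '.users[] | .name')

# Slice: elements 0 and 1 of a plain array
slice=$(printf '%s' '[10,20,30,40]' | jq -c '.[0:2]')

# Comma: two outputs from the same input
pair=$(printf '%s' "$data" | jq -r '.users[1] | .name, .email')

echo "$first_user"   # {"name":"Ada","email":"ada@example.com"}
echo "$names"        # Ada, then Linus, on separate lines
echo "$slice"        # [10,20]
echo "$pair"         # Linus, then linus@example.com, on separate lines
```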

Practical Patterns

Extract Single Field

# Get user's email
curl -s https://jsonplaceholder.typicode.com/users/1 | jq -r '.email'
# Output: Sincere@april.biz

# -r = raw output (no quotes)
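To see what -r changes, the sketch below runs the same filter with and without it on an inline object (a made-up, one-field stand-in for the API response):

```shell
# Hypothetical one-field stand-in for the user record.
user='{"email":"Sincere@april.biz"}'

quoted=$(printf '%s' "$user" | jq '.email')     # JSON string, quotes included
raw=$(printf '%s' "$user" | jq -r '.email')     # plain text, ready for the shell

echo "$quoted"   # "Sincere@april.biz"
echo "$raw"      # Sincere@april.biz
```

Without -r the quotes become part of the shell variable, which is rarely what a script wants.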

Extract Multiple Fields

# Create new object
curl -s https://jsonplaceholder.typicode.com/users/1 | jq '{name, email, city: .address.city}'
# Output (jq pretty-prints this over several lines; shown compact here):
# {"name": "Leanne Graham", "email": "Sincere@april.biz", "city": "Gwenborough"}

Process Array

# All users' names
curl -s https://jsonplaceholder.typicode.com/users | jq '.[].name'

# As array
curl -s https://jsonplaceholder.typicode.com/users | jq '[.[].name]'

# With index
curl -s https://jsonplaceholder.typicode.com/users | jq 'to_entries | .[] | "\(.key): \(.value.name)"'

Filter Array

# Select where condition
curl -s https://jsonplaceholder.typicode.com/users | jq '.[] | select(.address.city == "Gwenborough")'

# Multiple conditions
curl -s https://jsonplaceholder.typicode.com/users | jq '.[] | select(.id > 3 and .id < 7)'

# Contains
curl -s https://jsonplaceholder.typicode.com/users | jq '.[] | select(.email | contains("biz"))'
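One thing the select patterns above do not show is what happens when a field is missing: contains raises an error on null input. A minimal sketch with invented sample data (the `users` array below is hypothetical) that guards against this:

```shell
# Hypothetical sample data; the last record deliberately has no email.
users='[{"id":1,"name":"Ada","email":"ada@example.biz"},
        {"id":4,"name":"Bob","email":"bob@example.org"},
        {"id":5,"name":"Cy","email":"cy@example.biz"},
        {"id":9,"name":"Di"}]'

# select keeps only the elements for which the condition is true
mid_ids=$(printf '%s' "$users" | jq -c '[.[] | select(.id > 3 and .id < 7) | .id]')

# Fall back to an empty string before calling contains,
# so the record without an email is skipped instead of aborting the run
biz=$(printf '%s' "$users" | jq -c '[.[] | select((.email // "") | contains("biz")) | .name]')

echo "$mid_ids"   # [4,5]
echo "$biz"       # ["Ada","Cy"]
```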

Transform Array

# Map to new structure
curl -s https://jsonplaceholder.typicode.com/users | jq '[.[] | {id, name, email}]'

# Add computed field
curl -s https://jsonplaceholder.typicode.com/users | jq '[.[] | . + {domain: (.email | split("@")[1])}]'
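The `[.[] | f]` form used above has a shorthand, `map(f)`. The sketch below checks that the two produce the same result on a one-element sample array (invented for illustration):

```shell
# Hypothetical sample record.
users='[{"id":1,"name":"Ada","email":"ada@example.com","phone":"555-0100"}]'

with_brackets=$(printf '%s' "$users" | jq -c '[.[] | {id, name}]')
with_map=$(printf '%s' "$users" | jq -c 'map({id, name})')

# Computed field inside map: the part of the email after the @
with_domain=$(printf '%s' "$users" | jq -c 'map({name, domain: (.email | split("@")[1])})')

echo "$with_brackets"   # [{"id":1,"name":"Ada"}]
echo "$with_map"        # [{"id":1,"name":"Ada"}]
echo "$with_domain"     # [{"name":"Ada","domain":"example.com"}]
```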

Sort and Limit

# Sort by field
curl -s https://jsonplaceholder.typicode.com/users | jq 'sort_by(.name)'

# Reverse sort
curl -s https://jsonplaceholder.typicode.com/users | jq 'sort_by(.name) | reverse'

# First 3
curl -s https://jsonplaceholder.typicode.com/users | jq '.[:3]'

# Last 3
curl -s https://jsonplaceholder.typicode.com/users | jq '.[-3:]'
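Sorting and slicing are usually chained. A small sketch of a "top N" query, using an invented `repos` array with a hypothetical `stars` field:

```shell
# Hypothetical sample data.
repos='[{"name":"a","stars":5},{"name":"b","stars":50},{"name":"c","stars":20},{"name":"d","stars":1}]'

# Sort ascending, reverse for descending, keep the first two
top2=$(printf '%s' "$repos" | jq -c 'sort_by(.stars) | reverse | .[:2] | map(.name)')

# A negative slice start counts from the end of the array
last2=$(printf '%s' "$repos" | jq -c '.[-2:] | map(.name)')

echo "$top2"    # ["b","c"]
echo "$last2"   # ["c","d"]
```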

Group and Count

# Group by field
curl -s https://jsonplaceholder.typicode.com/posts | jq 'group_by(.userId)'

# Count per group
curl -s https://jsonplaceholder.typicode.com/posts | jq 'group_by(.userId) | .[] | {userId: .[0].userId, count: length}'

# Total count
curl -s https://jsonplaceholder.typicode.com/users | jq 'length'
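The count-per-group pattern emits one object per group as a stream. Wrapping it in map keeps the result as a single array, which is often easier to pass on. A sketch on three invented posts:

```shell
# Hypothetical sample data shaped like the posts endpoint.
posts='[{"userId":1,"id":1},{"userId":2,"id":2},{"userId":1,"id":3}]'

# group_by returns an array of arrays; inside map, "." is one group
counts=$(printf '%s' "$posts" | jq -c 'group_by(.userId) | map({userId: .[0].userId, count: length})')

echo "$counts"   # [{"userId":1,"count":2},{"userId":2,"count":1}]
```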

String Operations

# Split
jq '.email | split("@")'

# Join
jq '.tags | join(", ")'

# Upper/lower
jq '.name | ascii_upcase'
jq '.name | ascii_downcase'

# Contains
jq 'select(.name | contains("John"))'

# Starts/ends with
jq 'select(.email | startswith("admin"))'
jq 'select(.email | endswith(".org"))'

# Regex match
jq 'select(.email | test("^[a-z]+@"))'

# Replace
jq '.name | gsub(" "; "_")'
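The string filters above are shown without input. The sketch below applies several of them to one made-up record so the results are visible:

```shell
# Hypothetical sample record.
rec='{"name":"John Smith","email":"admin@example.org","tags":["a","b"]}'

parts=$(printf '%s' "$rec" | jq -c '.email | split("@")')
joined=$(printf '%s' "$rec" | jq -r '.tags | join(", ")')
upper=$(printf '%s' "$rec" | jq -r '.name | ascii_upcase')
snake=$(printf '%s' "$rec" | jq -r '.name | gsub(" "; "_")')
# Both tests run against .email because the pipe binds more loosely than "and"
is_admin=$(printf '%s' "$rec" | jq '.email | startswith("admin") and endswith(".org")')

echo "$parts"      # ["admin","example.org"]
echo "$joined"     # a, b
echo "$upper"      # JOHN SMITH
echo "$snake"      # John_Smith
echo "$is_admin"   # true
```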

Nested Data

# Deep access
curl -s https://jsonplaceholder.typicode.com/users/1 | jq '.address.geo.lat'

# Default value when the path is missing or null
jq '.config.settings.theme // "default"'

# Recursive descent (collect every value stored under a key named "id")
jq '.. | .id? // empty'
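The `//` operator falls back whenever the left side is null or false, which matters for boolean settings. A sketch with an invented config document that shows the difference, plus the recursive search collected into an array:

```shell
# Hypothetical config document.
cfg='{"config":{"settings":{"dark":false}}}'

# Missing key: the default is used
theme=$(printf '%s' "$cfg" | jq -r '.config.settings.theme // "default"')

# Present but false: // still substitutes the right-hand side
dark_alt=$(printf '%s' "$cfg" | jq '.config.settings.dark // true')

# An explicit null check preserves a stored false
dark_explicit=$(printf '%s' "$cfg" | jq '.config.settings.dark | if . == null then true else . end')

# .. visits every value; .id? skips values that are not objects
ids=$(printf '%s' '{"id":1,"child":{"id":2,"items":[{"id":3}]}}' | jq -c '[.. | .id? // empty]')

echo "$theme"           # default
echo "$dark_alt"        # true
echo "$dark_explicit"   # false
echo "$ids"             # [1,2,3]
```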

Output Formats

# Raw string (no quotes)
jq -r '.name'

# Compact (one line)
jq -c '.'

# Tab-separated
jq -r '[.name, .email] | @tsv'

# CSV
jq -r '[.name, .email] | @csv'

# URI encoded
jq -r '.query | @uri'

# Shell-quoted (input must be a string or an array of scalars)
jq -r '@sh'
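For comparison, here is what each format produces for the same made-up record (the `query` field is hypothetical):

```shell
# Hypothetical sample record.
rec='{"name":"Leanne Graham","email":"Sincere@april.biz","query":"a b&c"}'

csv=$(printf '%s' "$rec" | jq -r '[.name, .email] | @csv')
tsv=$(printf '%s' "$rec" | jq -r '[.name, .email] | @tsv')
uri=$(printf '%s' "$rec" | jq -r '.query | @uri')
quoted=$(printf '%s' "$rec" | jq -r '.name | @sh')

echo "$csv"      # "Leanne Graham","Sincere@april.biz"
echo "$tsv"      # the two fields separated by a tab character
echo "$uri"      # a%20b%26c
echo "$quoted"   # 'Leanne Graham'
```

@csv and @tsv expect an array, which is why the fields are wrapped in brackets first.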

Building Tables

# TSV table with headers
(printf 'ID\tNAME\tEMAIL\n'; curl -s https://jsonplaceholder.typicode.com/users | jq -r '.[] | [.id, .name, .email] | @tsv') | column -t -s $'\t'

# Output:
# ID  NAME             EMAIL
# 1   Leanne Graham    Sincere@april.biz
# 2   Ervin Howell     Shanna@melissa.tv
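An alternative is to let jq emit the header row itself, so the header and the data go through the same @tsv formatting. A sketch on two inline records (a trimmed, hypothetical copy of the users response):

```shell
# Hypothetical two-record sample.
users='[{"id":1,"name":"Leanne Graham"},{"id":2,"name":"Ervin Howell"}]'

# The comma yields the header array first, then one array per user;
# the pipe then formats each of them as a TSV line
table=$(printf '%s' "$users" | jq -r '["ID", "NAME"], (.[] | [.id, .name]) | @tsv')

printf '%s\n' "$table"
```

The result can be piped through column -t in the same way as before.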

Conditional Logic

# If-then-else
jq 'if .active then "enabled" else "disabled" end'

# Multiple conditions
jq 'if .status == "open" then "🟢"
    elif .status == "closed" then "🔴"
    else "🟡" end'

# Fallback for null (or false) values
jq '.value // "N/A"'
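The conditional and the fallback can be combined. A sketch over an invented list of issues, where the last one has no status at all (plain-text labels are used in place of the emoji):

```shell
# Hypothetical sample data.
issues='[{"status":"open"},{"status":"closed"},{"status":"draft"},{}]'

labels=$(printf '%s' "$issues" | jq -c '
  map(if .status == "open" then "open"
      elif .status == "closed" then "done"
      else (.status // "N/A") end)')

echo "$labels"   # ["open","done","draft","N/A"]
```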

Variables and Functions

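# Pass a shell value in as a jq variable (--arg always passes a string)
jq --arg name "John" '.users[] | select(.name == $name)'

# Load a variable from a JSON file ($config is an array of the documents in the file)
jq --slurpfile config config.json '. + $config[0]'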

# Define function
jq 'def double: . * 2; .numbers | map(double)'
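Because --arg always binds a string, numeric comparisons need --argjson instead. A sketch with invented data (the `age` field and the `min` variable are hypothetical):

```shell
# Hypothetical sample document.
data='{"users":[{"name":"John","age":30},{"name":"Ann","age":41}],"numbers":[1,2,3]}'

# --arg: $name is the string "John"
by_name=$(printf '%s' "$data" | jq -c --arg name "John" '.users[] | select(.name == $name)')

# --argjson: $min is the number 40, so >= compares numbers
older=$(printf '%s' "$data" | jq -r --argjson min 40 '.users[] | select(.age >= $min) | .name')

# A user-defined function applied with map
doubled=$(printf '%s' "$data" | jq -c 'def double: . * 2; .numbers | map(double)')

echo "$by_name"   # {"name":"John","age":30}
echo "$older"     # Ann
echo "$doubled"   # [2,4,6]
```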

Real-World Pipeline

# GitHub: Get repos, filter, transform, sort, format
curl -s "https://api.github.com/users/torvalds/repos?per_page=100" | jq '
  [.[] | select(.fork == false)]           # Exclude forks
  | sort_by(.stargazers_count) | reverse   # Sort by stars desc
  | .[0:5]                                 # Top 5
  | .[] | {                                # Transform
      name,
      stars: .stargazers_count,
      url: .html_url,
      language
    }
'

jq + Shell Integration

# Read into shell variable
repo_count=$(curl -s https://api.github.com/users/torvalds | jq -r '.public_repos')

# Loop over results
curl -s https://jsonplaceholder.typicode.com/users | jq -r '.[].email' | while IFS= read -r email; do
  echo "Processing: $email"
done

# Parallel processing (up to 5 requests at once; the final jq merges all responses and counts the posts)
curl -s https://jsonplaceholder.typicode.com/users | jq -r '.[].id' | xargs -P 5 -I {} curl -s "https://jsonplaceholder.typicode.com/posts?userId={}" | jq -s 'add | length'
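In most shells, a loop on the receiving end of a pipe runs in a subshell, so variables set inside it are lost when the loop ends. One way around this, sketched below with invented data, is to capture the jq output first and feed it to the loop through a here-document:

```shell
# Hypothetical sample data.
users='[{"id":1,"email":"a@example.com"},{"id":2,"email":"b@example.com"}]'

emails=$(printf '%s' "$users" | jq -r '.[].email')

count=0
# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r email; do
  count=$((count + 1))
  echo "Processing: $email"
done <<EOF
$emails
EOF

echo "Processed $count addresses"   # Processed 2 addresses
```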

Common Gotchas

Problem                 Solution
null output             Use // empty or // "default"
Quotes in output        Use -r for raw strings
Array vs object         Check with type
Special chars in keys   Use ."my-key" or .["my-key"]
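A short sketch of three of these fixes on a made-up document:

```shell
# Hypothetical sample document with a dashed key.
doc='{"my-key":1,"items":[1,2]}'

# Quoted key syntax for names that are not plain identifiers
dashed=$(printf '%s' "$doc" | jq '."my-key"')

# type reports what a value is before you index into it
kind=$(printf '%s' "$doc" | jq -r '.items | type')

# // empty prints nothing at all instead of the word null
missing=$(printf '%s' "$doc" | jq -r '.nope // empty')

echo "$dashed"      # 1
echo "$kind"        # array
echo "[$missing]"   # []
```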

Exercises

  1. Fetch GitHub repos, extract only name and stars for repos with 1000+ stars

  2. Fetch JSONPlaceholder posts, group by userId, count posts per user

  3. Create a CSV export of users with columns: id, name, email, city

  4. Find the user with the longest username

  5. Calculate the average number of posts per user