jq Pipelines: Data Transformation
Why jq?
APIs return JSON. jq transforms it:
# Without jq: wall of text
curl -s https://api.github.com/users/torvalds
# With jq: pretty-printed, and only the fields you asked for
curl -s https://api.github.com/users/torvalds | jq '{name, company, location}'
Core Syntax
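| Syntax | Meaning | Example |
|---|---|---|
| `.` | Identity (whole input) | `jq '.'` |
| `.key` | Object key | `.name` |
| `.[]` | Array iterator | `.[].name` |
| `.[n]` | Array index | `.[0]` |
| `.[a:b]` | Array slice | `.[2:5]` |
| `\|` | Pipe to next filter | `.[] \| .name` |
| `,` | Multiple outputs | `.name, .email` |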
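To try the table's forms without a network call, here is a minimal sketch that runs them against a small inline document. The `doc` value and its field names are hypothetical, chosen only for illustration.

```shell
# Hypothetical sample document, used only to illustrate the syntax table.
doc='{"user":{"name":"Ada"},"tags":["a","b","c","d"]}'

name=$(printf '%s' "$doc" | jq -r '.user.name')              # object key
first=$(printf '%s' "$doc" | jq -r '.tags[0]')               # array index
slice=$(printf '%s' "$doc" | jq -c '.tags[1:3]')             # array slice (end index is exclusive)
count=$(printf '%s' "$doc" | jq '.tags | length')            # pipe into the next filter
pair=$(printf '%s' "$doc" | jq -c '[.user.name, .tags[0]]')  # comma: two outputs, collected into an array

printf '%s\n' "$name" "$first" "$slice" "$count" "$pair"
```

Wrapping a comma expression in `[...]` is a common way to turn several outputs back into a single JSON value.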
Practical Patterns
Extract Single Field
# Get user's email
curl -s https://jsonplaceholder.typicode.com/users/1 | jq -r '.email'
# Output: Sincere@april.biz
# -r = raw output (no quotes)
Extract Multiple Fields
# Create new object
curl -s https://jsonplaceholder.typicode.com/users/1 | jq '{name, email, city: .address.city}'
# Output (compacted here; jq pretty-prints by default): {"name": "Leanne Graham", "email": "Sincere@april.biz", "city": "Gwenborough"}
Process Array
# All users' names
curl -s https://jsonplaceholder.typicode.com/users | jq '.[].name'
# As array
curl -s https://jsonplaceholder.typicode.com/users | jq '[.[].name]'
# With index
curl -s https://jsonplaceholder.typicode.com/users | jq 'to_entries | .[] | "\(.key): \(.value.name)"'
Filter Array
# Select where condition
curl -s https://jsonplaceholder.typicode.com/users | jq '.[] | select(.address.city == "Gwenborough")'
# Multiple conditions
curl -s https://jsonplaceholder.typicode.com/users | jq '.[] | select(.id > 3 and .id < 7)'
# Contains
curl -s https://jsonplaceholder.typicode.com/users | jq '.[] | select(.email | contains("biz"))'
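The filters above emit a stream of matching objects. When a later step expects one JSON array, the stream can be collected, as in this sketch. The `users` data is made up for the example.

```shell
# Hypothetical input; only the id and email fields matter here.
users='[{"id":1,"email":"a@x.biz"},{"id":4,"email":"b@y.org"},{"id":5,"email":"c@z.biz"},{"id":9,"email":"d@w.biz"}]'

# Collect the ids of users whose id lies strictly between 3 and 7.
in_range=$(printf '%s' "$users" | jq -c '[.[] | select(.id > 3 and .id < 7) | .id]')

# map(f) is shorthand for [.[] | f], so this also yields a single array.
biz_ids=$(printf '%s' "$users" | jq -c 'map(select(.email | contains("biz")) | .id)')

printf '%s\n' "$in_range" "$biz_ids"
```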
Transform Array
# Map to new structure
curl -s https://jsonplaceholder.typicode.com/users | jq '[.[] | {id, name, email}]'
# Add computed field
curl -s https://jsonplaceholder.typicode.com/users | jq '[.[] | . + {domain: (.email | split("@")[1])}]'
Sort and Limit
# Sort by field
curl -s https://jsonplaceholder.typicode.com/users | jq 'sort_by(.name)'
# Reverse sort
curl -s https://jsonplaceholder.typicode.com/users | jq 'sort_by(.name) | reverse'
# First 3
curl -s https://jsonplaceholder.typicode.com/users | jq '.[:3]'
# Last 3
curl -s https://jsonplaceholder.typicode.com/users | jq '.[-3:]'
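Sorting and slicing compose in a single filter. A small offline sketch, with invented names as input:

```shell
# Hypothetical input, deliberately unsorted.
users='[{"name":"Cy"},{"name":"Al"},{"name":"Bo"},{"name":"Di"}]'

first_two=$(printf '%s' "$users" | jq -c 'sort_by(.name) | .[:2] | map(.name)')
last_two=$(printf '%s' "$users" | jq -c 'sort_by(.name) | .[-2:] | map(.name)')
descending=$(printf '%s' "$users" | jq -c 'sort_by(.name) | reverse | map(.name)')

printf '%s\n' "$first_two" "$last_two" "$descending"
```

Slices need the leading dot (`.[:2]`). Without it, jq reports a syntax error.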
Group and Count
# Group by field
curl -s https://jsonplaceholder.typicode.com/posts | jq 'group_by(.userId)'
# Count per group
curl -s https://jsonplaceholder.typicode.com/posts | jq 'group_by(.userId) | .[] | {userId: .[0].userId, count: length}'
# Total count
curl -s https://jsonplaceholder.typicode.com/users | jq 'length'
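The per-group count above prints one object per group. If a single array is easier to consume downstream, `map` can replace `.[]`, as in this sketch over a hypothetical three-post input:

```shell
# Hypothetical posts; only userId is used for grouping.
posts='[{"userId":1,"id":1},{"userId":1,"id":2},{"userId":2,"id":3}]'

# group_by sorts by the key and returns an array of arrays;
# inside map, `.` is one group, so `length` is that group's size.
counts=$(printf '%s' "$posts" | jq -c 'group_by(.userId) | map({userId: .[0].userId, count: length})')

printf '%s\n' "$counts"
```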
String Operations
# Split
jq '.email | split("@")'
# Join
jq '.tags | join(", ")'
# Upper/lower
jq '.name | ascii_upcase'
jq '.name | ascii_downcase'
# Contains
jq 'select(.name | contains("John"))'
# Starts/ends with
jq 'select(.email | startswith("admin"))'
jq 'select(.email | endswith(".org"))'
# Regex match
jq 'select(.email | test("^[a-z]+@"))'
# Replace
jq '.name | gsub(" "; "_")'
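The string filters above are shown without input. The sketch below runs a few of them against one made-up record so the results are visible. The `record` fields are assumptions, and `gsub`/`test` assume a jq build with regex support, which is the usual case.

```shell
# Hypothetical record for illustration.
record='{"name":"John Smith","email":"admin@example.org","tags":["cli","json"]}'

domain=$(printf '%s' "$record" | jq -r '.email | split("@")[1]')
tags=$(printf '%s' "$record" | jq -r '.tags | join(", ")')
upper=$(printf '%s' "$record" | jq -r '.name | ascii_upcase')
slug=$(printf '%s' "$record" | jq -r '.name | gsub(" "; "_")')
is_admin=$(printf '%s' "$record" | jq '.email | startswith("admin")')
matches=$(printf '%s' "$record" | jq '.email | test("^[a-z]+@")')

printf '%s\n' "$domain" "$tags" "$upper" "$slug" "$is_admin" "$matches"
```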
Nested Data
# Deep access
curl -s https://jsonplaceholder.typicode.com/users/1 | jq '.address.geo.lat'
# Default value when the key is missing or null
jq '.config.settings.theme // "default"'
# Recursive descent (collect every value stored under an "id" key)
jq '.. | .id? // empty'
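A short sketch of both patterns on an inline document. The `cfg` structure is hypothetical: it has an empty `settings` object and `id` keys at two depths.

```shell
# Hypothetical nested document.
cfg='{"config":{"settings":{}},"items":[{"id":1,"child":{"id":2}},{"name":"no id"}]}'

# .theme is absent, so the path yields null and // supplies the fallback.
theme=$(printf '%s' "$cfg" | jq -r '.config.settings.theme // "default"')

# .. visits every value; .id? skips values that cannot be indexed by a key,
# and // empty drops the nulls produced by objects that have no "id".
ids=$(printf '%s' "$cfg" | jq -c '[.. | .id? // empty]')

printf '%s\n' "$theme" "$ids"
```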
Output Formats
# Raw string (no quotes)
jq -r '.name'
# Compact (one line)
jq -c '.'
# Tab-separated
jq -r '[.name, .email] | @tsv'
# CSV
jq -r '[.name, .email] | @csv'
# URI encoded
jq -r '.query | @uri'
# Shell-safe
jq -r '@sh'
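To see what each format string produces, this sketch applies them to one made-up row. The `row` fields, including `query`, are assumptions for the example.

```shell
# Hypothetical row; "query" contains characters that need URI escaping.
row='{"name":"Leanne Graham","email":"Sincere@april.biz","query":"a b&c"}'

csv=$(printf '%s' "$row" | jq -r '[.name, .email] | @csv')   # fields double-quoted, comma-separated
tsv=$(printf '%s' "$row" | jq -r '[.name, .email] | @tsv')   # fields separated by a tab
uri=$(printf '%s' "$row" | jq -r '.query | @uri')            # percent-encoded
quoted=$(printf '%s' "$row" | jq -r '.name | @sh')           # single-quoted for the shell

printf '%s\n' "$csv" "$tsv" "$uri" "$quoted"
```

`@csv` and `@tsv` expect an array as input, which is why the fields are wrapped in `[...]` first.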
Building Tables
# TSV table with headers
(printf 'ID\tNAME\tEMAIL\n'; curl -s https://jsonplaceholder.typicode.com/users | jq -r '.[] | [.id, .name, .email] | @tsv') | column -t -s $'\t'
# Output:
# ID  NAME           EMAIL
# 1   Leanne Graham  Sincere@april.biz
# 2   Ervin Howell   Shanna@melissa.tv
Conditional Logic
# If-then-else
jq 'if .active then "enabled" else "disabled" end'
# Multiple conditions
jq 'if .status == "open" then "🟢"
    elif .status == "closed" then "🔴"
    else "🟡" end'
# Null handling (// also replaces false)
jq '.value // "N/A"'
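One detail worth checking by hand: `//` falls back on `false` as well as `null`. The sketch below shows that next to an if/elif chain. `status_label` is a hypothetical helper name, and the text labels stand in for the emoji.

```shell
# Hypothetical helper: $1 is a JSON document with a "status" field.
status_label() {
  printf '%s' "$1" | jq -r '
    if .status == "open" then "open"
    elif .status == "closed" then "closed"
    else "unknown" end'
}

a=$(status_label '{"status":"open"}')
b=$(status_label '{"status":"merged"}')

# // treats false like null, so a legitimate false is replaced...
alt=$(printf '%s' '{"active":false}' | jq -r '.active // "N/A"')
# ...while an explicit null check keeps it.
strict=$(printf '%s' '{"active":false}' | jq -r 'if .active == null then "N/A" else .active end')

printf '%s\n' "$a" "$b" "$alt" "$strict"
```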
Variables and Functions
# Define variable
jq --arg name "John" '.users[] | select(.name == $name)'
# From file
jq --slurpfile config config.json '. + $config[0]'
# Define function
jq 'def double: . * 2; .numbers | map(double)'
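Values passed with `--arg` always arrive as strings. For numeric comparisons, `--argjson` passes a parsed JSON value instead. A minimal sketch, with invented user data:

```shell
# Hypothetical input.
data='{"users":[{"name":"John","age":30},{"name":"Jane","age":41}]}'

# --arg: $name is the string "John".
by_name=$(printf '%s' "$data" | jq -c --arg name "John" '.users[] | select(.name == $name)')

# --argjson: $min is the number 35, so >= compares numbers.
older=$(printf '%s' "$data" | jq -r --argjson min 35 '.users[] | select(.age >= $min) | .name')

# A user-defined function applied to each element.
doubled=$(printf '%s' '{"numbers":[1,2,3]}' | jq -c 'def double: . * 2; .numbers | map(double)')

printf '%s\n' "$by_name" "$older" "$doubled"
```

Passing values this way, rather than splicing shell variables into the filter string, avoids quoting problems when the value contains quotes or backslashes.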
Real-World Pipeline
# GitHub: Get repos, filter, transform, sort, format
curl -s "https://api.github.com/users/torvalds/repos?per_page=100" | jq '
  [.[] | select(.fork == false)]           # Exclude forks
  | sort_by(.stargazers_count) | reverse   # Sort by stars, descending
  | .[0:5]                                 # Top 5
  | .[] | {                                # Transform
      name,
      stars: .stargazers_count,
      url: .html_url,
      language
    }
'
jq + Shell Integration
# Read into shell variable
repo_count=$(curl -s https://api.github.com/users/torvalds | jq -r '.public_repos')
# Loop over results
curl -s https://jsonplaceholder.typicode.com/users | jq -r '.[].email' | while IFS= read -r email; do
  echo "Processing: $email"
done
# Parallel processing
curl -s https://jsonplaceholder.typicode.com/users | jq -r '.[].id' | xargs -P 5 -I {} curl -s "https://jsonplaceholder.typicode.com/posts?userId={}" | jq -s 'add | length'
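When a loop needs more than one field per record, one possible approach is to emit TSV and split on tabs. The sketch below assumes the fields are non-empty and uses made-up data. It feeds the loop from a here-document so that `out` is still set after the loop, which would not be the case if the loop ran on the right-hand side of a pipe.

```shell
# Hypothetical input.
users='[{"id":1,"email":"a@example.com"},{"id":2,"email":"b@example.com"}]'

lines=$(printf '%s' "$users" | jq -r '.[] | [.id, .email] | @tsv')
tab=$(printf '\t')

out=""
# IFS is set to a tab for read only; -r keeps backslashes literal.
while IFS="$tab" read -r id email; do
  out="${out}${id}=${email};"
done <<EOF
$lines
EOF

printf '%s\n' "$out"
```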
Common Gotchas
| Problem | Solution |
|---|---|
|  | Use |
| Quotes in output | Use `-r` |
| Array vs object | Check with `type` |
| Special chars in keys | Use `.["key-name"]` |
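A quick way to check these behaviours is to run them on a small document. The `doc` value below is hypothetical and has a hyphenated key on purpose.

```shell
# Hypothetical document with a key that is not a plain identifier.
doc='{"key-name":"v","items":[1,2]}'

with_quotes=$(printf '%s' "$doc" | jq '.["key-name"]')     # JSON string, quotes included
raw=$(printf '%s' "$doc" | jq -r '.["key-name"]')          # -r strips the quotes
quoted_key=$(printf '%s' "$doc" | jq -r '."key-name"')     # equivalent quoted-key form
kind=$(printf '%s' "$doc" | jq -r '.items | type')         # "array"
root_kind=$(printf '%s' "$doc" | jq -r 'type')             # "object"

printf '%s\n' "$with_quotes" "$raw" "$quoted_key" "$kind" "$root_kind"
```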
Exercises
- Fetch GitHub repos, extract only name and stars for repos with 1000+ stars
- Fetch JSONPlaceholder posts, group by userId, count posts per user
- Create a CSV export of users with columns: id, name, email, city
- Find the user with the longest username
- Calculate the average number of posts per user