# ETL Session 02: JSON to CSV

Data format conversion. This session covers extracting data from JSON with jq and converting it to CSV, TSV, and formatted tables.
## Pre-Session State

- Understand pipes and `tee`
- Know basic `xargs` usage
- Can run the `jq` identity filter
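As a quick refresher on that last point, the identity filter `.` passes its input through unchanged, pretty-printed:

```sh
# Identity filter: echo the JSON back, pretty-printed
echo '{"ok": true}' | jq '.'
# {
#   "ok": true
# }
```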
## Setup

```sh
cat > /tmp/servers.json << 'EOF'
[
  {"name": "kvm-01", "ip": "10.50.1.110", "role": "hypervisor", "cpu": 85},
  {"name": "kvm-02", "ip": "10.50.1.111", "role": "hypervisor", "cpu": 45},
  {"name": "vault-01", "ip": "10.50.1.60", "role": "secrets", "cpu": 32},
  {"name": "ise-01", "ip": "10.50.1.20", "role": "nac", "cpu": 78}
]
EOF
```
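Before transforming anything, it's worth confirming the file parses. One quick sanity check (jq also accepts a filename argument, so `cat` is optional):

```sh
# Count the records; a parse error here means the JSON is malformed
jq 'length' /tmp/servers.json
# prints: 4
```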
## Lesson 1: jq Field Extraction

Concept: Build arrays of fields for CSV conversion.

### Exercise 1.1: Extract a single field

```sh
cat /tmp/servers.json | jq -r '.[].name'
```

Output: each name on its own line.
### Exercise 1.2: Extract multiple fields as an array

```sh
# -c prints each array compactly on one line
cat /tmp/servers.json | jq -c '.[] | [.name, .ip, .role]'
```

Output: one array per line, like `["kvm-01","10.50.1.110","hypervisor"]`
### Exercise 1.3: Flatten to values

```sh
cat /tmp/servers.json | jq -r '.[] | [.name, .ip, .cpu] | @tsv'
```

Output: tab-separated values (TSV).
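One detail worth knowing before relying on TSV downstream: `@tsv` escapes literal tabs and newlines inside a field (as `\t` and `\n`) so each record stays on one line. A quick illustration with an invented value:

```sh
# The JSON string contains a real newline and tab after parsing;
# @tsv re-escapes them so the row stays intact
printf '%s' '{"note": "line1\nline2\twith tab"}' | jq -r '[.note] | @tsv'
# prints: line1\nline2\twith tab
```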
## Lesson 2: @csv Output

Concept: `@csv` properly escapes and quotes CSV fields.
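To see that escaping in action, here is a minimal sketch with a hypothetical record (invented for illustration) whose field contains a comma and embedded quotes:

```sh
# @csv wraps strings in quotes and doubles any embedded quotes
printf '%s' '{"name": "rack \"A\", row 2", "cpu": 85}' |
  jq -r '[.name, .cpu] | @csv'
# prints: "rack ""A"", row 2",85
```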
### Exercise 2.1: Basic CSV

```sh
cat /tmp/servers.json | jq -r '.[] | [.name, .ip, .role, .cpu] | @csv'
```

Output:

```
"kvm-01","10.50.1.110","hypervisor",85
"kvm-02","10.50.1.111","hypervisor",45
"vault-01","10.50.1.60","secrets",32
"ise-01","10.50.1.20","nac",78
```
### Exercise 2.2: CSV with header

```sh
cat /tmp/servers.json | jq -r '
  ["name","ip","role","cpu"],
  (.[] | [.name, .ip, .role, .cpu])
  | @csv'
```

Output: CSV with the header row first.
### Exercise 2.3: Selective fields

```sh
# Only hypervisors, only name and cpu
cat /tmp/servers.json | jq -r '
  ["host","cpu_percent"],
  (.[] | select(.role == "hypervisor") | [.name, .cpu])
  | @csv'
```

Output:

```
"host","cpu_percent"
"kvm-01",85
"kvm-02",45
```
## Lesson 3: awk Post-Processing

Concept: Use awk to format jq output into tables.

### Exercise 3.1: TSV to table

```sh
cat /tmp/servers.json | jq -r '.[] | [.name, .ip, .cpu] | @tsv' |
  awk -F'\t' '{printf "%-12s %-15s %3d%%\n", $1, $2, $3}'
```

Output:

```
kvm-01       10.50.1.110      85%
kvm-02       10.50.1.111      45%
vault-01     10.50.1.60       32%
ise-01       10.50.1.20       78%
```
### Exercise 3.2: Add header and footer

```sh
cat /tmp/servers.json | jq -r '.[] | [.name, .ip, .cpu] | @tsv' |
  awk -F'\t' '
  BEGIN {
    printf "%-12s %-15s %s\n", "HOST", "IP", "CPU"
    print "----------------------------------------"
  }
  { printf "%-12s %-15s %3d%%\n", $1, $2, $3; total+=$3; count++ }
  END {
    print "----------------------------------------"
    printf "Average CPU: %.1f%%\n", total/count
  }'
```
### Exercise 3.3: Conditional formatting

```sh
cat /tmp/servers.json | jq -r '.[] | [.name, .cpu] | @tsv' |
  awk -F'\t' '{
    status = ($2 > 80) ? "HIGH" : ($2 > 50) ? "MED" : "OK"
    printf "%-12s %3d%% [%s]\n", $1, $2, status
  }'
```
## Lesson 4: Complete ETL Pipeline

Concept: Chain Extract → Transform → Load steps.

### Exercise 4.1: JSON to report file

```sh
cat /tmp/servers.json | \
  jq -r '.[] | select(.cpu > 50) | [.name, .ip, .cpu] | @tsv' | \
  awk -F'\t' 'BEGIN {print "HIGH CPU SERVERS"} {printf "%s (%s): %d%%\n", $1, $2, $3}' | \
  tee /tmp/high-cpu-report.txt
```
### Exercise 4.2: Multiple output formats

```sh
# Generate both CSV and a markdown table
cat /tmp/servers.json | jq -r '.[] | [.name, .ip, .role] | @csv' > /tmp/servers.csv

cat /tmp/servers.json | jq -r '
  "| Name | IP | Role |",
  "|------|-----|------|",
  (.[] | "| \(.name) | \(.ip) | \(.role) |")' > /tmp/servers.md
```
### Exercise 4.3: Aggregation pipeline

```sh
# Count servers by role
cat /tmp/servers.json | \
  jq -r 'group_by(.role) | .[] | "\(.[0].role): \(length)"'
```

Output:

```
hypervisor: 2
nac: 1
secrets: 1
```
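The same aggregation can feed the awk formatting from Lesson 3; a small sketch combining the two (`group_by` sorts by the grouping key, so roles come out alphabetically):

```sh
# Emit role + count as TSV, then align the columns with awk
cat /tmp/servers.json |
  jq -r 'group_by(.role) | .[] | [.[0].role, length] | @tsv' |
  awk -F'\t' '{printf "%-12s %d\n", $1, $2}'
# hypervisor   2
# nac          1
# secrets      1
```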
## Summary: What You Learned

| Concept | Syntax | Example |
|---|---|---|
| Extract field | `.[].name` | One value per line with `-r` |
| Build array | `[.name, .ip]` | `["kvm-01","10.50.1.110"]` |
| CSV output | `@csv` | `"kvm-01","10.50.1.110"` |
| TSV output | `@tsv` | Tab-separated columns |
| Header row | `["name","ip"], (.[] \| ...)` | Literal array first |
| awk format | `printf "%-10s"` | Left-align in 10 columns |
| awk aggregate | `total+=$3; count++` | Running totals |
| jq group | `group_by(.role)` | Aggregate by field |
## Exercises to Complete

- [ ] Convert servers.json to CSV with header
- [ ] Create a markdown table from JSON
- [ ] Build a report showing only high CPU hosts
- [ ] Count servers by role and format as table
## Next Session

Session 03: Log Processing - grep, sed, awk for log analysis.
## Session Log

| Timestamp | Notes |
|---|---|
| Start | <Record when you started> |
| End | <Record when you finished> |