ETL Session 04: API to Report
API data extraction. This session covers fetching data from APIs, processing with jq, enriching data, and generating reports.
Pre-Session State
-
Can process logs with grep/sed/awk
-
Understand jq filtering and construction
-
Know CSV output patterns
Setup
# Simulate API responses
mkdir -p /tmp/api
# Hosts API
cat > /tmp/api/hosts.json << 'EOF'
{"data": [
{"id": 1, "name": "kvm-01", "ip": "10.50.1.110", "status": "running"},
{"id": 2, "name": "kvm-02", "ip": "10.50.1.111", "status": "running"},
{"id": 3, "name": "vault-01", "ip": "10.50.1.60", "status": "warning"},
{"id": 4, "name": "db-01", "ip": "10.50.1.70", "status": "stopped"}
]}
EOF
# Metrics API
cat > /tmp/api/metrics.json << 'EOF'
{"data": [
{"host_id": 1, "cpu": 85, "memory": 72, "disk": 45},
{"host_id": 2, "cpu": 32, "memory": 58, "disk": 67},
{"host_id": 3, "cpu": 45, "memory": 89, "disk": 23},
{"host_id": 4, "cpu": 0, "memory": 0, "disk": 95}
]}
EOF
Lesson 1: curl + jq Pattern
Concept: Extract data from API and process immediately.
Exercise 1.1: Basic API fetch
# Simulate: curl -s https://api.example.com/hosts | jq ...
cat /tmp/api/hosts.json | jq '.data[] | .name'
Exercise 1.2: Filter and transform
# Get running hosts only
cat /tmp/api/hosts.json | jq -r '.data[] | select(.status == "running") | .name'
Exercise 1.3: Build new structure
# Restructure API response
cat /tmp/api/hosts.json | jq '.data | map({hostname: .name, address: .ip, state: .status})'
Lesson 2: Data Enrichment
Concept: Combine data from multiple sources.
Exercise 2.1: Manual join (lookup)
# Join hosts and metrics by host_id
cat /tmp/api/hosts.json | jq -r '.data[] | "\(.id) \(.name) \(.ip)"' | \
while read id name ip; do
cpu=$(cat /tmp/api/metrics.json | jq -r ".data[] | select(.host_id == $id) | .cpu")
echo "$name ($ip): CPU $cpu%"
done
Exercise 2.2: jq native join (slurp)
# Use jq to join two JSON files
jq -s '
.[0].data as $hosts |
.[1].data as $metrics |
$hosts | map(. as $h |
($metrics[] | select(.host_id == $h.id)) as $m |
{name: $h.name, ip: $h.ip, status: $h.status, cpu: $m.cpu, memory: $m.memory}
)
' /tmp/api/hosts.json /tmp/api/metrics.json
Exercise 2.3: Add computed fields
# Add severity based on metrics
jq -s '
.[0].data as $hosts |
.[1].data as $metrics |
$hosts | map(. as $h |
($metrics[] | select(.host_id == $h.id)) as $m |
{
name: $h.name,
cpu: $m.cpu,
severity: (if $m.cpu > 80 then "critical"
elif $m.cpu > 50 then "warning"
else "ok" end)
}
)
' /tmp/api/hosts.json /tmp/api/metrics.json
Lesson 3: Report Generation
Concept: Transform data into human-readable reports.
Exercise 3.1: Markdown report
echo "# Infrastructure Report"
echo ""
echo "Generated: $(date)"
echo ""
echo "## Host Status"
echo ""
echo "| Host | IP | Status |"
echo "|------|-----|--------|"
cat /tmp/api/hosts.json | jq -r '.data[] | "| \(.name) | \(.ip) | \(.status) |"'
Exercise 3.2: Summary section
echo "## Summary"
echo ""
total=$(cat /tmp/api/hosts.json | jq '.data | length')
running=$(cat /tmp/api/hosts.json | jq '[.data[] | select(.status == "running")] | length')
warning=$(cat /tmp/api/hosts.json | jq '[.data[] | select(.status == "warning")] | length')
stopped=$(cat /tmp/api/hosts.json | jq '[.data[] | select(.status == "stopped")] | length')
echo "- Total hosts: $total"
echo "- Running: $running"
echo "- Warning: $warning"
echo "- Stopped: $stopped"
Exercise 3.3: Alert section
echo "## Alerts"
echo ""
# High CPU
echo "### High CPU (>80%)"
jq -s '
.[0].data as $hosts |
.[1].data as $metrics |
$hosts | map(. as $h |
($metrics[] | select(.host_id == $h.id)) as $m |
select($m.cpu > 80) |
"- \($h.name): \($m.cpu)%"
) | .[]
' /tmp/api/hosts.json /tmp/api/metrics.json
echo ""
echo "### Stopped Hosts"
cat /tmp/api/hosts.json | jq -r '.data[] | select(.status == "stopped") | "- \(.name) (\(.ip))"'
Lesson 4: Complete Pipeline
Concept: End-to-end ETL pipeline.
Exercise 4.1: Full report generator
#!/bin/bash
# infra-report.sh
HOSTS="/tmp/api/hosts.json"
METRICS="/tmp/api/metrics.json"
OUTPUT="/tmp/infra-report.md"
{
echo "# Infrastructure Status Report"
echo "Generated: $(date '+%Y-%m-%d %H:%M:%S')"
echo ""
echo "## Overview"
echo ""
echo "| Metric | Value |"
echo "|--------|-------|"
echo "| Total Hosts | $(jq '.data | length' $HOSTS) |"
echo "| Running | $(jq '[.data[] | select(.status == "running")] | length' $HOSTS) |"
echo "| Avg CPU | $(jq '[.data[].cpu] | add / length | floor' $METRICS)% |"
echo ""
echo "## Detailed Status"
echo ""
echo "| Host | IP | Status | CPU | Memory | Disk |"
echo "|------|-----|--------|-----|--------|------|"
jq -s -r '
.[0].data as $hosts |
.[1].data as $metrics |
$hosts[] | . as $h |
($metrics[] | select(.host_id == $h.id)) as $m |
"| \($h.name) | \($h.ip) | \($h.status) | \($m.cpu)% | \($m.memory)% | \($m.disk)% |"
' $HOSTS $METRICS
echo ""
echo "## Alerts"
echo ""
jq -s -r '
.[0].data as $hosts |
.[1].data as $metrics |
$hosts[] | . as $h |
($metrics[] | select(.host_id == $h.id)) as $m |
select($m.cpu > 80 or $m.memory > 85 or $m.disk > 90 or $h.status != "running") |
"- **\($h.name)**: " +
(if $h.status != "running" then "Status: \($h.status) " else "" end) +
(if $m.cpu > 80 then "CPU: \($m.cpu)% " else "" end) +
(if $m.memory > 85 then "Memory: \($m.memory)% " else "" end) +
(if $m.disk > 90 then "Disk: \($m.disk)% " else "" end)
' $HOSTS $METRICS
} > $OUTPUT
echo "Report generated: $OUTPUT"
cat $OUTPUT
Summary: What You Learned
| Concept | Syntax | Example |
|---|---|---|
API fetch |
|
|
jq slurp |
|
Combine multiple files |
Join data |
|
Match by key |
Computed field |
|
Add derived values |
Markdown table |
|
jq string template |
Shell variables |
|
Capture jq output |
Report script |
|
Redirect block |
Exercises to Complete
-
[ ] Create a script that joins hosts and metrics
-
[ ] Generate CSV report with severity column
-
[ ] Build alerting report (only problem hosts)
-
[ ] Add timestamp and run as cron job
Next Session
Session 05: Infrastructure ETL - ISE, k8s, network patterns.
Session Log
| Timestamp | Notes |
|---|---|
Start |
<Record when you started> |
End |
<Record when you finished> |