CSV Processing
Process tabular data with awk, csvkit, and miller — filter, sort, join, and transform CSV/TSV.
awk for CSV/TSV
Parse CSV by comma — extract specific fields (hostname, IP)
awk -F, 'NR>1 {print $1, $3}' inventory.csv
TSV parsing — tab-delimited, print formatted output
awk -F'\t' 'NR>1 {printf "%-20s %s\n", $1, $2}' devices.tsv
Filter CSV rows by field value — only VLAN 10 devices
awk -F, '$3 == 10 {print $1, $2}' inventory.csv
Sum a numeric column — total bandwidth from CSV
awk -F, 'NR>1 {sum += $4} END {print "Total:", sum}' bandwidth.csv
CSV with quoted fields containing commas — use FPAT for proper parsing
awk -v FPAT='[^,]*|"[^"]*"' '{print $2}' complex.csv
Deduplicate CSV by a key field — keep first occurrence
awk -F, '!seen[$1]++' inventory.csv
csvkit — Purpose-built CSV Tools
View CSV as a formatted table — human-readable output
csvlook inventory.csv
Select specific columns by name — no need to count field positions
csvcut -c hostname,ip,vlan inventory.csv
Filter rows by column value — grep for structured data
csvgrep -c role -m "access" inventory.csv
Filter with regex — pattern match on a column
csvgrep -c hostname -r "^sw-core" inventory.csv
Sort by column — ascending by default
csvsort -c vlan inventory.csv
Reverse sort — descending order
csvsort -c vlan -r inventory.csv
Join two CSV files on a shared column — like SQL JOIN
csvjoin -c hostname devices.csv locations.csv
Column statistics — min, max, mean, median, count per column
csvstat inventory.csv
Statistics for a single column — focused analysis
csvstat -c vlan inventory.csv
Convert CSV to JSON — each row becomes a JSON object
csvjson inventory.csv | jq '.'
Run SQL against CSV — full SQL query capability
csvsql --query "SELECT hostname, ip FROM inventory WHERE vlan = 10 ORDER BY hostname" inventory.csv
miller (mlr) — Swiss Army Knife
Pretty-print CSV as a table
mlr --icsv --opprint cat inventory.csv
Filter rows — mlr filter expression
mlr --csv filter '$vlan == 10' inventory.csv
Add a computed column — derive new fields
mlr --csv put '$fqdn = $hostname . ".inside.domusdigitalis.dev"' inventory.csv
Convert CSV to JSON — array of objects
mlr --icsv --ojson cat inventory.csv
Group-by aggregation — count devices per role
mlr --icsv --opprint stats1 -a count -f hostname -g role inventory.csv
Sort by multiple keys
mlr --csv sort-within-groups -f role then sort -f hostname inventory.csv
Core Unix Tools
Display CSV as aligned columns — quick visual inspection
column -t -s, inventory.csv
Sort CSV by field — numeric sort on third column (VLAN)
sort -t, -k3 -n inventory.csv
Sort preserving header — header stays on top
{ head -1 inventory.csv; tail -n +2 inventory.csv | sort -t, -k3 -n; }
Join two sorted files on first field — SQL-like join with coreutils
join -t, <(sort -t, -k1 devices.csv) <(sort -t, -k1 locations.csv)
Paste files side by side — merge columns from separate files
paste -d, hostnames.txt ips.txt vlans.txt > inventory.csv
Extract a single column by position — cut
cut -d, -f2 inventory.csv
Count unique values in a column — frequency analysis
awk -F, 'NR>1 {print $3}' inventory.csv | sort | uniq -c | sort -rn
Header Management
Print CSV header — show column names with positions
head -1 inventory.csv | tr ',' '\n' | awk '{printf "%2d: %s\n", NR, $0}'
Skip header in processing — NR>1 pattern
awk -F, 'NR>1 {print $1}' inventory.csv
Add a header to headerless CSV
{ echo "hostname,ip,vlan,role"; cat raw_data.csv; } > inventory.csv
Remove header — just the data rows
tail -n +2 inventory.csv
jq with CSV/TSV
Convert JSON array to CSV — using @csv formatter
jq -r '.[] | [.hostname, .ip, (.vlan | tostring)] | @csv' devices.json
JSON to CSV with header — prepend column names
jq -r '["hostname","ip","vlan"], (.[] | [.hostname, .ip, .vlan]) | @csv' devices.json
JSON to TSV — tab-separated output for Unix tools
jq -r '.[] | [.hostname, .ip, .vlan] | @tsv' devices.json
CSV to JSON via csvjson — round-trip back to structured data
csvjson inventory.csv | jq '.[] | select(.role == "core")'