sort — Ordering & Deduplication
Sorting patterns for pipeline work — numeric, field-based, version sort, deduplication, and the sort | uniq -c | sort -rn frequency count pipeline.
Basic Sorting
Alphabetical sort — default behavior
sort names.txt
Reverse sort — descending alphabetical
sort -r names.txt
Numeric sort — without -n, "9" sorts after "80" lexicographically
sort -n numbers.txt
Reverse numeric — largest first
sort -rn numbers.txt
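A quick demonstration of the lexicographic pitfall, using printf to generate input inline:

```shell
# Without -n, lines compare character by character: '1' < '8' < '9'
printf '80\n9\n100\n' | sort       # 100, 80, 9 — "wrong" for numbers
printf '80\n9\n100\n' | sort -n    # 9, 80, 100
```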
Field Sort — -t and -k
Sort /etc/passwd by UID (field 3, numeric, colon-delimited)
sort -t':' -k3,3n /etc/passwd
Sort CSV by second column, then by third column descending
sort -t',' -k2,2 -k3,3rn data.csv
Sort by field 4 numerically — space-delimited (default)
sort -k4,4n report.txt
Sort IP addresses correctly — period delimiter, numeric per octet
sort -t'.' -k1,1n -k2,2n -k3,3n -k4,4n ips.txt
Key specification matters — -k2 means "from field 2 to end", -k2,2 means "field 2 only"
# Wrong: sorts from field 2 to end of line (ties broken by field 3, 4, etc.)
sort -k2 data.txt
# Correct: sorts by field 2 only
sort -k2,2 data.txt
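Seeing the difference on a two-line sample — both rows tie on field 2, so only the tie-break rule changes:

```shell
printf 'a 2 z\nb 2 a\n' | sort -k2     # "b 2 a" first: field 3 broke the tie
printf 'a 2 z\nb 2 a\n' | sort -k2,2   # "a 2 z" first: whole-line fallback only
```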
Unique — -u
Deduplicate after sorting — sort -u is sort | uniq in one pass
sort -u names.txt
Unique by a specific field — with GNU sort, keeps the first input line per key (POSIX leaves which line is kept unspecified)
sort -t',' -k1,1 -u data.csv
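A small illustration of per-key dedup — with GNU sort, -u plus a key keeps the first input line of each equal-key run:

```shell
# Two rows share key "x"; only the first survives (GNU sort behavior)
printf 'x,1\nx,2\ny,3\n' | sort -t',' -k1,1 -u    # x,1 and y,3
```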
Stable Sort — -s
Preserve original order for equal elements — critical for multi-key sorting
sort -s -k1,1 data.txt
Multi-pass stable sort — sort by secondary key first, primary key second
sort -s -k3,3n data.txt | sort -s -k1,1
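The two-pass pattern on a three-row sample — the second pass's -s preserves the field-2 ordering established by the first pass within each field-1 group:

```shell
printf 'b 1\na 2\nb 2\n' | sort -s -k2,2n | sort -s -k1,1
# a 2
# b 1   <- the two "b" rows keep their field-2 order from pass one
# b 2
```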
Human-Readable Sort — -h
Sort du output by human-readable sizes — understands K, M, G suffixes
du -sh /var/log/* 2>/dev/null | sort -h
Largest directories first
du -sh /* 2>/dev/null | sort -hr | head -10
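-h compares the numeric part scaled by its suffix, which plain -n cannot do:

```shell
# sort -n would put 100M before 512K (100 < 512); -h scales by suffix
printf '512K\n2G\n100M\n' | sort -h    # 512K, 100M, 2G
```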
Version Sort — -V
Sort version strings correctly — 1.9 before 1.10
printf '%s\n' v1.2 v1.10 v1.9 v2.0 v1.1 | sort -V
Sort package names with version numbers
pacman -Q | sort -k2,2V
Random Sort — -R
Shuffle lines randomly
sort -R names.txt
Pick 5 random lines from a file
sort -R names.txt | head -5
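Note that GNU sort -R sorts by a random hash of each key, so identical lines always land adjacent — it groups duplicates rather than producing a uniform shuffle. shuf (also GNU coreutils) is the better tool for true shuffling and sampling:

```shell
printf 'a\nb\na\n' | sort -R            # the two "a" lines are always together
printf 'a\nb\nc\nd\ne\n' | shuf         # unbiased permutation
printf 'a\nb\nc\nd\ne\n' | shuf -n 3    # sample 3 lines without replacement
```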
sort + uniq Pipeline
Classic frequency count — sort then count adjacent duplicates
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20
Why sort first: uniq only collapses adjacent duplicates
# Wrong — uniq misses non-adjacent duplicates:
printf 'a\nb\na\n' | uniq # outputs a, b, a (three lines)
# Correct — sort first:
printf 'a\nb\na\n' | sort | uniq # outputs a, b (two lines)
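The full frequency-count pipeline, run end to end on inline data:

```shell
printf 'GET\nPOST\nGET\nGET\nPOST\n' | sort | uniq -c | sort -rn
# uniq -c prefixes each line with its count; sort -rn ranks by that count:
# counts, highest first: 3 GET, then 2 POST
```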
Header-Preserving Sort
Sort a file but keep the header line at the top
head -1 data.csv && tail -n +2 data.csv | sort -t',' -k2,2rn
awk approach — more robust for pipelines
awk 'NR==1 {print; next} {print | "sort -t, -k2,2rn"}' data.csv
Subshell read approach — consume the header from stdin, sort the rest
(read -r header; echo "$header"; sort -t',' -k2,2rn) < data.csv
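Exercising the header-preserving subshell trick end to end on inline CSV — the header stays put while the rows sort by score descending:

```shell
printf 'name,score\nbob,2\nann,9\n' \
  | (read -r header; echo "$header"; sort -t',' -k2,2rn)
# name,score
# ann,9
# bob,2
```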
Practical Patterns
Sort process list by memory usage — descending
ps aux | awk 'NR==1{print;next}{print|"sort -k4,4rn"}' | head -15
Find largest files recursively — sort by size
find /var/log -type f -exec du -h {} + 2>/dev/null | sort -hr | head -20
Sort and merge multiple sorted files efficiently
sort -m sorted1.txt sorted2.txt sorted3.txt > merged.txt
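A minimal check that -m really merges — inputs must each already be sorted; with unsorted inputs the merge output is not globally sorted:

```shell
# Merge two pre-sorted files in one linear pass — no full re-sort
s1=$(mktemp); s2=$(mktemp)
printf 'a\nc\n' > "$s1"
printf 'b\nd\n' > "$s2"
sort -m "$s1" "$s2"    # a, b, c, d
rm -f "$s1" "$s2"
```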
Case-insensitive sort — -f folds lowercase to uppercase
sort -f mixed_case.txt
Remove duplicate lines regardless of case
sort -fu mixed_case.txt
Check if a file is already sorted — exit code 0 if sorted
sort -c data.txt && echo "sorted" || echo "not sorted"
Sort with a specific locale — avoid surprises with LC_ALL
LC_ALL=C sort data.txt # Byte-order sort, fastest, most predictable
LC_ALL=en_US.UTF-8 sort data.txt # Locale-aware, slower
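The locale difference is easy to see with mixed case — in the C locale every uppercase byte (65-90) sorts before every lowercase byte (97-122), while a typical UTF-8 locale interleaves them:

```shell
printf 'b\nA\na\nB\n' | LC_ALL=C sort    # A, B, a, b — pure byte order
```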