sort — Ordering & Deduplication
Sorting patterns for pipeline work — numeric, field-based, version sort, deduplication, and the sort | uniq -c | sort -rn frequency count pipeline.
Basic Sorting
Alphabetical sort — default behavior
sort names.txt
Reverse sort — descending alphabetical
sort -r names.txt
Numeric sort — without -n, "9" sorts after "80" lexicographically
sort -n numbers.txt
Reverse numeric — largest first
sort -rn numbers.txt
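A quick demonstration of the lexicographic pitfall, using printf to generate input inline:

```shell
# Without -n, lines compare character by character: '1' < '8' < '9'
printf '80\n9\n100\n' | sort       # 100, 80, 9 — "wrong" for numbers
printf '80\n9\n100\n' | sort -n    # 9, 80, 100
```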
Field Sort — -t and -k
Sort /etc/passwd by UID (field 3, numeric, colon-delimited)
sort -t':' -k3,3n /etc/passwd
Sort CSV by second column, then by third column descending
sort -t',' -k2,2 -k3,3rn data.csv
Sort by field 4 numerically — space-delimited (default)
sort -k4,4n report.txt
Sort IP addresses correctly — period delimiter, numeric per octet
sort -t'.' -k1,1n -k2,2n -k3,3n -k4,4n ips.txt
Key specification matters — -k2 means "from field 2 to end", -k2,2 means "field 2 only"
# Wrong: sorts from field 2 to end of line (ties broken by field 3, 4, etc.)
sort -k2 data.txt
# Correct: sorts by field 2 only
sort -k2,2 data.txt
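Seeing the difference on a two-line sample — both rows tie on field 2, so only the tie-break rule changes:

```shell
printf 'a 2 z\nb 2 a\n' | sort -k2     # "b 2 a" first: field 3 broke the tie
printf 'a 2 z\nb 2 a\n' | sort -k2,2   # "a 2 z" first: whole-line fallback only
```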
Unique — -u
Deduplicate after sorting — sort -u is sort | uniq in one pass
sort -u names.txt
Unique by a specific field — with GNU sort, keeps the first input line per key (POSIX leaves which line is kept unspecified)
sort -t',' -k1,1 -u data.csv
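A small illustration of per-key dedup — with GNU sort, -u plus a key keeps the first input line of each equal-key run:

```shell
# Two rows share key "x"; only the first survives (GNU sort behavior)
printf 'x,1\nx,2\ny,3\n' | sort -t',' -k1,1 -u    # x,1 and y,3
```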
Stable Sort — -s
Preserve original order for equal elements — critical for multi-key sorting
sort -s -k1,1 data.txt
Multi-pass stable sort — sort by secondary key first, primary key second
sort -s -k3,3n data.txt | sort -s -k1,1
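The two-pass pattern on a three-row sample — the second pass's -s preserves the field-2 ordering established by the first pass within each field-1 group:

```shell
printf 'b 1\na 2\nb 2\n' | sort -s -k2,2n | sort -s -k1,1
# a 2
# b 1   <- the two "b" rows keep their field-2 order from pass one
# b 2
```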
Human-Readable Sort — -h
Sort du output by human-readable sizes — understands K, M, G suffixes
du -sh /var/log/* 2>/dev/null | sort -h
Largest directories first
du -sh /* 2>/dev/null | sort -hr | head -10
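-h compares the numeric part scaled by its suffix, which plain -n cannot do:

```shell
# sort -n would put 100M before 512K (100 < 512); -h scales by suffix
printf '512K\n2G\n100M\n' | sort -h    # 512K, 100M, 2G
```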
Version Sort — -V
Sort version strings correctly — 1.9 before 1.10
printf '%s\n' v1.2 v1.10 v1.9 v2.0 v1.1 | sort -V
Sort package names with version numbers
pacman -Q | sort -k2,2V
Random Sort — -R
Shuffle lines randomly
sort -R names.txt
Pick 5 random lines from a file
sort -R names.txt | head -5
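Note that GNU sort -R sorts by a random hash of each key, so identical lines always land adjacent — it groups duplicates rather than producing a uniform shuffle. shuf (also GNU coreutils) is the better tool for true shuffling and sampling:

```shell
printf 'a\nb\na\n' | sort -R            # the two "a" lines are always together
printf 'a\nb\nc\nd\ne\n' | shuf         # unbiased permutation
printf 'a\nb\nc\nd\ne\n' | shuf -n 3    # sample 3 lines without replacement
```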
sort + uniq Pipeline
Classic frequency count — sort then count adjacent duplicates
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20
Why sort first: uniq only collapses adjacent duplicates
# Wrong — uniq misses non-adjacent duplicates:
printf 'a\nb\na\n' | uniq # outputs a, b, a (three lines)
# Correct — sort first:
printf 'a\nb\na\n' | sort | uniq # outputs a, b (two lines)
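The full frequency-count pipeline, run end to end on inline data:

```shell
printf 'GET\nPOST\nGET\nGET\nPOST\n' | sort | uniq -c | sort -rn
# uniq -c prefixes each line with its count; sort -rn ranks by that count:
# counts, highest first: 3 GET, then 2 POST
```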
Header-Preserving Sort
Sort a file but keep the header line at the top
head -1 data.csv && tail -n +2 data.csv | sort -t',' -k2,2rn
awk approach — more robust for pipelines
awk 'NR==1 {print; next} {print | "sort -t, -k2,2rn"}' data.csv
Subshell read approach — consume the header from stdin, sort the rest
(read -r header; echo "$header"; sort -t',' -k2,2rn) < data.csv
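Exercising the header-preserving subshell trick end to end on inline CSV — the header stays put while the rows sort by score descending:

```shell
printf 'name,score\nbob,2\nann,9\n' \
  | (read -r header; echo "$header"; sort -t',' -k2,2rn)
# name,score
# ann,9
# bob,2
```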
Practical Patterns
Sort process list by memory usage — descending
ps aux | awk 'NR==1{print;next}{print|"sort -k4,4rn"}' | head -15
Find largest files recursively — sort by size
find /var/log -type f -exec du -h {} + 2>/dev/null | sort -hr | head -20
Sort and merge multiple sorted files efficiently
sort -m sorted1.txt sorted2.txt sorted3.txt > merged.txt
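A minimal check that -m really merges — inputs must each already be sorted; with unsorted inputs the merge output is not globally sorted:

```shell
# Merge two pre-sorted files in one linear pass — no full re-sort
s1=$(mktemp); s2=$(mktemp)
printf 'a\nc\n' > "$s1"
printf 'b\nd\n' > "$s2"
sort -m "$s1" "$s2"    # a, b, c, d
rm -f "$s1" "$s2"
```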
Case-insensitive sort — -f folds lowercase to uppercase
sort -f mixed_case.txt
Remove duplicate lines regardless of case
sort -fu mixed_case.txt
Check if a file is already sorted — exit code 0 if sorted
sort -c data.txt && echo "sorted" || echo "not sorted"
Sort with a specific locale — avoid surprises with LC_ALL
LC_ALL=C sort data.txt # Byte-order sort, fastest, most predictable
LC_ALL=en_US.UTF-8 sort data.txt # Locale-aware, slower
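The locale difference is easy to see with mixed case — in the C locale every uppercase byte (65-90) sorts before every lowercase byte (97-122), while a typical UTF-8 locale interleaves them:

```shell
printf 'b\nA\na\nB\n' | LC_ALL=C sort    # A, B, a, b — pure byte order
```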