comm — Set Operations on Sorted Files
Line-level set operations on sorted input — find what is unique to file A, unique to file B, or common to both.
Core Concept — Three Columns
comm compares two SORTED files and outputs three columns
# Column 1: lines only in file1
# Column 2: lines only in file2 (indented with one tab)
# Column 3: lines in both files (indented with two tabs)
comm sorted_a.txt sorted_b.txt
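Here is a minimal sketch of the column layout. It assumes a POSIX shell with `mktemp`, and the file contents are made up for illustration. `tr` turns each column-separating tab into `|` so the indentation is visible.

```shell
#!/bin/sh
export LC_ALL=C                      # byte-order collation for sort and comm
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT            # remove the scratch directory on exit

# Two small inputs that are already sorted
printf '%s\n' apple banana cherry > "$tmp/sorted_a.txt"
printf '%s\n' banana cherry date  > "$tmp/sorted_b.txt"

# Replace each tab with "|" so the column indentation is visible
layout=$(comm "$tmp/sorted_a.txt" "$tmp/sorted_b.txt" | tr '\t' '|')
printf '%s\n' "$layout"
# apple      column 1: only in sorted_a.txt
# ||banana   column 3: in both
# ||cherry   column 3: in both
# |date      column 2: only in sorted_b.txt
```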
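Files MUST be sorted, in the collation order of the locale comm runs under. Unsorted input mispairs lines and gives wrong output (GNU comm also warns that a file "is not in sorted order")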
# Wrong: unsorted input
comm file1.txt file2.txt
# Correct: sort first (<(...) is process substitution: bash, zsh, or ksh, not POSIX sh)
comm <(sort file1.txt) <(sort file2.txt)
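Plain POSIX `sh` has no `<(...)`. One workaround is to sort into temporary files. Alternatively, comm can read one side from standard input when you pass `-` as the file name. The sketch below shows both. The `.sorted` file name and the sample data are hypothetical.

```shell
#!/bin/sh
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Deliberately unsorted inputs standing in for file1.txt and file2.txt
printf '%s\n' pear apple fig > "$tmp/file1.txt"
printf '%s\n' fig kiwi pear  > "$tmp/file2.txt"

sort "$tmp/file1.txt" > "$tmp/file1.sorted"
# "-" makes comm read its second input from the pipe
common=$(sort "$tmp/file2.txt" | comm -12 "$tmp/file1.sorted" -)
printf '%s\n' "$common"   # fig, then pear
```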
Suppress Columns — -1, -2, -3
Intersection — lines in BOTH files (suppress columns 1 and 2)
comm -12 <(sort a.txt) <(sort b.txt)
Items in A but not B — set difference A \ B (suppress columns 2 and 3)
comm -23 <(sort a.txt) <(sort b.txt)
Items in B but not A — set difference B \ A (suppress columns 1 and 3)
comm -13 <(sort a.txt) <(sort b.txt)
Items in only one file — symmetric difference (suppress column 3)
comm -3 <(sort a.txt) <(sort b.txt)
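The sketch below runs all four flag combinations on one pair of inputs, A = {a, b, c} and B = {b, c, d}. It uses temporary files instead of `<(...)` so it runs in POSIX `sh`, and the data is illustrative only.

```shell
#!/bin/sh
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' b a c > "$tmp/a.txt"     # A = {a, b, c}
printf '%s\n' d c b > "$tmp/b.txt"     # B = {b, c, d}
sort -o "$tmp/a.txt" "$tmp/a.txt"      # sort -o may safely overwrite its own input
sort -o "$tmp/b.txt" "$tmp/b.txt"

both=$(comm -12 "$tmp/a.txt" "$tmp/b.txt")     # intersection: b, c
only_a=$(comm -23 "$tmp/a.txt" "$tmp/b.txt")   # A \ B: a
only_b=$(comm -13 "$tmp/a.txt" "$tmp/b.txt")   # B \ A: d
either=$(comm -3 "$tmp/a.txt" "$tmp/b.txt")    # symmetric: a, then <tab>d

printf '%s\n' "-12:" "$both" "-23:" "$only_a" "-13:" "$only_b" "-3:" "$either"
```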
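Set Operations — Thinking in Sets
comm pairs duplicate lines one-for-one, so it really works on multisets; sort with -u when you want true set behavior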
Union — all unique lines from both files
sort -u a.txt b.txt
# Or with comm (sort -u drops duplicates; tr -d assumes the lines themselves contain no tabs):
comm <(sort -u a.txt) <(sort -u b.txt) | tr -d '\t'
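The following sketch shows why `-u` matters for a comm-based union. A line repeated in one file is emitted once per unmatched copy. The data is made up, and the files are written already sorted.

```shell
#!/bin/sh
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' x x y > "$tmp/a.txt"    # x appears twice; already sorted
printf '%s\n' x z   > "$tmp/b.txt"

# Without -u: one x pairs with b.txt's x (column 3), the spare x lands in column 1
with_dups=$(comm "$tmp/a.txt" "$tmp/b.txt" | tr -d '\t')    # x, x, y, z

sort -u -o "$tmp/a.set" "$tmp/a.txt"
sort -u -o "$tmp/b.set" "$tmp/b.txt"
union=$(comm "$tmp/a.set" "$tmp/b.set" | tr -d '\t')        # x, y, z
printf '%s\n' "$with_dups" --- "$union"
```

With `-u` on both sides, the result matches `sort -u a.txt b.txt`.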
Intersection — lines present in both
comm -12 <(sort a.txt) <(sort b.txt)
Difference A \ B — lines in A not in B
comm -23 <(sort a.txt) <(sort b.txt)
Symmetric difference — lines in exactly one of the two files
comm -3 <(sort a.txt) <(sort b.txt) | tr -d '\t'
Is A a subset of B? — check if difference A \ B is empty
if [ -z "$(comm -23 <(sort -u a.txt) <(sort -u b.txt))" ]; then echo "A is a subset of B"; else echo "A has items not in B"; fi
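The same check can be wrapped as a reusable function. `is_subset` is an illustrative name, not a standard command, and the sketch assumes POSIX `sh`. It tests for any output line with `grep -q '^'` instead of `[ -z "$(...)" ]`. That way, a blank line that appears only in A is not silently dropped by command substitution.

```shell
#!/bin/sh
export LC_ALL=C

# is_subset FILE_A FILE_B (illustrative name): status 0 if every distinct line
# of FILE_A also occurs in FILE_B, 1 if not, 2 if a file could not be read.
is_subset() {
    _dir=$(mktemp -d) || return 2
    if ! sort -u "$1" > "$_dir/a" || ! sort -u "$2" > "$_dir/b"; then
        rm -rf "$_dir"
        return 2
    fi
    # A \ B is empty exactly when A is a subset of B. grep -q '^' matches any
    # line, even a blank one, which a [ -z "$(...)" ] test would miss.
    if comm -23 "$_dir/a" "$_dir/b" | grep -q '^'; then _rc=1; else _rc=0; fi
    rm -rf "$_dir"
    return "$_rc"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' b a   > "$tmp/small.txt"
printf '%s\n' c b a > "$tmp/big.txt"
printf '%s\n' a ''  > "$tmp/blank.txt"   # contains one empty line

if is_subset "$tmp/small.txt" "$tmp/big.txt"; then echo "small.txt is a subset of big.txt"; fi
if ! is_subset "$tmp/blank.txt" "$tmp/big.txt"; then echo "blank.txt has a line big.txt lacks"; fi
```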
Practical Use Cases
Find packages installed on server A but not server B
# Filter and sort locally so both lists use the same locale as comm
comm -23 <(ssh server_a 'dpkg -l' | awk '/^ii/{print $2}' | sort) <(ssh server_b 'dpkg -l' | awk '/^ii/{print $2}' | sort)
Find users in group A but not group B (field 4 of getent lists supplementary members only, not users whose primary group it is)
comm -23 <(getent group groupA | cut -d: -f4 | tr ',' '\n' | sort) <(getent group groupB | cut -d: -f4 | tr ',' '\n' | sort)
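To see what the pipeline does without touching the real group database, you can feed it hand-written lines in getent's `name:password:GID:members` format. The group names, members, and the `members` helper below are hypothetical.

```shell
#!/bin/sh
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical getent-style lines
group_a='groupA:x:1001:carol,alice,bob'
group_b='groupB:x:1002:bob,dave'

# Illustrative helper: member list, one name per line, sorted
members() { printf '%s\n' "$1" | cut -d: -f4 | tr ',' '\n' | sort; }

members "$group_a" > "$tmp/a"
members "$group_b" > "$tmp/b"
only_a=$(comm -23 "$tmp/a" "$tmp/b")
printf '%s\n' "$only_a"   # alice, carol
```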
Audit: compare expected vs actual config files
comm -3 <(sort expected_files.txt) <(find /etc/myapp -type f -printf '%f\n' | sort)
# Column 1 (no indent): expected but missing
# Column 2 (one tab): present but not expected
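`-printf` is a GNU find extension. A portable alternative is to strip the directory part with sed, as in the sketch below. A scratch directory stands in for /etc/myapp, and the file names are hypothetical. As with `%f`, only base names are compared, so same-named files in different subdirectories collapse together.

```shell
#!/bin/sh
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

mkdir -p "$tmp/myapp"                                   # stands in for /etc/myapp
touch "$tmp/myapp/app.conf" "$tmp/myapp/extra.conf"
printf '%s\n' logging.conf app.conf > "$tmp/expected_files.txt"

# Portable stand-in for -printf '%f\n': keep only what follows the last slash
find "$tmp/myapp" -type f | sed 's|.*/||' | sort > "$tmp/actual.txt"
sort "$tmp/expected_files.txt" > "$tmp/expected.txt"

report=$(comm -3 "$tmp/expected.txt" "$tmp/actual.txt")
printf '%s\n' "$report"
# <tab>extra.conf   column 2: present but not expected
# logging.conf      column 1: expected but missing
```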
Find common dependencies between two projects (exact line matches only, so differing version pins do not match)
comm -12 <(sort project_a/requirements.txt) <(sort project_b/requirements.txt)
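To compare by package name rather than by exact line, normalize each line first. The sketch below is rough and is not a requirements-file parser. It assumes one requirement per line. It does not handle `-r` includes or URLs, and it does not apply full PEP 503 name normalization (`_` vs `-`). The `names` helper and the sample data are hypothetical.

```shell
#!/bin/sh
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' 'requests==2.31.0' 'numpy>=1.26' '# pinned for CI' 'Flask[async]==3.0' > "$tmp/req_a.txt"
printf '%s\n' 'requests==2.28.0' 'pandas' 'flask' > "$tmp/req_b.txt"

# Hypothetical helper: drop comments, cut at the first space, "[" or version
# operator, lowercase, skip blank lines, and deduplicate
names() {
    sed -e 's/#.*//' -e 's/[[:space:][<>=!~;].*//' "$1" |
        tr '[:upper:]' '[:lower:]' | sed '/^$/d' | sort -u
}

names "$tmp/req_a.txt" > "$tmp/a"
names "$tmp/req_b.txt" > "$tmp/b"
common=$(comm -12 "$tmp/a" "$tmp/b")
printf '%s\n' "$common"   # flask, requests
```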
Track what changed between two snapshots of a directory listing
comm -3 <(sort snapshot_yesterday.txt) <(sort snapshot_today.txt)
# Column 1 (no indent): removed since yesterday
# Column 2 (one tab): added since yesterday
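The sketch below turns the two columns into a `-`/`+` change report in one pass. awk strips the single leading tab that marks column 2. The snapshot contents are made up.

```shell
#!/bin/sh
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' a.log b.log c.log > "$tmp/snapshot_yesterday.txt"
printf '%s\n' b.log c.log d.log > "$tmp/snapshot_today.txt"
sort "$tmp/snapshot_yesterday.txt" > "$tmp/y"
sort "$tmp/snapshot_today.txt"     > "$tmp/t"

# Column 2 lines start with one tab: strip it and mark as added; the rest are removed
changes=$(comm -3 "$tmp/y" "$tmp/t" |
    awk '/^\t/ { sub(/^\t/, ""); print "+ " $0; next } { print "- " $0 }')
printf '%s\n' "$changes"
# - a.log
# + d.log
```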
comm vs Other Tools
comm vs diff — comm gives you set operations, diff gives you an edit script
# comm: "which lines are shared/unique" — set membership
comm -12 <(sort a.txt) <(sort b.txt)
# diff: "how to transform file1 into file2" — editing script
diff a.txt b.txt
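The difference shows up when two files hold the same lines in a different order. diff reports changes, while comm on sorted copies reports nothing unique. The sketch below demonstrates this with made-up data.

```shell
#!/bin/sh
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' alpha beta gamma > "$tmp/a.txt"
printf '%s\n' gamma alpha beta > "$tmp/b.txt"   # same lines, different order

diff "$tmp/a.txt" "$tmp/b.txt" > /dev/null
diff_rc=$?                                      # 1: diff sees an order change

sort -o "$tmp/a.txt" "$tmp/a.txt"
sort -o "$tmp/b.txt" "$tmp/b.txt"
only_one=$(comm -3 "$tmp/a.txt" "$tmp/b.txt")   # empty: same set of lines

echo "diff exit status: $diff_rc"
echo "lines in only one file: ${only_one:-none}"
```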
comm vs grep -f — comm requires sorted input but handles large files efficiently
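# grep -Fxf: no sorting needed and keeps b.txt's order, but can get slow and memory-hungry when a.txt (the pattern list) is very large
grep -Fxf a.txt b.txt
# comm: sorting costs O(n log n), then comm makes a single linear O(n+m) pass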
comm -12 <(sort a.txt) <(sort b.txt)
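The two approaches also differ in output order. `grep -Fxf` prints matches in b.txt's original order, while comm prints them in sorted order. The sketch below shows this with made-up data.

```shell
#!/bin/sh
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' cherry apple        > "$tmp/a.txt"
printf '%s\n' cherry banana apple > "$tmp/b.txt"

# -F fixed strings, -x whole-line match, -f read patterns from a.txt
via_grep=$(grep -Fxf "$tmp/a.txt" "$tmp/b.txt")        # cherry, apple

sort "$tmp/a.txt" > "$tmp/a.sorted"
sort "$tmp/b.txt" > "$tmp/b.sorted"
via_comm=$(comm -12 "$tmp/a.sorted" "$tmp/b.sorted")   # apple, cherry

printf '%s\n' "grep:" "$via_grep" "comm:" "$via_comm"
```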
comm vs awk — awk is more flexible but comm is purpose-built
# awk intersection — no sorting needed, loads file1 into memory
awk 'FNR==NR {a[$0]; next} ($0 in a)' a.txt b.txt
# comm intersection — sorted files, streaming, low memory
comm -12 <(sort a.txt) <(sort b.txt)
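The awk idiom extends to difference (A \ B) with one caveat. If the first file is empty, `FNR==NR` stays true for the second file too, so nothing is printed. Testing `FILENAME == ARGV[1]` avoids that. `a_minus_b` is an illustrative helper name, and the data is made up.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' a b c > "$tmp/a.txt"
printf '%s\n' b d   > "$tmp/b.txt"
: > "$tmp/empty.txt"

# a_minus_b A B (illustrative): lines of A not in B, in A's order, no sorting.
# B is loaded into memory; FILENAME == ARGV[1] still works when B is empty.
a_minus_b() {
    awk 'FILENAME == ARGV[1] { seen[$0]; next } !($0 in seen)' "$2" "$1"
}

a_minus_b "$tmp/a.txt" "$tmp/b.txt"       # a, c
a_minus_b "$tmp/a.txt" "$tmp/empty.txt"   # a, b, c
```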
Edge Cases
Case-insensitive comparison — lowercase both inputs (matches are printed in lowercase)
comm -12 <(tr '[:upper:]' '[:lower:]' < a.txt | sort) <(tr '[:upper:]' '[:lower:]' < b.txt | sort)
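If you need the original spelling in the output, one option is to switch from comm to the awk idiom and compare `tolower($0)` keys. This prints a.txt's lines as written, in a.txt's order. The sketch assumes ASCII text and uses made-up data.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' Apple BANANA cherry > "$tmp/a.txt"
printf '%s\n' apple Banana date   > "$tmp/b.txt"

# Load lowercased b.txt into memory, then print a.txt lines whose lowercase form matches
common=$(awk 'FNR == NR { seen[tolower($0)]; next } (tolower($0) in seen)' "$tmp/b.txt" "$tmp/a.txt")
printf '%s\n' "$common"   # Apple, BANANA
```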
Ignore trailing whitespace — strip before comparing
comm -12 <(sed 's/[[:space:]]*$//' a.txt | sort) <(sed 's/[[:space:]]*$//' b.txt | sort)
Handle files with different line endings: strip the carriage return that Windows (CRLF) files carry
comm -12 <(tr -d '\r' < a.txt | sort) <(tr -d '\r' < b.txt | sort)
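The two clean-ups above can be combined into one pre-processing helper. Without it, a CRLF file shares no lines with an LF copy of the same data. `normalize` is a hypothetical helper name, and the sample files are made up.

```shell
#!/bin/sh
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical helper: drop CRs, trim trailing whitespace, then sort
normalize() { tr -d '\r' < "$1" | sed 's/[[:space:]]*$//' | sort; }

printf 'alpha\r\nbeta  \r\ngamma\r\n' > "$tmp/windows.txt"   # CRLF endings, trailing spaces
printf '%s\n' alpha beta delta        > "$tmp/unix.txt"      # LF endings

sort "$tmp/windows.txt" > "$tmp/w.raw"
sort "$tmp/unix.txt"    > "$tmp/u.raw"
raw=$(comm -12 "$tmp/w.raw" "$tmp/u.raw")       # empty: "alpha\r" differs from "alpha"

normalize "$tmp/windows.txt" > "$tmp/w.norm"
normalize "$tmp/unix.txt"    > "$tmp/u.norm"
clean=$(comm -12 "$tmp/w.norm" "$tmp/u.norm")   # alpha, beta

printf 'raw: %s\nnormalized:\n%s\n' "${raw:-<none>}" "$clean"
```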