comm — Set Operations on Sorted Files

Line-level set operations on sorted input — find what is unique to file A, unique to file B, or common to both.

Core Concept — Three Columns

comm compares two SORTED files and outputs three columns

# Column 1: lines only in file1
# Column 2: lines only in file2 (indented with one tab)
# Column 3: lines in both files (indented with two tabs)
comm sorted_a.txt sorted_b.txt
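
To make the column layout concrete, here is a minimal sketch on two small, already-sorted files. The file names and fruit data are invented for illustration, and temporary files are used so the snippet also runs in a plain POSIX shell.

```shell
# Throwaway demo data; names and contents are illustrative only.
tmp=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmp/a.txt"
printf '%s\n' banana cherry date  > "$tmp/b.txt"

# Both inputs are already in sorted order, so comm can merge them directly.
three_col=$(comm "$tmp/a.txt" "$tmp/b.txt")
printf '%s\n' "$three_col"
# apple                 column 1: only in a.txt
# <TAB><TAB>banana      column 3: in both
# <TAB><TAB>cherry      column 3: in both
# <TAB>date             column 2: only in b.txt

rm -rf "$tmp"
```

The indentation is literal tab characters, which is why the column a line belongs to can be recovered from its leading tabs.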

Files MUST be sorted — unsorted input gives wrong results; GNU comm also prints "file 1 is not in sorted order" and exits non-zero

# Wrong: unsorted input
comm file1.txt file2.txt

# Correct: sort first
comm <(sort file1.txt) <(sort file2.txt)
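
"Sorted" here means sorted in the same collation order that comm itself uses, so sort and comm should run under the same locale. The sketch below pins both to LC_ALL=C, which is one common choice rather than a requirement; the data is made up, and mixed case is used because the C locale orders uppercase before lowercase.

```shell
tmp=$(mktemp -d)
printf '%s\n' banana Apple cherry > "$tmp/a.txt"
printf '%s\n' cherry apple Apple  > "$tmp/b.txt"

# Sort and compare under one collation so the two tools agree on the order.
LC_ALL=C sort "$tmp/a.txt" > "$tmp/a.sorted"   # Apple banana cherry
LC_ALL=C sort "$tmp/b.txt" > "$tmp/b.sorted"   # Apple apple cherry
common=$(LC_ALL=C comm -12 "$tmp/a.sorted" "$tmp/b.sorted")
printf '%s\n' "$common"
# Apple
# cherry

rm -rf "$tmp"
```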

Suppress Columns — -1, -2, -3

Intersection — lines in BOTH files (suppress columns 1 and 2)

comm -12 <(sort a.txt) <(sort b.txt)

Items in A but not B — set difference A \ B (suppress columns 2 and 3)

comm -23 <(sort a.txt) <(sort b.txt)

Items in B but not A — set difference B \ A (suppress columns 1 and 3)

comm -13 <(sort a.txt) <(sort b.txt)

Items in only one file — symmetric difference (suppress column 3)

comm -3 <(sort a.txt) <(sort b.txt)

Set Operations — Thinking in Sets

Union — all unique lines from both files

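sort -u a.txt b.txt
# Or via comm: dedupe each input, then strip only the leading column tabs
# (tr -d '\t' would also delete tabs inside the lines; \t in sed is a GNU extension;
# assumes no line itself starts with a tab):
comm <(sort -u a.txt) <(sort -u b.txt) | sed 's/^\t*//'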

Intersection — lines present in both

comm -12 <(sort a.txt) <(sort b.txt)

Difference A \ B — lines in A not in B

comm -23 <(sort a.txt) <(sort b.txt)

Symmetric difference — lines in exactly one of the two files

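# Strip only the single leading tab of column 2 (GNU sed); tr -d '\t' would also remove tabs inside lines
comm -3 <(sort a.txt) <(sort b.txt) | sed 's/^\t//'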

Is A a subset of B? — check if difference A \ B is empty

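# sort -u: comm pairs duplicates one-to-one, so a line repeated in A but present once in B would otherwise be reported
[ -z "$(comm -23 <(sort -u a.txt) <(sort -u b.txt))" ] && echo "A is a subset of B" || echo "A has items not in B"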
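
The set operations above can be wrapped in small helpers. This is a sketch under assumptions: the function names (set_op, set_intersect, set_diff, is_subset) are hypothetical, inputs are deduplicated with sort -u so the results follow set rather than multiset semantics, and temporary files stand in for process substitution so it runs in a plain POSIX shell.

```shell
# Hypothetical helper: run comm with the given flags on deduplicated,
# C-locale-sorted copies of two files.
set_op() {  # usage: set_op FLAGS FILE_A FILE_B
  _dir=$(mktemp -d) || return 1
  LC_ALL=C sort -u "$2" > "$_dir/a"
  LC_ALL=C sort -u "$3" > "$_dir/b"
  LC_ALL=C comm "$1" "$_dir/a" "$_dir/b"
  _rc=$?
  rm -rf "$_dir"
  return "$_rc"
}
set_intersect() { set_op -12 "$1" "$2"; }
set_diff()      { set_op -23 "$1" "$2"; }
# A is a subset of B when A \ B prints no lines at all (even empty ones).
is_subset()     { ! set_diff "$1" "$2" | grep -q '^'; }

tmp=$(mktemp -d)
printf '%s\n' red green red  > "$tmp/a.txt"   # note the duplicate "red"
printf '%s\n' blue green red > "$tmp/b.txt"

both=$(set_intersect "$tmp/a.txt" "$tmp/b.txt")   # green, red
only_b=$(set_diff "$tmp/b.txt" "$tmp/a.txt")      # blue
if is_subset "$tmp/a.txt" "$tmp/b.txt"; then subset=yes; else subset=no; fi
printf 'intersection:\n%s\nonly in b:\n%s\na subset of b: %s\n' "$both" "$only_b" "$subset"

rm -rf "$tmp"
```

Testing for "no output lines" with grep rather than `[ -z "$(...)" ]` avoids a false positive when the only differing line is an empty one, since command substitution strips trailing newlines.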

Practical Use Cases

Find packages installed on server A but not server B

# Filter and sort locally so both lists use the same locale and collation
comm -23 <(ssh server_a 'dpkg -l' | awk '/^ii/{print $2}' | sort) <(ssh server_b 'dpkg -l' | awk '/^ii/{print $2}' | sort)

Find users in group A but not group B (explicit members only; users who have the group as their primary group are not listed)

comm -23 <(getent group groupA | cut -d: -f4 | tr ',' '\n' | sort) <(getent group groupB | cut -d: -f4 | tr ',' '\n' | sort)

Audit: compare expected vs actual config files (by file name only; -printf requires GNU find)

comm -3 <(sort expected_files.txt) <(find /etc/myapp -type f -printf '%f\n' | sort)

Find common dependencies between two projects

comm -12 <(sort project_a/requirements.txt) <(sort project_b/requirements.txt)

Track what changed between two snapshots of a directory listing

comm -3 <(sort snapshot_yesterday.txt) <(sort snapshot_today.txt)
# Column 1 (no indent): removed since yesterday
# Column 2 (one tab):   added since yesterday
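
The tab-based columns of comm -3 can be turned into explicit markers. The following is a minimal sketch, assuming made-up snapshot contents and a "-" / "+" labeling convention chosen here for illustration; it also assumes no file name itself begins with a tab.

```shell
tmp=$(mktemp -d)
printf '%s\n' a.conf b.conf c.conf > "$tmp/yesterday.txt"
printf '%s\n' b.conf c.conf d.conf > "$tmp/today.txt"

# Column 1 has no leading tab (removed); column 2 has exactly one (added).
changes=$(comm -3 "$tmp/yesterday.txt" "$tmp/today.txt" |
  awk '{ if (substr($0, 1, 1) == "\t") print "+ " substr($0, 2); else print "- " $0 }')
printf '%s\n' "$changes"
# - a.conf
# + d.conf

rm -rf "$tmp"
```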

comm vs Other Tools

comm vs diff — comm gives you set operations, diff gives you an order-sensitive edit script

# comm: "which lines are shared/unique" — set membership
comm -12 <(sort a.txt) <(sort b.txt)

# diff: "how to transform file1 into file2" — editing script
diff a.txt b.txt

comm vs grep -f — comm requires sorted input but handles large files efficiently

# grep -f: works on unsorted input and keeps b.txt's order and duplicates; a very large pattern file costs memory and time
grep -Fxf a.txt b.txt

# comm: requires sorting (O(n log n)), then a single streaming O(n+m) merge
comm -12 <(sort a.txt) <(sort b.txt)

comm vs awk — awk is more flexible but comm is purpose-built

# awk intersection — no sorting needed, loads file1 into memory
awk 'FNR==NR {a[$0]; next} ($0 in a)' a.txt b.txt

# comm intersection — sorted files, streaming, low memory
comm -12 <(sort a.txt) <(sort b.txt)
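
The three intersection approaches agree on which lines are shared but differ in order and in how duplicates are handled. A small sketch with invented data (b.txt is unsorted and repeats "y") makes the difference visible:

```shell
tmp=$(mktemp -d)
printf '%s\n' x y   > "$tmp/a.txt"
printf '%s\n' y x y > "$tmp/b.txt"
sort "$tmp/a.txt" > "$tmp/a.sorted"
sort "$tmp/b.txt" > "$tmp/b.sorted"

# comm pairs lines one-to-one in sorted order: the extra "y" is unique to b.
via_comm=$(comm -12 "$tmp/a.sorted" "$tmp/b.sorted")
# grep and awk filter b.txt in place: original order, duplicates kept.
via_grep=$(grep -Fxf "$tmp/a.txt" "$tmp/b.txt")
via_awk=$(awk 'FNR==NR {a[$0]; next} ($0 in a)' "$tmp/a.txt" "$tmp/b.txt")

printf 'comm:\n%s\ngrep:\n%s\nawk:\n%s\n' "$via_comm" "$via_grep" "$via_awk"
# comm prints: x, y
# grep prints: y, x, y
# awk  prints: y, x, y

rm -rf "$tmp"
```

If a strict set result is needed from grep or awk, piping their output through sort -u brings it in line with the comm version.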

Edge Cases

Case-insensitive comparison — lowercase both inputs

comm -12 <(tr '[:upper:]' '[:lower:]' < a.txt | sort) <(tr '[:upper:]' '[:lower:]' < b.txt | sort)

Ignore trailing whitespace — strip before comparing

comm -12 <(sed 's/[[:space:]]*$//' a.txt | sort) <(sed 's/[[:space:]]*$//' b.txt | sort)

Handle files with different line endings

comm -12 <(tr -d '\r' < a.txt | sort) <(tr -d '\r' < b.txt | sort)
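
The three normalizations above can be combined into one preprocessing step. This is a sketch only: the normalize function name is hypothetical, the sample data is invented, and the order of the steps (strip carriage returns, trim trailing whitespace, lowercase, then sort) is one reasonable choice.

```shell
# Hypothetical helper: canonicalize a file before handing it to comm.
normalize() {  # usage: normalize FILE
  tr -d '\r' < "$1" |
    sed 's/[[:space:]]*$//' |
    tr '[:upper:]' '[:lower:]' |
    LC_ALL=C sort -u
}

tmp=$(mktemp -d)
printf 'Alice \r\nBOB\r\n' > "$tmp/a.txt"   # CRLF endings, trailing space, mixed case
printf 'alice\ncarol\n'    > "$tmp/b.txt"

normalize "$tmp/a.txt" > "$tmp/a.norm"
normalize "$tmp/b.txt" > "$tmp/b.norm"
common=$(LC_ALL=C comm -12 "$tmp/a.norm" "$tmp/b.norm")
printf '%s\n' "$common"
# alice

rm -rf "$tmp"
```

Sorting happens last because every earlier step can change how a line collates.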