uniq — Deduplicate & Count Adjacent Lines

Deduplicate adjacent lines, count occurrences, or isolate duplicates. Because uniq only compares neighboring lines, input is almost always piped through sort first.

Core Behavior — Adjacent Lines Only

uniq removes ADJACENT duplicate lines — input must be sorted first
# Without sorting: duplicates slip through
printf 'a\nb\na\n' | uniq
# a
# b
# a   <-- not collapsed because 'a' is not adjacent

# Correct: sort first
printf 'a\nb\na\n' | sort | uniq
# a
# b
Deduplicate an unsorted file — sort makes duplicates adjacent first
sort data.txt | uniq

Count Occurrences — -c

Count how many times each line appears — the most-used uniq flag
sort access.log | uniq -c
Classic frequency analysis pipeline — count, sort descending, top N
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20
Count occurrences with a custom-width count column — preserves multi-word lines
sort words.txt | uniq -c | awk '{c=$1; $1=""; printf "%6d  %s\n", c, substr($0,2)}'
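A minimal self-contained check with invented data: three a's and one b, counted after sorting. It prints `3 a` and `1 b` (the count column's padding width varies by uniq implementation).

```shell
# Count duplicates in a tiny inline dataset
printf 'a\na\nb\na\n' | sort | uniq -c
```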

Show Only Duplicates — -d

Print only lines that appear more than once
sort names.txt | uniq -d
Find duplicate entries in a list — useful for config auditing
awk '{print $1}' /etc/hosts | sort | uniq -d
Show all repeated lines with their count — -d + -c combined
sort data.txt | uniq -dc
Show ALL copies of repeated lines, not just one — -D (capital)
sort data.txt | uniq -D
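A quick sketch of -d vs -D on invented, already-sorted input: -d prints one copy of each repeated line, -D prints every copy. (-D is not in POSIX; it is available in GNU and recent BSD uniq.)

```shell
# -d: one representative per repeated line
printf 'a\na\nb\nc\nc\nc\n' | uniq -d
# a
# c

# -D: every copy of every repeated line
printf 'a\na\nb\nc\nc\nc\n' | uniq -D
# a
# a
# c
# c
# c
```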

Show Only Unique — -u

Print only lines that appear exactly once — no duplicates at all
sort names.txt | uniq -u
Find entries unique to a list — items without duplicates
sort combined_list.txt | uniq -u
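On the same kind of invented sorted input, -u keeps only the line that has no duplicate at all:

```shell
# Only 'b' appears exactly once
printf 'a\na\nb\nc\nc\nc\n' | uniq -u
# b
```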

Skip Fields — -f

Ignore the first field when comparing — deduplicate by remaining content
sort -k2 data.txt | uniq -f1
Skip timestamp field (field 1) — deduplicate log messages
sort -k2 syslog_extract.txt | uniq -f1 -c | sort -rn
Skip first two fields — compare from field 3 onward
sort -k3 records.txt | uniq -f2
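A sketch with made-up log-like lines: skipping the timestamp field collapses repeated messages. No sort is needed here because the duplicates are already adjacent.

```shell
# Keeps the first timestamped copy of each repeated message
printf '09:01 disk full\n09:02 disk full\n09:03 link up\n' | uniq -f1
# 09:01 disk full
# 09:03 link up
```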

Skip Characters — -s

Ignore first N characters when comparing
sort data.txt | uniq -s10
Combine -s and -w to compare only a specific character range
# Compare only characters 11-20 (skip 10, check width 10)
sort data.txt | uniq -s10 -w10
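A sketch with invented fixed-width records: a 9-character ID plus one space fills the first 10 columns, so -s10 makes uniq compare only the name that follows.

```shell
# Lines 1 and 2 collapse: 'alice' == 'alice' after skipping 10 chars
printf 'id0000001 alice\nid0000002 alice\nid0000003 bob\n' | uniq -s10
# id0000001 alice
# id0000003 bob
```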

Case-Insensitive — -i

Ignore case when comparing lines
sort -f data.txt | uniq -i
Case-insensitive frequency count
tr '[:upper:]' '[:lower:]' < words.txt | sort | uniq -c | sort -rn
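A quick check with invented case variants: all three collapse into a single counted line. Which spelling gets printed depends on the sort implementation's tie-breaking, so no exact output is shown.

```shell
# -i folds case during comparison; -c counts the merged group (count = 3)
printf 'Apple\napple\nAPPLE\n' | sort -f | uniq -ic
```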

Practical Patterns

Find the most common error messages — strip timestamps first
awk '{$1=$2=$3=""; print}' /var/log/syslog | sed 's/^ *//' | sort | uniq -c | sort -rn | head -15
Compare two lists — items in both (intersection via uniq -d); each list must be internally duplicate-free, e.g. pre-deduped with sort -u
sort list_a.txt list_b.txt | uniq -d
Items in only one of two lists (symmetric difference via uniq -u) — same caveat applies
sort list_a.txt list_b.txt | uniq -u
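A sketch with two invented, internally duplicate-free lists, inlined via a command group so no files are needed:

```shell
# list A = {apple, pear}, list B = {pear, plum}
{ printf 'apple\npear\n'; printf 'pear\nplum\n'; } | sort | uniq -d
# pear

{ printf 'apple\npear\n'; printf 'pear\nplum\n'; } | sort | uniq -u
# apple
# plum
```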
Count distinct lines without printing them — just the total
sort data.txt | uniq | wc -l
Detect near-duplicate log lines — normalize PIDs before comparing
awk '{gsub(/\[[0-9]+\]/,"[PID]"); print}' /var/log/syslog | sort | uniq -c | sort -rn | head -20
uniq vs sort -u: sort -u dedupes in a single process but can't count or filter; uniq adds -c, -d, and -u
# sort -u: fast dedup, no counts, no -d/-u filtering
sort -u data.txt

# sort | uniq -c: slower, but gives frequencies
sort data.txt | uniq -c | sort -rn