uniq — Deduplicate & Count Adjacent Lines

Deduplicate adjacent lines, count occurrences, or isolate duplicates. Because uniq only compares neighboring lines, input is almost always piped through sort first.

Core Behavior — Adjacent Lines Only

uniq removes ADJACENT duplicate lines — input must be sorted first
# Without sorting: duplicates slip through
printf 'a\nb\na\n' | uniq
# a
# b
# a   <-- not collapsed because 'a' is not adjacent

# Correct: sort first
printf 'a\nb\na\n' | sort | uniq
# a
# b
Deduplicate an unsorted file — sort makes duplicates adjacent first
sort data.txt | uniq

Count Occurrences — -c

Count how many times each line appears — the most-used uniq flag
sort access.log | uniq -c
Classic frequency analysis pipeline — count, sort descending, top N
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20
Count occurrences with a custom-width count column — preserves multi-word lines
sort words.txt | uniq -c | awk '{c=$1; $1=""; printf "%6d  %s\n", c, substr($0,2)}'
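A minimal self-contained check with invented data: three a's and one b, counted after sorting. It prints `3 a` and `1 b` (the count column's padding width varies by uniq implementation).

```shell
# Count duplicates in a tiny inline dataset
printf 'a\na\nb\na\n' | sort | uniq -c
```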

Show Only Duplicates — -d

Print only lines that appear more than once
sort names.txt | uniq -d
Find duplicate entries in a list — useful for config auditing
awk '{print $1}' /etc/hosts | sort | uniq -d
Show all repeated lines with their count — -d + -c combined
sort data.txt | uniq -dc
Show ALL copies of repeated lines, not just one — -D (capital)
sort data.txt | uniq -D
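A quick sketch of -d vs -D on invented, already-sorted input: -d prints one copy of each repeated line, -D prints every copy. (-D is not in POSIX; it is available in GNU and recent BSD uniq.)

```shell
# -d: one representative per repeated line
printf 'a\na\nb\nc\nc\nc\n' | uniq -d
# a
# c

# -D: every copy of every repeated line
printf 'a\na\nb\nc\nc\nc\n' | uniq -D
# a
# a
# c
# c
# c
```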

Show Only Unique — -u

Print only lines that appear exactly once — no duplicates at all
sort names.txt | uniq -u
Find entries unique to a list — items without duplicates
sort combined_list.txt | uniq -u
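On the same kind of invented sorted input, -u keeps only the line that has no duplicate at all:

```shell
# Only 'b' appears exactly once
printf 'a\na\nb\nc\nc\nc\n' | uniq -u
# b
```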

Skip Fields — -f

Ignore the first field when comparing — deduplicate by remaining content
sort -k2 data.txt | uniq -f1
Skip timestamp field (field 1) — deduplicate log messages
sort -k2 syslog_extract.txt | uniq -f1 -c | sort -rn
Skip first two fields — compare from field 3 onward
sort -k3 records.txt | uniq -f2
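A sketch with made-up log-like lines: skipping the timestamp field collapses repeated messages. No sort is needed here because the duplicates are already adjacent.

```shell
# Keeps the first timestamped copy of each repeated message
printf '09:01 disk full\n09:02 disk full\n09:03 link up\n' | uniq -f1
# 09:01 disk full
# 09:03 link up
```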

Skip Characters — -s

Ignore first N characters when comparing
sort data.txt | uniq -s10
Combine -s and -w to compare only a specific character range
# Compare only characters 11-20 (skip 10, check width 10)
sort data.txt | uniq -s10 -w10
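A sketch with invented fixed-width records: a 9-character ID plus one space fills the first 10 columns, so -s10 makes uniq compare only the name that follows.

```shell
# Lines 1 and 2 collapse: 'alice' == 'alice' after skipping 10 chars
printf 'id0000001 alice\nid0000002 alice\nid0000003 bob\n' | uniq -s10
# id0000001 alice
# id0000003 bob
```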

Case-Insensitive — -i

Ignore case when comparing lines
sort -f data.txt | uniq -i
Case-insensitive frequency count
tr '[:upper:]' '[:lower:]' < words.txt | sort | uniq -c | sort -rn
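A quick check with invented case variants: all three collapse into a single counted line. Which spelling gets printed depends on the sort implementation's tie-breaking, so no exact output is shown.

```shell
# -i folds case during comparison; -c counts the merged group (count = 3)
printf 'Apple\napple\nAPPLE\n' | sort -f | uniq -ic
```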

Practical Patterns

Find the most common error messages — strip timestamps first
awk '{$1=$2=$3=""; print}' /var/log/syslog | sed 's/^ *//' | sort | uniq -c | sort -rn | head -15
Compare two lists — items in both (intersection via uniq -d); each list must be internally duplicate-free, e.g. pre-deduped with sort -u
sort list_a.txt list_b.txt | uniq -d
Items in only one of two lists (symmetric difference via uniq -u) — same caveat applies
sort list_a.txt list_b.txt | uniq -u
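A sketch with two invented, internally duplicate-free lists, inlined via a command group so no files are needed:

```shell
# list A = {apple, pear}, list B = {pear, plum}
{ printf 'apple\npear\n'; printf 'pear\nplum\n'; } | sort | uniq -d
# pear

{ printf 'apple\npear\n'; printf 'pear\nplum\n'; } | sort | uniq -u
# apple
# plum
```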
Count distinct lines without printing them — just the total
sort data.txt | uniq | wc -l
Detect near-duplicate log lines — normalize PIDs before comparing
awk '{gsub(/\[[0-9]+\]/,"[PID]"); print}' /var/log/syslog | sort | uniq -c | sort -rn | head -20
uniq vs sort -u: sort -u dedupes in a single process but can't count or filter; uniq adds -c, -d, and -u
# sort -u: fast dedup, no counts, no -d/-u filtering
sort -u data.txt

# sort | uniq -c: slower, but gives frequencies
sort data.txt | uniq -c | sort -rn