Shell Pipelines

Pipeline composition, process substitution, tee for stream splitting, xargs for argument conversion.

Pipeline Fundamentals

Multi-stage pipeline — each stage transforms the data
journalctl -u sshd --since "1 hour ago" --no-pager \
    | grep "Failed password" \
    | awk '{print $(NF-3)}' \
    | sort \
    | uniq -c \
    | sort -rn \
    | head -10

Each | connects the stdout of the command on its left to the stdin of the command on its right. Data flows left to right, with each stage filtering or transforming the data.
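
A pipeline's exit status is normally that of the last stage. In bash (assumed here; PIPESTATUS is not POSIX), the PIPESTATUS array records every stage's status:

```shell
# PIPESTATUS holds one exit code per stage; read it immediately,
# before any other command overwrites it. (bash-specific)
false | true
echo "${PIPESTATUS[@]}"   # → 1 0 (first stage failed, last succeeded)
```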

Eliminate useless cat — the command can read the file directly
# Anti-pattern (UUOC — Useless Use of Cat)
cat /etc/passwd | grep bash

# Correct — grep reads the file itself
grep bash /etc/passwd

# Correct — awk replaces grep+awk pipelines
awk -F: '/bash$/{print $1}' /etc/passwd

Process Substitution

Compare two command outputs — no temporary files
diff <(sort file1.txt) <(sort file2.txt)

# Compare installed packages between two machines
diff <(ssh host1 'pacman -Qq' | sort) <(ssh host2 'pacman -Qq' | sort)

<(cmd) exposes cmd’s output as a file-like path — /dev/fd/N on Linux, or a named FIFO on systems without /dev/fd. The outer command sees it as a file.
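
A quick sketch of what the outer command actually receives (the /dev/fd number varies per invocation):

```shell
# echo just prints the path it was handed; cat opens and reads it.
echo <(true)           # prints a path such as /dev/fd/63
cat <(printf 'hi\n')   # → hi
```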

Feed a loop — avoids subshell variable loss
count=0
while IFS= read -r line; do
    (( count++ ))
done < <(grep -c '' /var/log/*.log)
echo "Processed $count files"

If you used grep | while read, the count variable would be lost when the subshell exits. < <(cmd) keeps the loop in the current shell.
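
A minimal sketch of the difference, bash assumed:

```shell
# Pipe version: the while body runs in a subshell, so the increment is lost.
count=0
printf 'a\nb\n' | while IFS= read -r _; do (( count++ )); done
echo "$count"   # → 0

# Process-substitution version: the loop runs in the current shell.
count=0
while IFS= read -r _; do (( count++ )); done < <(printf 'a\nb\n')
echo "$count"   # → 2
```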

Tee — Split the Stream

Capture to file while continuing the pipeline
curl -sf https://api.example.com/data \
    | tee /tmp/raw-response.json \
    | jq '.results[] | .name' \
    | sort -u

tee writes to the file AND passes through to stdout. Debug the raw API response later without re-running the request.
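
tee also takes several files at once, and -a appends instead of truncating. A sketch using a mktemp scratch file:

```shell
f=$(mktemp)                           # scratch file (throwaway path)
echo "one" | tee "$f" >/dev/null      # default: truncate, then write
echo "two" | tee -a "$f" >/dev/null   # -a: append
cat "$f"                              # prints "one" then "two"
rm -f "$f"
```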

Tee with sudo — the correct way to write protected files
echo "nameserver 10.50.1.50" | sudo tee /etc/resolv.conf

sudo echo "…" > /file fails because the shell performs the redirection as the current user before sudo ever runs. sudo tee runs tee as root, so tee opens the protected file with root privileges.

Xargs — Stdin to Arguments

Null-delimited for safety — handles filenames with spaces
find /tmp -name '*.tmp' -mtime +7 -print0 | xargs -0 rm -f
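
Why -print0 and -0 matter: a filename containing a space survives intact instead of being split into two bogus arguments. A sketch in a throwaway directory:

```shell
d=$(mktemp -d)
touch "$d/my file.tmp" "$d/plain.tmp"
# NUL-delimited: both names arrive at rm intact, spaces and all.
find "$d" -name '*.tmp' -print0 | xargs -0 rm -f
ls -A "$d"     # prints nothing; both files were removed
rmdir "$d"
```
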

Parallel execution — 4 jobs at once
find . -name '*.png' -print0 | xargs -0 -P4 -I{} convert {} -resize 50% {}

Pipeline with error checking — pipefail makes this work
set -o pipefail

# Without pipefail: exits 0 even if curl fails
# With pipefail: exits non-zero if ANY stage fails
curl -sf https://api.example.com/data | jq '.count'
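
A minimal demonstration of what pipefail changes:

```shell
bash -c 'false | true; echo $?'                   # → 0 (last stage only)
bash -c 'set -o pipefail; false | true; echo $?'  # → 1 (any failing stage)
```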

Named Pipes (FIFOs)

Persistent pipeline — producer/consumer decoupling
mkfifo /tmp/logpipe

# Terminal 1: consumer (blocks until data arrives)
awk '/ERROR/{print NR": "$0}' < /tmp/logpipe

# Terminal 2: producer
tail -f /var/log/app.log > /tmp/logpipe
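
The same handshake in one script: the reader's open blocks until a writer connects, and the writer's close delivers EOF. A sketch using a mktemp-derived path:

```shell
fifo=$(mktemp -u)                       # unused path for the FIFO
mkfifo "$fifo"
grep ERROR < "$fifo" &                  # consumer: blocks on open
printf 'ok\nERROR: boom\n' > "$fifo"    # producer: rendezvous, write, close (EOF)
wait                                    # prints: ERROR: boom
rm -f "$fifo"
```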

Real-World Pipelines

Network audit — find listening ports and their processes
ss -tlnp \
    | awk 'NR>1 {sub(/.*:/, "", $4); print $4, $6}' \
    | sort -n \
    | column -t

Git analytics — commits per author this month
git log --since="1 month ago" --format='%aN' \
    | sort \
    | uniq -c \
    | sort -rn

Parallel host check — across the fleet
# Pass each name as $1 rather than splicing {} into the bash -c script
printf '%s\n' ise-01 dc-01 pfsense bind \
    | xargs -P4 -I{} bash -c \
        'ping -c1 -W2 "$1".inside.domusdigitalis.dev &>/dev/null \
            && echo "$1: UP" || echo "$1: DOWN"' _ {}

See Also