xargs: Parallel Execution & Pipeline Multiplication
xargs reads items from standard input and turns them into command invocations, running them in batches, one at a time, or in parallel.
Core Concepts
The xargs Model
┌─────────────────────────────────────────────────────────────────┐
│ XARGS PROCESSING MODEL │
├─────────────────────────────────────────────────────────────────┤
│ │
│ INPUT XARGS OUTPUT │
│ ───── ───── ────── │
│ │
│ item1 ──┐ │
│ item2 ──┼──► xargs cmd {} ──► cmd item1 │
│ item3 ──┤ ──► cmd item2 │
│ item4 ──┘ ──► cmd item3 │
│ ──► cmd item4 │
│ │
│ MODES │
│ ───── │
│ Sequential: cmd item1; cmd item2; cmd item3 │
│ Parallel: cmd item1 & cmd item2 & cmd item3 (with -P) │
│ Batched: cmd item1 item2 item3 (default) │
│ │
└─────────────────────────────────────────────────────────────────┘
Essential Flags
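| Flag | Purpose | Example |
|---|---|---|
| -I {} | Replace string | xargs -I {} cp {} {}.bak |
| -P N | Parallel jobs | xargs -P4 -n1 gzip |
| -n N | N args per command | xargs -n10 process_batch |
| -0 | Null delimiter | xargs -0 rm |
| -t | Print commands | xargs -t rm |
| -p | Prompt before exec | xargs -p rm |
| -r | No run if empty | xargs -r rm |
| -L N | N lines per command | xargs -L1 echo |
| -d | Custom delimiter | xargs -d ',' echo |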
Basic Patterns
Default Behavior (Batching)
# Passes all args to single command
echo "a b c d" | xargs echo
# Output: a b c d (one echo call)
# Same as:
echo a b c d
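The grouping can be checked directly by counting how arguments reach the command. A minimal sketch, assuming a POSIX-style xargs; the variable names are only illustrative:

```shell
# Default: every item lands on one command line.
batched=$(printf '%s\n' a b c d | xargs echo)

# -n 2: at most two items per invocation, so echo runs twice.
paired=$(printf '%s\n' a b c d | xargs -n 2 echo)

# -n 1: one invocation per item; count the resulting lines.
single_calls=$(printf '%s\n' a b c d | xargs -n 1 echo | wc -l | tr -d ' ')

printf 'batched: %s\n' "$batched"
printf 'paired:\n%s\n' "$paired"
printf 'calls with -n 1: %s\n' "$single_calls"
```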
Parallel Execution
Basic Parallel
# 4 parallel jobs
cat urls.txt | xargs -P4 -I {} curl -s {}
# All CPUs
cat tasks.txt | xargs -P$(nproc) -I {} process {}
# Parallel with batching (4 jobs, 10 items each)
cat items.txt | xargs -P4 -n10 process_batch
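With -P, results arrive in completion order rather than input order, so anything that depends on ordering needs a sort afterwards. A small offline sketch (the squaring script is a stand-in for real work, not part of any tool):

```shell
# Square 1..8 with up to four concurrent jobs (-n 1 gives each job one number).
# Each number is passed to the inline script as $1.
# sort -n restores a stable order; paste joins the lines with spaces.
squares=$(seq 1 8 | xargs -P 4 -n 1 sh -c 'echo $(( $1 * $1 ))' _ | sort -n | paste -sd' ' -)
echo "$squares"
```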
Parallel with Progress
# Per-host status line (the host is passed as $1 instead of being pasted into the script)
cat hosts.txt | xargs -P4 -I {} sh -c 'echo "Processing: $1"; ping -c1 "$1" >/dev/null && echo "$1: OK" || echo "$1: FAIL"' _ {}
# With timestamps
cat tasks.txt | xargs -P4 -I {} sh -c 'echo "[$(date +%H:%M:%S)] Starting $1"; process "$1"; echo "[$(date +%H:%M:%S)] Finished $1"' _ {}
Safe File Handling
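File names may contain spaces, quotes, or newlines, which the default whitespace and quote handling in xargs breaks apart. Pairing find -print0 with xargs -0 keeps each name intact. A small sketch using a throwaway directory (the file names are made up for the demonstration):

```shell
# Work in a temporary directory so nothing real is touched.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/with space.txt" "$demo_dir/it's quoted.txt"

# Null-delimited: each name reaches the command as exactly one argument.
safe_count=$(find "$demo_dir" -type f -print0 | xargs -0 -n 1 sh -c 'echo "$1"' _ | wc -l | tr -d ' ')

# Default splitting: "with space.txt" is split into two arguments, and the
# unmatched quote in "it's quoted.txt" makes xargs stop with an error.
unsafe_status=0
find "$demo_dir" -type f | xargs -n 1 echo >/dev/null 2>&1 || unsafe_status=$?

echo "safe_count=$safe_count unsafe_status=$unsafe_status"
rm -rf "$demo_dir"
```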
Infrastructure Automation
SSH to Multiple Hosts
# Run command on multiple hosts
cat hosts.txt | xargs -P5 -I {} ssh {} 'hostname; uptime'
# Parallel with output labeling
cat hosts.txt | xargs -P5 -I {} sh -c 'echo "=== {} ==="; ssh {} "df -h /"'
# With timeout
cat hosts.txt | xargs -P10 -I {} timeout 30 ssh {} 'systemctl status nginx' 2>/dev/null
Batch DNS Lookups
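# Resolve multiple hostnames (a single printf per host keeps parallel output from splitting label and result)
cat hostnames.txt | xargs -P10 -I {} sh -c 'printf "%s: %s\n" "$1" "$(dig +short "$1")"' _ {}
# Reverse DNS
cat ips.txt | xargs -P10 -I {} sh -c 'printf "%s: %s\n" "$1" "$(dig +short -x "$1")"' _ {}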
Certificate Checks
# Check expiry on multiple hosts
cat hosts.txt | xargs -P5 -I {} sh -c '
expire=$(echo | timeout 5 openssl s_client -connect {}:443 -servername {} 2>/dev/null | \
openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)
echo "{}: $expire"
'
Bulk File Operations
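# Compress logs in parallel (-n1 gives each gzip one file; without -n, xargs may pack every file into one gzip process)
find /var/log -name "*.log" -mtime +7 -print0 | xargs -0 -r -n1 -P4 gzip
# Calculate checksums (batches of 64 files per process; line order in the output varies between runs)
find /data -type f -print0 | xargs -0 -r -n64 -P"$(nproc)" sha256sum > checksums.txt
# Bulk rename (null-delimited glob instead of parsing ls output)
printf '%s\0' *.jpeg | xargs -0 -I {} sh -c 'mv -- "$1" "${1%.jpeg}.jpg"' _ {}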
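One detail worth checking when combining -P with batching: if all items fit on one command line, xargs starts a single process and -P has nothing to run concurrently. A quick sketch that counts invocations (each echo call prints exactly one line):

```shell
# Without -n, the ten short items fit on one command line: one invocation.
calls_default=$(seq 1 10 | xargs -P 4 echo | wc -l | tr -d ' ')

# With -n 3, xargs needs four invocations (3 + 3 + 3 + 1), which -P 4 can overlap.
calls_batched=$(seq 1 10 | xargs -P 4 -n 3 echo | wc -l | tr -d ' ')

echo "default: $calls_default invocation(s), -n 3: $calls_batched invocation(s)"
```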
API Bulk Operations
# Delete multiple resources
cat resource_ids.txt | xargs -P5 -I {} curl -s -X DELETE "https://api.example.com/resources/{}"
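# Bulk data fetch (parallel responses can interleave on the shared pipe; lower -P or write one file per item if jq reports parse errors)
seq 1 1000 | xargs -P10 -I {} curl -s "https://api.example.com/items/{}" | jq -s '.'
# Parallel with jq processing (the ID is passed to jq as a string variable rather than spliced into the filter)
cat user_ids.txt | xargs -P5 -I {} sh -c 'curl -s "https://api.example.com/users/$1" | jq --arg id "$1" "{id: \$id, data: .}"' _ {}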
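When several jobs write to the same pipe, their output can interleave. One way around this is to have each job write its own file and combine the files afterwards. The sketch below substitutes a printf for the HTTP call so it runs offline; the directory layout and JSON shape are assumptions for illustration only:

```shell
out_dir=$(mktemp -d)

# Each job writes to its own file, named after the item ID ($1),
# so concurrent jobs never share an output stream. $2 is the output directory.
seq 1 5 | xargs -P 4 -I {} sh -c 'printf "{\"id\": %s}\n" "$1" > "$2/$1.json"' _ {} "$out_dir"

# Combine in a fixed order once xargs has returned (all jobs are finished by then).
combined=$(for id in 1 2 3 4 5; do cat "$out_dir/$id.json"; done)
echo "$combined"
rm -rf "$out_dir"
```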
Process Substitution Integration
Compare Outputs
# Compare outputs from two hosts
diff <(ssh host1 'cat /etc/passwd') <(ssh host2 'cat /etc/passwd')
# Compare multiple configs
cat hosts.txt | xargs -I {} sh -c 'echo "=== {} ==="; ssh {} "cat /etc/resolv.conf"' | less
Parallel Processing with Results
# Collect results into array
mapfile -t results < <(cat hosts.txt | xargs -P10 -I {} sh -c 'ssh {} hostname 2>/dev/null')
echo "Collected ${#results[@]} results"
# Process results
cat hosts.txt | xargs -P10 -I {} sh -c 'ssh {} "free -m" 2>/dev/null | awk "/Mem:/{print \"{}: \" \$3 \"MB used\"}"'
Conditional Execution
Only If Input Exists (-r)
# Don't run if no input (GNU xargs)
find . -name "*.tmp" -print0 | xargs -0 -r rm
# If no .tmp files, rm is never called
# Without -r, rm would be called with no args (error)
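The effect of -r is easy to observe with a harmless command in place of rm. A minimal sketch; the echoed strings are arbitrary markers:

```shell
# Empty input with -r: the command is never started, so nothing is printed.
with_r=$(printf '' | xargs -r echo "ran once")

# Non-empty input behaves the same with or without -r.
non_empty=$(printf 'x\n' | xargs -r echo "ran with:")

echo "with -r on empty input: '${with_r}'"
echo "$non_empty"
```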
Advanced Patterns
Multiple Placeholders
# Use placeholder multiple times
echo "file.txt" | xargs -I {} sh -c 'cp {} {}.bak && echo "Backed up {}"'
# Extract filename components
ls *.tar.gz | xargs -I {} sh -c 'mkdir -p "${1%.tar.gz}" && tar xzf "$1" -C "${1%.tar.gz}"' _ {}
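The two styles above differ in one important way: a {} pasted into the sh -c script becomes part of the script text, while an item passed after the _ placeholder arrives as plain data in $1. A short sketch with a deliberately awkward input (the item value is invented for the demonstration):

```shell
# An input item that happens to contain shell syntax.
item='x$(echo INJECTED)'

# {} pasted into the script text: the inner shell evaluates the embedded command.
spliced=$(printf '%s\n' "$item" | xargs -I {} sh -c 'echo "{}"')

# Item passed as an argument: the script sees it only as data in $1.
as_arg=$(printf '%s\n' "$item" | xargs -I {} sh -c 'echo "$1"' _ {})

echo "spliced: $spliced"
echo "as_arg:  $as_arg"
```

For input that is not fully trusted, the argument form is the safer choice.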
With Find
# Better than find -exec (parallelism)
find /data -name "*.csv" -print0 | xargs -0 -P4 -I {} process_csv {}
# vs find -exec (sequential)
find /data -name "*.csv" -exec process_csv {} \;
# Batch mode (fewer process spawns)
find /data -name "*.txt" -print0 | xargs -0 grep -l "pattern"
Building Complex Commands
# Multiple commands per input
cat items.txt | xargs -I {} sh -c '
echo "Processing: {}"
validate {} || exit 1
process {}
echo "Done: {}"
'
# With error handling
cat hosts.txt | xargs -P5 -I {} sh -c '
if ssh {} "test -f /etc/myapp.conf"; then
echo "{}: configured"
else
echo "{}: NOT configured"
fi
'
Performance Tuning
Batch Size (-n)
# Too many args: split into batches
cat huge_list.txt | xargs -n100 process_batch
# One at a time (slowest but safest)
cat items.txt | xargs -n1 process_item
Parallelism Sweet Spot
# CPU-bound: use CPU count
xargs -P$(nproc) ...
# I/O-bound: can go higher
xargs -P20 ... # Network calls
# Memory-bound: be conservative
xargs -P4 ... # Heavy processing
# Testing different values
for p in 1 2 4 8 16; do
echo "P=$p"
time cat tasks.txt | xargs -P$p -I {} process {} >/dev/null
done
Error Handling
Continue on Error
# Default: xargs continues after a failed command (unless the command exits with status 255)
cat hosts.txt | xargs -I {} sh -c 'ssh {} "command" || echo "{} failed"'
# Explicit continue
cat hosts.txt | xargs -I {} sh -c 'ssh {} "command" 2>/dev/null || true'
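How xargs reacts to a failure depends on the exit status of the command. A sketch of the two cases, with a tiny inline script standing in for the real command; the specific statuses in the comments (123 and 124) are what GNU xargs documents and may differ on other implementations:

```shell
# Item 2 fails with status 1: xargs keeps going and reports failure at the end
# (GNU xargs exits with 123 in this case).
continued_status=0
continued=$(printf '%s\n' 1 2 3 | xargs -n 1 sh -c '[ "$1" != 2 ] || exit 1; echo "ok $1"' _) || continued_status=$?

# Item 2 fails with status 255: xargs stops immediately and item 3 never runs
# (GNU xargs exits with 124 in this case).
aborted_status=0
aborted=$(printf '%s\n' 1 2 3 | xargs -n 1 sh -c '[ "$1" != 2 ] || exit 255; echo "ok $1"' _ 2>/dev/null) || aborted_status=$?

echo "continued: $(echo $continued) (status $continued_status)"
echo "aborted:   $(echo $aborted) (status $aborted_status)"
```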
Quick Reference
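| Task | Command |
|---|---|
| One item per command | xargs -n1 cmd |
| With placeholder | xargs -I {} cmd {} |
| Parallel (4 jobs) | xargs -P4 cmd |
| Null-delimited input | xargs -0 cmd |
| Show commands | xargs -t cmd |
| Confirm each | xargs -p cmd |
| Skip if empty | xargs -r cmd |
| Custom delimiter | xargs -d ',' cmd |
| Batch size | xargs -n N cmd |
| Lines per command | xargs -L N cmd |
| Max command length | xargs -s N cmd |
| From file | xargs -a file cmd |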
Common Patterns
# Parallel SSH
cat hosts | xargs -P10 -I {} ssh {} 'cmd'
# Safe file operations
find . -print0 | xargs -0 cmd
# Parallel downloads
cat urls | xargs -P5 -I {} curl -sO {}
# Bulk API
cat ids | xargs -P10 -I {} curl -s "api/{}" | jq -s '.'
# Parallel processing (-n1 so each gzip gets one file)
find . -name "*.log" -print0 | xargs -0 -n1 -P$(nproc) gzip
xargs vs Alternatives
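| Task | xargs | Alternative |
|---|---|---|
| Simple iteration | xargs -n1 cmd | for / while read loop |
| Parallel execution | xargs -P4 cmd | GNU parallel |
| With find | find ... -print0 \| xargs -0 cmd | find ... -exec cmd {} + |
| Complex logic | xargs -I {} sh -c '...' | Shell loop (clearer) |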
When to use xargs:
- Parallel execution needed
- Large number of items
- Simple per-item commands
When to use loops:
- Complex logic per item
- Need shell variables
- Readability matters more
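The "need shell variables" case is the clearest reason to prefer a loop: commands started by xargs run in child processes and cannot update variables in the calling shell. A minimal sketch that accumulates state across items (the word list is sample data):

```shell
# A loop keeps state in the current shell, which xargs child processes cannot do.
total=0
longest=""
while IFS= read -r word; do
    total=$((total + ${#word}))
    if [ "${#word}" -gt "${#longest}" ]; then
        longest=$word
    fi
done <<'EOF'
alpha
be
gamma-ray
EOF

echo "total characters: $total, longest: $longest"
```

The here-document keeps the loop in the current shell; piping into while would run it in a subshell in many shells and lose the variables.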
Related
- Shell Pipelines - Complex workflows
- awk - Process xargs output