ETL Session 01: Pipeline Basics
The Unix philosophy in action. This session covers the pipe operator, tee for splitting output, process substitution, and xargs for command building.
Pre-Session State
- Can run basic shell commands
- Understand stdin/stdout concept
- Know basic grep/cat usage
Setup
# Create test data
cat > /tmp/hosts.txt << 'EOF'
kvm-01 10.50.1.110 hypervisor
kvm-02 10.50.1.111 hypervisor
vault-01 10.50.1.60 secrets
vault-02 10.50.1.61 secrets
ise-01 10.50.1.20 nac
EOF
Lesson 1: Pipe Operator
Concept: | connects the stdout of one command to the stdin of the next.
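As a minimal sketch of what the pipe does and does not carry: only stdout travels through |, while stderr bypasses it unless it is merged in with 2>&1. The emit function below is a hypothetical helper used only for illustration; it is not part of the session's test data.

```shell
# Hypothetical helper: writes one line to stdout and one line to stderr.
emit() {
  echo "to-stdout"
  echo "to-stderr" >&2
}

# Only stdout goes through the pipe; stderr is discarded here to keep the demo quiet.
piped=$(emit 2>/dev/null | tr 'a-z' 'A-Z')
echo "$piped"

# 2>&1 before the pipe merges stderr into stdout, so both lines reach tr.
merged=$(emit 2>&1 | tr 'a-z' 'A-Z' | sort)
echo "$merged"
```

Without the 2>/dev/null in the first pipeline, the stderr line would appear on the terminal unchanged, because tr never sees it.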
Exercise 1.1: Simple pipeline
# Extract → Filter → Count
cat /tmp/hosts.txt | grep hypervisor | wc -l
Output: 2
Exercise 1.2: Multi-stage pipeline
# Extract IPs from hypervisors, sort unique
cat /tmp/hosts.txt | grep hypervisor | awk '{print $2}' | sort -u
Exercise 1.3: Pipeline with formatting
# Format as "hostname: ip"
cat /tmp/hosts.txt | awk '{print $1 ": " $2}'
Lesson 2: tee - Split the Stream
Concept: tee copies its stdin to one or more files AND to stdout at the same time.
Exercise 2.1: Save and continue
# Save intermediate result while continuing pipeline
cat /tmp/hosts.txt | grep hypervisor | tee /tmp/hypervisors.txt | wc -l
cat /tmp/hypervisors.txt # Verify saved
Exercise 2.2: Multiple outputs
# Save to multiple files
cat /tmp/hosts.txt | tee /tmp/copy1.txt /tmp/copy2.txt | wc -l
Exercise 2.3: Append mode
# Append instead of overwrite (written to a copy so /tmp/hosts.txt stays unchanged for later exercises)
echo "new-host 10.50.1.200 test" | tee -a /tmp/copy1.txt
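The difference between plain tee and tee -a can be sketched as follows. The temporary file from mktemp and the sample strings are assumptions made for illustration, so the sketch does not touch the session's files.

```shell
# Scratch file for the demo (illustrative; not one of the session's files).
log=$(mktemp)

echo "first"  | tee "$log" > /dev/null     # overwrite: file holds 1 line
echo "second" | tee "$log" > /dev/null     # overwrite again: still 1 line
lines_overwrite=$(wc -l < "$log")

echo "third"  | tee -a "$log" > /dev/null  # append: file now holds 2 lines
lines_append=$(wc -l < "$log")
last_line=$(tail -n 1 "$log")

echo "after overwrite: $lines_overwrite, after append: $lines_append, last: $last_line"
rm -f "$log"
```

Redirecting tee's stdout to /dev/null here only silences the echo to the screen; the file is written either way.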
Lesson 3: Process Substitution
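Concept: <(cmd) and >(cmd) let a command stand in for a file. The shell replaces each one with a path (such as /dev/fd/63) connected to the command's output or input. This is a bash, zsh, and ksh feature; it is not part of POSIX sh.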
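A minimal sketch of the idea, assuming bash: the first command prints the path that <(cmd) expands to, and the second feeds two sorted lists to comm, a tool that expects two file arguments. The host lists are made up for the example, and comm is used here only as a convenient two-file command.

```shell
# Requires bash (or zsh/ksh); plain POSIX sh does not support <(...).

# <(cmd) expands to a path that can be opened like a file; the exact path varies by system.
fd_path=$(echo <(true))
echo "expands to: $fd_path"

# comm -12 prints lines common to two sorted inputs; no temporary files are needed.
common=$(comm -12 <(printf '%s\n' kvm-01 kvm-02 vault-01) \
                  <(printf '%s\n' kvm-02 vault-01 vault-02))
echo "$common"
```

The same pattern works with any command that insists on file names, such as diff or paste.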
Exercise 3.1: Compare two commands
# Compare output of two commands
diff <(cat /tmp/hosts.txt | awk '{print $1}' | sort) \
<(echo -e "ise-01\nkvm-01\nkvm-02\nvault-01\nvault-02")
Exercise 3.2: Multiple inputs
# Paste combines columns from multiple sources
paste <(cat /tmp/hosts.txt | awk '{print $1}') \
<(cat /tmp/hosts.txt | awk '{print $2}')
Exercise 3.3: Output substitution
# Log to file while displaying on screen
cat /tmp/hosts.txt | grep hypervisor > >(tee /tmp/log.txt)
Lesson 4: xargs - Build Commands
Concept: xargs converts stdin to command arguments.
Exercise 4.1: Basic xargs
# Pass all hostnames as arguments to a single echo
cat /tmp/hosts.txt | awk '{print $1}' | xargs echo "Hosts:"
# Output: Hosts: kvm-01 kvm-02 vault-01 vault-02 ise-01
Exercise 4.2: One at a time (-n 1)
# Run the command once per argument
cat /tmp/hosts.txt | awk '{print $2}' | xargs -n 1 echo "IP:"
# Output: IP: 10.50.1.110
# IP: 10.50.1.111
# ...
Exercise 4.3: Placeholder (-I)
# Use placeholder for positioning
cat /tmp/hosts.txt | awk '{print $1}' | xargs -I {} echo "Pinging {}..."
# Output: Pinging kvm-01...
# Pinging kvm-02...
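One practical difference between -n 1 and -I is how input is split. By default xargs splits on blanks as well as newlines, whereas -I takes one item per input line. A small sketch with made-up input that contains a space:

```shell
# Default splitting: "two words" becomes two separate arguments, so echo runs 3 times.
split_count=$(printf '%s\n' "two words" "single" | xargs -n 1 echo | wc -l)

# With -I {}, each input line is one item, so the embedded space is kept: echo runs 2 times.
line_count=$(printf '%s\n' "two words" "single" | xargs -I {} echo "item: {}" | wc -l)

echo "default split: $split_count, per line: $line_count"
```

The hostnames and IPs in this session contain no spaces, so both forms behave the same on /tmp/hosts.txt.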
Exercise 4.4: Parallel execution (-P)
# Run up to 4 pings in parallel
cat /tmp/hosts.txt | awk '{print $2}' | xargs -n 1 -P 4 ping -c 1
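To see the effect of -P without depending on the network, the following sketch substitutes sleep for ping (an assumption made purely for the demo). Four one-second sleeps run one after another would take about four seconds; with -P 4 they overlap.

```shell
start=$(date +%s)

# Four jobs, one argument each, up to four running at the same time.
printf '%s\n' 1 1 1 1 | xargs -n 1 -P 4 sleep

end=$(date +%s)
elapsed=$(( end - start ))
echo "elapsed: ${elapsed}s"
```

When the parallel commands print output, as ping does, lines from different jobs may interleave on the terminal.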
Lesson 5: Command Grouping
Concept: Group commands so that their combined output goes to a single pipe or redirection.
Exercise 5.1: Subshell grouping
# Combine multiple outputs
(echo "=== Hypervisors ===" && grep hypervisor /tmp/hosts.txt) | cat
Exercise 5.2: Brace grouping
# Brace group: no subshell when used on its own (each side of a pipe still runs in one)
{ echo "=== Report ==="; cat /tmp/hosts.txt; echo "=== End ==="; } | cat
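The practical difference between the two grouping forms shows up with variables. This is a small sketch; the variable names are illustrative.

```shell
counter=0

# Subshell: the assignment happens in a child process and is lost afterwards.
( counter=1 )
after_subshell=$counter

# Brace group: runs in the current shell, so the assignment persists.
{ counter=2; }
after_braces=$counter

# Either form sends the combined output of its commands into one pipe.
report_lines=$( { echo "=== Report ==="; echo "data"; echo "=== End ==="; } | wc -l )

echo "after subshell: $after_subshell, after braces: $after_braces, report lines: $report_lines"
```

Brace groups need a space after the opening brace and a semicolon or newline before the closing brace.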
Summary: What You Learned
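| Concept | Syntax | Example |
|---|---|---|
| Pipe | `cmd1 \| cmd2` | `grep hypervisor /tmp/hosts.txt \| wc -l` |
| Tee | `tee file` | Save and continue |
| Tee append | `tee -a file` | Append mode |
| Process sub (in) | `<(cmd)` | `diff <(cmd1) <(cmd2)` |
| Process sub (out) | `>(cmd)` | `cmd > >(tee file)` |
| xargs basic | `xargs cmd` | Build command line |
| xargs one | `xargs -n 1 cmd` | One arg per execution |
| xargs placeholder | `xargs -I {} cmd {}` | Position arg in command |
| xargs parallel | `xargs -P N cmd` | Run up to N in parallel |
| Subshell | `(cmd1; cmd2)` | Grouped in subshell |
| Brace group | `{ cmd1; cmd2; }` | Grouped, no subshell |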
Exercises to Complete
- [ ] Extract all IPs, save to file, count total
- [ ] Compare hostnames from two files using process substitution
- [ ] Ping all hosts in parallel using xargs
- [ ] Create a report with header, data, footer using grouping
Next Session
Session 02: JSON to CSV - jq transforms, @csv output.
Session Log
| Timestamp | Notes |
|---|---|
| Start | <Record when you started> |
| End | <Record when you finished> |