Tool Integration
Regex is a universal skill - the same patterns work across tools with minor syntax adjustments. This guide covers practical usage in the tools you’ll use daily for infrastructure work.
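To make that portability concrete, here is one anchored key/value pattern doing the same job in three tools. The file and its contents are invented for the demo:

```shell
# One idea, three tools: pull the value of a port= line.
tmp=$(mktemp)
printf 'server=web-01\nport=8080\n' > "$tmp"

grep -E '^port=[0-9]+' "$tmp"             # whole matching line: port=8080
sed -nE 's/^port=([0-9]+)$/\1/p' "$tmp"   # captured value only: 8080
awk -F= '/^port=/ { print $2 }' "$tmp"    # field after the = : 8080

rm -f "$tmp"
```

Same anchor, same character class; only the extraction mechanism changes per tool.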
grep
The workhorse of text searching. Know when to use each mode.
Modes
grep 'pattern' # BRE (basic, limited features)
grep -E 'pattern' # ERE (extended, modern syntax)
grep -P 'pattern' # PCRE (Perl, full features)
grep -F 'string' # Fixed string (no regex, fastest)
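The mode you pick changes what metacharacters mean. A quick contrast with `+` (the `a\+` escape shown is a GNU grep extension):

```shell
# How the same '+' behaves in each grep mode.
echo 'aaa' | grep -c 'a+'    # 0: BRE treats + as a literal character
echo 'aaa' | grep -c 'a\+'   # 1: escaped + quantifies (GNU BRE)
echo 'aaa' | grep -cE 'a+'   # 1: ERE treats + as "one or more"
echo 'a+b' | grep -cF 'a+'   # 1: -F matches the literal string a+
```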
Essential Flags
| Flag | Purpose |
|---|---|
| `-i` | Case insensitive |
| `-v` | Invert match (lines NOT matching) |
| `-o` | Output only matched part |
| `-n` | Show line numbers |
| `-c` | Count matching lines |
| `-l` | List files with matches |
| `-L` | List files without matches |
| `-r` | Recursive search |
| `-w` | Whole word only |
| `-A n` | Show n lines after match |
| `-B n` | Show n lines before match |
| `-C n` | Show n lines of context |
Common Patterns
# Find IPs in log
grep -oP '\d{1,3}(\.\d{1,3}){3}' access.log
# Find error lines with context
grep -B2 -A2 'ERROR' application.log
# Recursive search, show files only
grep -rl 'password' .
# Count matches per file
grep -rc 'TODO' src/
# Extract value after key
grep -oP 'port=\K\d+' config.txt
# Case-insensitive whole word
grep -iw 'error' log.txt
Performance Tips
# Fixed string is fastest
grep -F '192.168.1.1' log.txt
# Use --include to filter files
grep -r --include='*.py' 'import' src/
# Use --exclude for speed
grep -r --exclude='*.log' 'pattern' .
# Limit output
grep -m 10 'pattern' file.txt # First 10 matches
ripgrep (rg)
Modern grep replacement - faster, smarter defaults, better UX.
Why ripgrep
- Faster - Rust-based, parallel processing
- Smart defaults - respects .gitignore, skips binary files
- Better output - colors, grouping, line numbers by default
- PCRE2 - full regex support via -P
Basic Usage
# Simple search (recursive by default)
rg 'pattern'
# Fixed string
rg -F '192.168.1.1'
# Case insensitive
rg -i 'error'
# Whole word
rg -w 'log'
# Show only matches
rg -o '\d+\.\d+\.\d+\.\d+'
File Filtering
# By extension
rg -g '*.py' 'import'
# Exclude pattern
rg -g '!*.log' 'pattern'
# By file type
rg -t py 'import'
rg -t rust 'fn main'
# List supported types
rg --type-list
Advanced Features
# PCRE2 features
rg -P '(?<=port=)\d+'
# Multiline
rg -U 'start.*\n.*end'
# Replace (preview)
rg 'old' --replace 'new'
# JSON output (for scripting)
rg --json 'pattern'
# Statistics
rg --stats 'pattern'
sed
Stream editor for transformations. Master substitution.
Basic Substitution
# Replace first occurrence per line
sed 's/old/new/' file.txt
# Replace all occurrences per line
sed 's/old/new/g' file.txt
# Case insensitive
sed 's/old/new/gi' file.txt
# In-place edit (with backup)
sed -i.bak 's/old/new/g' file.txt
# In-place edit (no backup)
sed -i 's/old/new/g' file.txt
ERE Mode
# Use -E for modern syntax
sed -E 's/[0-9]+/NUMBER/g' file.txt
# Grouping and backreferences
sed -E 's/(\w+) (\w+)/\2 \1/g' file.txt
# Multiple substitutions
sed -E 's/foo/bar/g; s/baz/qux/g' file.txt
Line Selection
# Specific line
sed -n '5p' file.txt
# Line range
sed -n '5,10p' file.txt
# Pattern match
sed -n '/ERROR/p' file.txt
# Delete lines
sed '/^#/d' file.txt # Delete comments
sed '/^$/d' file.txt # Delete empty lines
# Insert/append
sed '5i\New line before' file.txt
sed '5a\New line after' file.txt
Practical Examples
# Remove trailing whitespace
sed 's/[[:space:]]*$//' file.txt
# Convert DOS to Unix line endings
sed 's/\r$//' file.txt
# Extract between patterns
sed -n '/START/,/END/p' file.txt
# Comment out lines
sed '/pattern/s/^/#/' file.txt
# Uncomment lines (strips a leading # from every line)
sed 's/^#//' file.txt
awk
Pattern-action language for field-based processing.
Basic Structure
awk '/pattern/ { action }' file.txt
awk '{ print $1 }' file.txt # First field of all lines
awk '/ERROR/ { print $0 }' log.txt # Lines matching ERROR
Field Processing
# Default separator (whitespace)
awk '{ print $1, $3 }' file.txt
# Custom separator
awk -F: '{ print $1 }' /etc/passwd
awk -F',' '{ print $2 }' data.csv
# Multiple separators
awk -F'[,;:]' '{ print $1 }' file.txt
Regex in awk
# Line match
awk '/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/' access.log
# Field match
awk '$1 ~ /^192\.168/' access.log
# Negated match
awk '$1 !~ /^192\.168/' access.log
# Match and extract (three-argument match() is gawk-specific)
awk 'match($0, /port=([0-9]+)/, a) { print a[1] }' config.txt
Built-in Variables
| Variable | Meaning |
|---|---|
| `NR` | Record (line) number |
| `NF` | Number of fields |
| `$0` | Entire line |
| `$1` ... `$NF` | Fields |
| `FS` | Field separator |
| `OFS` | Output field separator |
| `FILENAME` | Current file name |
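The built-in variables combine naturally in one-liners. A tiny sketch, with input invented for the demo:

```shell
# NR numbers records, NF counts fields, OFS joins print's arguments.
printf 'alice 30\nbob 25\n' |
awk 'BEGIN { OFS="," } { print NR, NF, $1 }'
# -> 1,2,alice
# -> 2,2,bob
```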
Practical Examples
# Sum a column
awk '{ sum += $2 } END { print sum }' data.txt
# Average
awk '{ sum += $2; count++ } END { print sum/count }' data.txt
# Unique values
awk '{ seen[$1]++ } END { for (k in seen) print k }' file.txt
# Group by and count
awk '{ count[$1]++ } END { for (k in count) print k, count[k] }' access.log
# Line range
awk 'NR>=10 && NR<=20' file.txt
# Conditional processing
awk '$3 > 100 { print $1, $3 }' data.txt
Python re Module
Full-featured regex with named groups, lookaround, and clear API.
Basic Operations
import re
text = "Error on 2026-03-15: Connection refused from 192.168.1.100"
# Search (first match)
match = re.search(r'\d{1,3}(\.\d{1,3}){3}', text)
if match:
print(match.group()) # 192.168.1.100
# Find all
ips = re.findall(r'\d{1,3}(?:\.\d{1,3}){3}', text)
# Returns list of full matches. The group must be non-capturing (?:...):
# with a capturing group, findall returns only the group contents.
# Find all with groups
dates = re.findall(r'(\d{4})-(\d{2})-(\d{2})', text)
# Returns list of tuples
# Substitute
result = re.sub(r'\d+', 'X', text)
Compiled Patterns
import re
# Compile for reuse (performance)
ip_pattern = re.compile(r'\d{1,3}(\.\d{1,3}){3}')
for line in open('access.log'):
match = ip_pattern.search(line)
if match:
print(match.group())
Named Groups
import re
log_pattern = re.compile(
r'(?P<timestamp>\d{4}-\d{2}-\d{2})\s+'
r'(?P<level>\w+)\s+'
r'(?P<message>.+)'
)
line = "2026-03-15 ERROR Connection refused"
match = log_pattern.search(line)
if match:
print(match.group('timestamp')) # 2026-03-15
print(match.group('level')) # ERROR
print(match.groupdict()) # {'timestamp': '2026-03-15', ...}
Flags
import re
# Case insensitive
re.search(r'error', text, re.IGNORECASE)
# Multiline (^ and $ match line boundaries)
re.findall(r'^ERROR', text, re.MULTILINE)
# Verbose (comments and whitespace)
pattern = re.compile(r'''
(?P<year>\d{4})- # Year
(?P<month>\d{2})- # Month
(?P<day>\d{2}) # Day
''', re.VERBOSE)
# Combined
re.search(r'pattern', text, re.IGNORECASE | re.MULTILINE)
Practical Examples
import re
# Parse log file
def parse_sshd_log(line):
pattern = r'Failed password for (?:invalid user )?(\S+) from (\S+)'
match = re.search(pattern, line)
if match:
return {'user': match.group(1), 'ip': match.group(2)}
return None
# Extract all emails
def find_emails(text):
pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
return re.findall(pattern, text)
# Validate IP
def is_valid_ip(ip):
pattern = r'^(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)$'
return bool(re.match(pattern, ip))
JavaScript RegExp
Browser and Node.js regex with modern ES2018+ features.
Creating Patterns
// Literal (preferred for static patterns)
const pattern1 = /\d+/g;
// Constructor (for dynamic patterns)
const userInput = "error";
const pattern2 = new RegExp(userInput, 'gi');
Methods
const text = "Port: 443, Other: 8080";
// test - boolean
/\d+/.test(text); // true
// match - array of matches
text.match(/\d+/g); // ['443', '8080']
// matchAll - iterator with groups (ES2020)
const matches = text.matchAll(/Port: (\d+)/g);
for (const match of matches) {
console.log(match[1]); // 443
}
// exec - detailed match info
const pattern = /Port: (\d+)/g;
let match;
while ((match = pattern.exec(text)) !== null) {
console.log(match[1]);
}
// replace
text.replace(/\d+/g, 'X'); // "Port: X, Other: X"
// split
'a,b;c'.split(/[,;]/); // ['a', 'b', 'c']
Named Groups (ES2018)
const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = '2026-03-15'.match(pattern);
console.log(match.groups.year); // 2026
console.log(match.groups.month); // 03
console.log(match.groups.day); // 15
// In replace
'2026-03-15'.replace(pattern, '$<month>/$<day>/$<year>');
// "03/15/2026"
Practical Examples
// Parse URL
function parseUrl(url) {
const pattern = /^(?<protocol>https?):\/\/(?<host>[^/:]+)(?::(?<port>\d+))?(?<path>\/[^?]*)?(?:\?(?<query>.+))?$/;
const match = url.match(pattern);
return match ? match.groups : null;
}
// Extract IPs from text
function findIPs(text) {
const pattern = /\b\d{1,3}(?:\.\d{1,3}){3}\b/g;
return text.match(pattern) || [];
}
// Validate email
function isValidEmail(email) {
const pattern = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
return pattern.test(email);
}
// Sanitize input
function escapeHtml(text) {
return text.replace(/[&<>"']/g, char => ({
'&': '&amp;',
'<': '&lt;',
'>': '&gt;',
'"': '&quot;',
"'": '&#39;'
})[char]);
}
Tool Selection Guide
| Task | Best Tool | Why |
|---|---|---|
| Quick search | `grep` | Fast, simple |
| Extract specific field | `awk` | Field-based processing |
| Transform text | `sed` | Stream editing |
| Complex parsing | Python | Full programming language |
| Browser scripting | JavaScript | Native environment |
| Large files | `ripgrep` | Parallel, memory efficient |
| JSON data | `jq` | Purpose-built |
Combining Tools
# grep -> awk
grep 'ERROR' log.txt | awk '{print $4}'
# grep -> sed
grep 'server=' config.txt | sed 's/server=//'
# awk with regex -> sort -> uniq
awk '/Failed password/ {print $11}' auth.log | sort | uniq -c | sort -rn
# ripgrep -> Python processing
rg -oN '\d+\.\d+\.\d+\.\d+' access.log | python3 -c "
import sys
from collections import Counter
ips = Counter(line.strip() for line in sys.stdin)
for ip, count in ips.most_common(10):
print(f'{count:6d} {ip}')
"
Self-Test Exercises
Try each challenge FIRST. Only expand the answer after you’ve attempted it.
Setup Test Data
cat << 'EOF' > /tmp/tools.txt
192.168.1.100 - - [15/Mar/2026:14:30:45 +0000] "GET /api/users HTTP/1.1" 200 1234
10.50.1.20 - - [15/Mar/2026:14:30:46 +0000] "POST /api/login HTTP/1.1" 401 89
192.168.1.100 - - [15/Mar/2026:14:30:47 +0000] "GET /api/config HTTP/1.1" 500 456
172.16.0.5 - - [15/Mar/2026:14:30:48 +0000] "GET /health HTTP/1.1" 200 15
server=prod-01
port=8080
timeout=30
debug=true
Mar 15 14:30:45 server sshd[12345]: Failed password for root from 10.0.0.1
Mar 15 14:30:46 server sshd[12346]: Accepted publickey for admin from 192.168.1.50
Mar 15 14:30:47 server sshd[12347]: Failed password for invalid user test from 10.0.0.2
EOF
cat << 'EOF' > /tmp/api-response.json
{
"status": "success",
"data": {
"users": [
{"id": 1, "name": "admin", "email": "admin@example.com", "role": "admin"},
{"id": 2, "name": "evan", "email": "evan@domusdigitalis.dev", "role": "user"},
{"id": 3, "name": "test", "email": "test@example.org", "role": "user"}
],
"pagination": {
"page": 1,
"total": 3,
"per_page": 10
}
},
"errors": []
}
EOF
cat << 'EOF' > /tmp/network.json
{
"interfaces": [
{"name": "eth0", "ip": "192.168.1.10", "mac": "AA:BB:CC:DD:EE:FF", "status": "up"},
{"name": "eth1", "ip": "10.50.1.20", "mac": "11:22:33:44:55:66", "status": "down"},
{"name": "lo", "ip": "127.0.0.1", "mac": "00:00:00:00:00:00", "status": "up"}
],
"routes": [
{"dest": "0.0.0.0/0", "gateway": "192.168.1.1", "interface": "eth0"},
{"dest": "10.50.0.0/16", "gateway": "10.50.1.1", "interface": "eth1"}
]
}
EOF
Challenge 1: grep - Extract IPs from Log
Goal: Extract all IP addresses from the access log
Answer
grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' /tmp/tools.txt
Or with PCRE: grep -oP '\d+\.\d+\.\d+\.\d+' /tmp/tools.txt
Challenge 2: awk - First Field (IPs)
Goal: Use awk to extract the first field (client IP) from access log lines
Answer
awk '/HTTP/ {print $1}' /tmp/tools.txt
$1 is the first field (IP). Pattern /HTTP/ filters to HTTP log lines.
Challenge 3: awk - Count by Status Code
Goal: Count requests by HTTP status code
Answer
awk '/HTTP/ {print $9}' /tmp/tools.txt | sort | uniq -c
$9 is the status code in combined log format.
Challenge 4: sed - Extract Config Value
Goal: Extract just the value after "port=" using sed
Answer
sed -n 's/port=\(.*\)/\1/p' /tmp/tools.txt
# Or with ERE:
sed -nE 's/port=(.*)/\1/p' /tmp/tools.txt
-n suppresses default output, /p prints matches.
Challenge 5: sed - Replace IP with REDACTED
Goal: Replace all IPs with [REDACTED]
Answer
sed -E 's/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/[REDACTED]/g' /tmp/tools.txt
/g for global replacement on each line.
Challenge 6: awk - Failed SSH Logins
Goal: Extract IPs from failed SSH login attempts
Answer
awk '/Failed password/ {print $NF}' /tmp/tools.txt
NF is the number of fields, so $NF is the last field: the IP in these sshd lines. Counting from the end stays correct even when "invalid user" adds extra fields.
Challenge 7: jq - Extract User Names
Goal: Get all user names from the API response
Answer
jq -r '.data.users[].name' /tmp/api-response.json
Output: admin, evan, test
Challenge 8: jq - Filter by Role
Goal: Get emails of users with role "user" (not admin)
Answer
jq -r '.data.users[] | select(.role == "user") | .email' /tmp/api-response.json
Output: evan@domusdigitalis.dev, test@example.org
Challenge 9: jq - Extract Active Interfaces
Goal: Get names of interfaces with status "up"
Answer
jq -r '.interfaces[] | select(.status == "up") | .name' /tmp/network.json
Output: eth0, lo
Challenge 10: jq - Build Network Inventory
Goal: Create CSV output: name,ip,mac for all interfaces
Answer
jq -r '.interfaces[] | [.name, .ip, .mac] | @csv' /tmp/network.json
Output: "eth0","192.168.1.10","AA:BB:CC:DD:EE:FF" etc.
Challenge 11: jq + grep Pipeline
Goal: Find users with example.com domain
Answer
jq -r '.data.users[].email' /tmp/api-response.json | grep 'example\.com'
Or pure jq:
jq -r '.data.users[] | select(.email | test("example\\.com")) | .email' /tmp/api-response.json
Challenge 12: ripgrep - Fast Search
Goal: Find all occurrences of "192.168" using ripgrep
Answer
rg '192\.168' /tmp/tools.txt /tmp/network.json
ripgrep can search multiple files at once and respects .gitignore by default.
Challenge 13: awk + sort Pipeline
Goal: Get unique IPs sorted by frequency
Answer
grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' /tmp/tools.txt | sort | uniq -c | sort -rn
Classic pipeline: extract → sort → count → sort by count.
Challenge 14: jq - Modify and Output
Goal: Add a "checked" field to each user
Answer
jq '.data.users[] += {"checked": true}' /tmp/api-response.json
+= adds new field to each user object.
Challenge 15: jq - Access Nested Values
Goal: Get the default gateway from routes
Answer
jq -r '.routes[] | select(.dest == "0.0.0.0/0") | .gateway' /tmp/network.json
Filter to default route (0.0.0.0/0), extract gateway.
Challenge 16: sed - In-Place Config Update
Goal: Change port value from 8080 to 9090 in a config line
Answer
# Preview (no -i)
sed 's/port=8080/port=9090/' /tmp/tools.txt
# In-place with backup
sed -i.bak 's/port=8080/port=9090/' /tmp/tools.txt
Always preview before -i. Use .bak for safety.
Challenge 17: Python - Parse and Extract
Goal: Use Python to extract all unique IPs from the log
Answer
import re
with open('/tmp/tools.txt') as f:
content = f.read()
ips = set(re.findall(r'\d+\.\d+\.\d+\.\d+', content))
for ip in sorted(ips):
print(ip)
set() for unique values, sorted() for order.
Challenge 18: jq - Count by Field
Goal: Count users by role
Answer
jq -r '.data.users | group_by(.role) | map({role: .[0].role, count: length})' /tmp/api-response.json
group_by groups, length counts each group.
Challenge 19: awk - Log Analysis
Goal: Calculate average response size (last field in HTTP logs)
Answer
awk '/HTTP/ {sum += $NF; count++} END {print "Average:", sum/count}' /tmp/tools.txt
$NF is last field (response size). Calculate average in END block.
Challenge 20: Full Pipeline - API to Report
Goal: Generate a report of admin users from API JSON
Answer
jq -r '.data.users[] | select(.role == "admin") | "Admin: \(.name) <\(.email)>"' /tmp/api-response.json
Output: Admin: admin <admin@example.com>
String interpolation with \(…) for formatted output.
Key Takeaways
- Use the right tool for the job - grep for search, awk for fields, sed for transform
- ripgrep is faster - prefer `rg` for large codebases
- PCRE gives most features - use `grep -P` when needed
- Compile patterns in loops - Python and JavaScript benefit from reuse
- Named groups improve readability - use them for complex patterns
- Combine tools in pipelines - each tool does one thing well
- jq for JSON - don’t use regex for structured JSON data
Practice Path
- Master `grep -E` and `grep -P` on log files
- Learn `awk` for field extraction
- Use `sed` for batch replacements
- Build Python scripts for complex parsing
- Apply JavaScript regex in web projects
- Use `jq` for all JSON operations