Tool Integration

Regex is a universal skill - the same patterns work across tools with minor syntax adjustments. This guide covers practical usage in the tools you’ll use daily for infrastructure work.

grep

The workhorse of text searching. Know when to use each mode.

Modes

grep 'pattern'      # BRE (basic, limited features)
grep -E 'pattern'   # ERE (extended, modern syntax)
grep -P 'pattern'   # PCRE (Perl, full features)
grep -F 'string'    # Fixed string (no regex, fastest)
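
The modes differ mainly in which metacharacters work unescaped. A quick sketch of how + behaves in each mode (GNU grep assumed):

```shell
# BRE: + is literal unless escaped (escaped form is a GNU extension)
printf 'aaa\nbbb\n' | grep 'a\+'      # matches "aaa"

# ERE: + is a quantifier as written
printf 'aaa\nbbb\n' | grep -E 'a+'    # matches "aaa"

# Fixed string: + is just a byte, no regex at all
printf 'a+b\nab\n' | grep -F 'a+'     # matches "a+b"
```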

Essential Flags

Flag   Purpose
-i     Case insensitive
-v     Invert match (lines NOT matching)
-o     Output only matched part
-n     Show line numbers
-c     Count matching lines (not individual matches)
-l     List files with matches
-L     List files without matches
-r     Recursive search
-w     Whole word only
-A n   Show n lines after match
-B n   Show n lines before match
-C n   Show n lines of context
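
A common trip-up: -c counts matching lines, not matches; combine -o with wc -l when you need every occurrence:

```shell
printf 'a a\nb\na\n' | grep -c 'a'          # 2 (lines containing a match)
printf 'a a\nb\na\n' | grep -o 'a' | wc -l  # 3 (every individual match)
```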

Common Patterns

# Find IPs in log
grep -oP '\d{1,3}(\.\d{1,3}){3}' access.log

# Find error lines with context
grep -B2 -A2 'ERROR' application.log

# Recursive search, show files only
grep -rl 'password' .

# Count matches per file
grep -rc 'TODO' src/

# Extract value after key (\K discards the text matched so far)
grep -oP 'port=\K\d+' config.txt

# Case-insensitive whole word
grep -iw 'error' log.txt

Performance Tips

# Fixed string is fastest
grep -F '192.168.1.1' log.txt

# Use --include to filter files
grep -r --include='*.py' 'import' src/

# Use --exclude for speed
grep -r --exclude='*.log' 'pattern' .

# Limit output
grep -m 10 'pattern' file.txt  # First 10 matches

ripgrep (rg)

Modern grep replacement - faster, smarter defaults, better UX.

Why ripgrep

  • Faster - Rust-based, parallel processing

  • Smart defaults - Respects .gitignore, skips binary files

  • Better output - Colors, grouping, line numbers by default

  • PCRE2 - Full regex support

Basic Usage

# Simple search (recursive by default)
rg 'pattern'

# Fixed string
rg -F '192.168.1.1'

# Case insensitive
rg -i 'error'

# Whole word
rg -w 'log'

# Show only matches
rg -o '\d+\.\d+\.\d+\.\d+'

File Filtering

# By extension
rg -g '*.py' 'import'

# Exclude pattern
rg -g '!*.log' 'pattern'

# By file type
rg -t py 'import'
rg -t rust 'fn main'

# List supported types
rg --type-list

Advanced Features

# PCRE2 features
rg -P '(?<=port=)\d+'

# Multiline
rg -U 'start.*\n.*end'

# Replace in output (rg never modifies files)
rg 'old' --replace 'new'

# JSON output (for scripting)
rg --json 'pattern'

# Statistics
rg --stats 'pattern'

sed

Stream editor for transformations. Master substitution.

Basic Substitution

# Replace first occurrence per line
sed 's/old/new/' file.txt

# Replace all occurrences per line
sed 's/old/new/g' file.txt

# Case insensitive
sed 's/old/new/gi' file.txt

# In-place edit (with backup)
sed -i.bak 's/old/new/g' file.txt

# In-place edit (no backup)
sed -i 's/old/new/g' file.txt

ERE Mode

# Use -E for modern syntax
sed -E 's/[0-9]+/NUMBER/g' file.txt

# Grouping and backreferences
sed -E 's/(\w+) (\w+)/\2 \1/g' file.txt

# Multiple substitutions
sed -E 's/foo/bar/g; s/baz/qux/g' file.txt
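
The grouping and chaining examples above can be tried on one-line inputs (using the portable [[:alpha:]] class here, since \w support in sed varies by implementation):

```shell
# Swap two words via backreferences
echo 'John Smith' | sed -E 's/([[:alpha:]]+) ([[:alpha:]]+)/\2 \1/'
# Smith John

# Chain substitutions with ;
echo 'foo and baz' | sed -E 's/foo/bar/g; s/baz/qux/g'
# bar and qux
```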

Line Selection

# Specific line
sed -n '5p' file.txt

# Line range
sed -n '5,10p' file.txt

# Pattern match
sed -n '/ERROR/p' file.txt

# Delete lines
sed '/^#/d' file.txt        # Delete comments
sed '/^$/d' file.txt        # Delete empty lines

# Insert/append
sed '5i\New line before' file.txt
sed '5a\New line after' file.txt

Practical Examples

# Remove trailing whitespace
sed 's/[[:space:]]*$//' file.txt

# Convert DOS to Unix line endings
sed 's/\r$//' file.txt

# Extract between patterns
sed -n '/START/,/END/p' file.txt

# Comment out lines
sed '/pattern/s/^/#/' file.txt

# Uncomment lines
sed 's/^#//' file.txt

awk

Pattern-action language for field-based processing.

Basic Structure

awk '/pattern/ { action }' file.txt
awk '{ print $1 }' file.txt           # First field of all lines
awk '/ERROR/ { print $0 }' log.txt    # Lines matching ERROR

Field Processing

# Default separator (whitespace)
awk '{ print $1, $3 }' file.txt

# Custom separator
awk -F: '{ print $1 }' /etc/passwd
awk -F',' '{ print $2 }' data.csv

# Multiple separators
awk -F'[,;:]' '{ print $1 }' file.txt

Regex in awk

# Line match
awk '/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/' access.log

# Field match
awk '$1 ~ /^192\.168/' access.log

# Negated match
awk '$1 !~ /^192\.168/' access.log

# Match and extract (three-argument match is a gawk extension)
awk 'match($0, /port=([0-9]+)/, a) { print a[1] }' config.txt

Built-in Variables

Variable    Meaning
NR          Record (line) number
NF          Number of fields
$0          Entire line
$1, $2...   Fields
FS          Field separator
OFS         Output field separator
FILENAME    Current file name
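
Several of these can be combined in one pass; a minimal sketch over inline sample data:

```shell
# Print line number, field count, and first field, comma-separated
printf 'alpha beta\ngamma delta epsilon\n' |
  awk 'BEGIN { OFS="," } { print NR, NF, $1 }'
# 1,2,alpha
# 2,3,gamma
```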

Practical Examples

# Sum a column
awk '{ sum += $2 } END { print sum }' data.txt

# Average
awk '{ sum += $2; count++ } END { print sum/count }' data.txt

# Unique values
awk '{ seen[$1]++ } END { for (k in seen) print k }' file.txt

# Group by and count
awk '{ count[$1]++ } END { for (k in count) print k, count[k] }' access.log

# Line range
awk 'NR>=10 && NR<=20' file.txt

# Conditional processing
awk '$3 > 100 { print $1, $3 }' data.txt

Python re Module

Full-featured regex with named groups, lookaround, and clear API.

Basic Operations

import re

text = "Error on 2026-03-15: Connection refused from 192.168.1.100"

# Search (first match)
match = re.search(r'\d{1,3}(\.\d{1,3}){3}', text)
if match:
    print(match.group())  # 192.168.1.100

# Find all (note: findall returns the capture group when one exists,
# so use a non-capturing group to get the full match)
ips = re.findall(r'\d{1,3}(?:\.\d{1,3}){3}', text)
# Returns list of full matches

# Find all with groups
dates = re.findall(r'(\d{4})-(\d{2})-(\d{2})', text)
# Returns list of tuples

# Substitute
result = re.sub(r'\d+', 'X', text)

Compiled Patterns

import re

# Compile for reuse (performance)
ip_pattern = re.compile(r'\d{1,3}(\.\d{1,3}){3}')

with open('access.log') as f:
    for line in f:
        match = ip_pattern.search(line)
        if match:
            print(match.group())

Named Groups

import re

log_pattern = re.compile(
    r'(?P<timestamp>\d{4}-\d{2}-\d{2})\s+'
    r'(?P<level>\w+)\s+'
    r'(?P<message>.+)'
)

line = "2026-03-15 ERROR Connection refused"
match = log_pattern.search(line)

if match:
    print(match.group('timestamp'))  # 2026-03-15
    print(match.group('level'))      # ERROR
    print(match.groupdict())         # {'timestamp': '2026-03-15', ...}

Flags

import re

# Case insensitive
re.search(r'error', text, re.IGNORECASE)

# Multiline (^ and $ match line boundaries)
re.findall(r'^ERROR', text, re.MULTILINE)

# Verbose (comments and whitespace)
pattern = re.compile(r'''
    (?P<year>\d{4})-    # Year
    (?P<month>\d{2})-   # Month
    (?P<day>\d{2})      # Day
''', re.VERBOSE)

# Combined
re.search(r'pattern', text, re.IGNORECASE | re.MULTILINE)

Practical Examples

import re

# Parse log file
def parse_sshd_log(line):
    pattern = r'Failed password for (?:invalid user )?(\S+) from (\S+)'
    match = re.search(pattern, line)
    if match:
        return {'user': match.group(1), 'ip': match.group(2)}
    return None

# Extract all emails
def find_emails(text):
    pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    return re.findall(pattern, text)

# Validate IP
def is_valid_ip(ip):
    pattern = r'^(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)$'
    return bool(re.match(pattern, ip))

JavaScript RegExp

Browser and Node.js regex with modern ES2018+ features.

Creating Patterns

// Literal (preferred for static patterns)
const pattern1 = /\d+/g;

// Constructor (for dynamic patterns)
const userInput = "error";
const pattern2 = new RegExp(userInput, 'gi');

Methods

const text = "Port: 443, Other: 8080";

// test - boolean
/\d+/.test(text);  // true

// match - array of matches
text.match(/\d+/g);  // ['443', '8080']

// matchAll - iterator with groups (ES2020)
const matches = text.matchAll(/Port: (\d+)/g);
for (const match of matches) {
    console.log(match[1]);  // 443
}

// exec - detailed match info
const pattern = /Port: (\d+)/g;
let match;
while ((match = pattern.exec(text)) !== null) {
    console.log(match[1]);
}

// replace
text.replace(/\d+/g, 'X');  // "Port: X, Other: X"

// split
'a,b;c'.split(/[,;]/);  // ['a', 'b', 'c']

Named Groups (ES2018)

const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = '2026-03-15'.match(pattern);

console.log(match.groups.year);   // 2026
console.log(match.groups.month);  // 03
console.log(match.groups.day);    // 15

// In replace
'2026-03-15'.replace(pattern, '$<month>/$<day>/$<year>');
// "03/15/2026"

Practical Examples

// Parse URL
function parseUrl(url) {
    const pattern = /^(?<protocol>https?):\/\/(?<host>[^/:]+)(?::(?<port>\d+))?(?<path>\/[^?]*)?(?:\?(?<query>.+))?$/;
    const match = url.match(pattern);
    return match ? match.groups : null;
}

// Extract IPs from text
function findIPs(text) {
    const pattern = /\b\d{1,3}(?:\.\d{1,3}){3}\b/g;
    return text.match(pattern) || [];
}

// Validate email
function isValidEmail(email) {
    const pattern = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
    return pattern.test(email);
}

// Sanitize input
function escapeHtml(text) {
    return text.replace(/[&<>"']/g, char => ({
        '&': '&amp;',
        '<': '&lt;',
        '>': '&gt;',
        '"': '&quot;',
        "'": '&#39;'
    })[char]);
}

Tool Selection Guide

Task                    Best Tool       Why
Quick search            rg or grep      Fast, simple
Extract specific field  awk             Field-based processing
Transform text          sed             Stream editing
Complex parsing         Python          Full programming language
Browser scripting       JavaScript      Native environment
Large files             rg              Parallel, memory efficient
JSON data               jq (not regex)  Purpose-built

Combining Tools

# grep -> awk
grep 'ERROR' log.txt | awk '{print $4}'

# grep -> sed
grep 'server=' config.txt | sed 's/server=//'

# awk with regex -> sort -> uniq
awk '/Failed password/ {print $11}' auth.log | sort | uniq -c | sort -rn

# ripgrep -> Python processing
rg -oN '\d+\.\d+\.\d+\.\d+' access.log | python3 -c "
import sys
from collections import Counter
ips = Counter(line.strip() for line in sys.stdin)
for ip, count in ips.most_common(10):
    print(f'{count:6d} {ip}')
"

Self-Test Exercises

Try each challenge FIRST. Only read the answer after you’ve attempted it.

Setup Test Data

cat << 'EOF' > /tmp/tools.txt
192.168.1.100 - - [15/Mar/2026:14:30:45 +0000] "GET /api/users HTTP/1.1" 200 1234
10.50.1.20 - - [15/Mar/2026:14:30:46 +0000] "POST /api/login HTTP/1.1" 401 89
192.168.1.100 - - [15/Mar/2026:14:30:47 +0000] "GET /api/config HTTP/1.1" 500 456
172.16.0.5 - - [15/Mar/2026:14:30:48 +0000] "GET /health HTTP/1.1" 200 15
server=prod-01
port=8080
timeout=30
debug=true
Mar 15 14:30:45 server sshd[12345]: Failed password for root from 10.0.0.1
Mar 15 14:30:46 server sshd[12346]: Accepted publickey for admin from 192.168.1.50
Mar 15 14:30:47 server sshd[12347]: Failed password for invalid user test from 10.0.0.2
EOF
cat << 'EOF' > /tmp/api-response.json
{
  "status": "success",
  "data": {
    "users": [
      {"id": 1, "name": "admin", "email": "admin@example.com", "role": "admin"},
      {"id": 2, "name": "evan", "email": "evan@domusdigitalis.dev", "role": "user"},
      {"id": 3, "name": "test", "email": "test@example.org", "role": "user"}
    ],
    "pagination": {
      "page": 1,
      "total": 3,
      "per_page": 10
    }
  },
  "errors": []
}
EOF
cat << 'EOF' > /tmp/network.json
{
  "interfaces": [
    {"name": "eth0", "ip": "192.168.1.10", "mac": "AA:BB:CC:DD:EE:FF", "status": "up"},
    {"name": "eth1", "ip": "10.50.1.20", "mac": "11:22:33:44:55:66", "status": "down"},
    {"name": "lo", "ip": "127.0.0.1", "mac": "00:00:00:00:00:00", "status": "up"}
  ],
  "routes": [
    {"dest": "0.0.0.0/0", "gateway": "192.168.1.1", "interface": "eth0"},
    {"dest": "10.50.0.0/16", "gateway": "10.50.1.1", "interface": "eth1"}
  ]
}
EOF

Challenge 1: grep - Extract IPs from Log

Goal: Extract all IP addresses from the access log

Answer
grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' /tmp/tools.txt

Or with PCRE: grep -oP '\d+\.\d+\.\d+\.\d+' /tmp/tools.txt
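
One caveat: these patterns match any dotted quad, including invalid ones, so validate separately when correctness matters:

```shell
# Matches even though 999.999.999.999 is not a valid IP
echo 'bad addr 999.999.999.999' | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'
# 999.999.999.999
```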


Challenge 2: awk - First Field (IPs)

Goal: Use awk to extract the first field (client IP) from access log lines

Answer
awk '/HTTP/ {print $1}' /tmp/tools.txt

$1 is the first field (IP). Pattern /HTTP/ filters to HTTP log lines.


Challenge 3: awk - Count by Status Code

Goal: Count requests by HTTP status code

Answer
awk '/HTTP/ {print $9}' /tmp/tools.txt | sort | uniq -c

$9 is the status code in combined log format.


Challenge 4: sed - Extract Config Value

Goal: Extract just the value after "port=" using sed

Answer
sed -n 's/port=\(.*\)/\1/p' /tmp/tools.txt

# Or with ERE:
sed -nE 's/port=(.*)/\1/p' /tmp/tools.txt

-n suppresses default output, /p prints matches.


Challenge 5: sed - Replace IP with REDACTED

Goal: Replace all IPs with [REDACTED]

Answer
sed -E 's/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/[REDACTED]/g' /tmp/tools.txt

/g for global replacement on each line.


Challenge 6: awk - Failed SSH Logins

Goal: Extract IPs from failed SSH login attempts

Answer
awk '/Failed password/ {print $NF}' /tmp/tools.txt

NF is the number of fields, so $NF is the last field (the source IP in this data). Real sshd lines often end with "port N ssh2", which would shift the IP to $(NF-3).


Challenge 7: jq - Extract User Names

Goal: Get all user names from the API response

Answer
jq -r '.data.users[].name' /tmp/api-response.json

Output: admin, evan, test


Challenge 8: jq - Filter by Role

Goal: Get emails of users with role "user" (not admin)

Answer
jq -r '.data.users[] | select(.role == "user") | .email' /tmp/api-response.json

Challenge 9: jq - Extract Active Interfaces

Goal: Get names of interfaces with status "up"

Answer
jq -r '.interfaces[] | select(.status == "up") | .name' /tmp/network.json

Output: eth0, lo


Challenge 10: jq - Build Network Inventory

Goal: Create CSV output: name,ip,mac for all interfaces

Answer
jq -r '.interfaces[] | [.name, .ip, .mac] | @csv' /tmp/network.json

Output: "eth0","192.168.1.10","AA:BB:CC:DD:EE:FF" etc.


Challenge 11: jq + grep Pipeline

Goal: Find users with example.com domain

Answer
jq -r '.data.users[].email' /tmp/api-response.json | grep 'example\.com'

Or pure jq:

jq -r '.data.users[] | select(.email | test("example\\.com")) | .email' /tmp/api-response.json

Challenge 12: ripgrep - Multi-File Search

Goal: Find all occurrences of "192.168" using ripgrep

Answer
rg '192\.168' /tmp/tools.txt /tmp/network.json

ripgrep searches multiple files in one invocation and respects .gitignore.


Challenge 13: awk + sort Pipeline

Goal: Get unique IPs sorted by frequency

Answer
grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' /tmp/tools.txt | sort | uniq -c | sort -rn

Classic pipeline: extract → sort → count → sort by count.


Challenge 14: jq - Modify and Output

Goal: Add a "checked" field to each user

Answer
jq '.data.users[] += {"checked": true}' /tmp/api-response.json

+= adds new field to each user object.


Challenge 15: jq - Access Nested Values

Goal: Get the default gateway from routes

Answer
jq -r '.routes[] | select(.dest == "0.0.0.0/0") | .gateway' /tmp/network.json

Filter to default route (0.0.0.0/0), extract gateway.


Challenge 16: sed - In-Place Config Update

Goal: Change port value from 8080 to 9090 in a config line

Answer
# Preview (no -i)
sed 's/port=8080/port=9090/' /tmp/tools.txt

# In-place with backup
sed -i.bak 's/port=8080/port=9090/' /tmp/tools.txt

Always preview before -i. Use .bak for safety.


Challenge 17: Python - Parse and Extract

Goal: Use Python to extract all unique IPs from the log

Answer
import re

with open('/tmp/tools.txt') as f:
    content = f.read()
    ips = set(re.findall(r'\d+\.\d+\.\d+\.\d+', content))
    for ip in sorted(ips):
        print(ip)

set() for unique values, sorted() for order.


Challenge 18: jq - Count by Field

Goal: Count users by role

Answer
jq -r '.data.users | group_by(.role) | map({role: .[0].role, count: length})' /tmp/api-response.json

group_by groups, length counts each group.


Challenge 19: awk - Log Analysis

Goal: Calculate average response size (last field in HTTP logs)

Answer
awk '/HTTP/ {sum += $NF; count++} END {print "Average:", sum/count}' /tmp/tools.txt

$NF is last field (response size). Calculate average in END block.


Challenge 20: Full Pipeline - API to Report

Goal: Generate a report of admin users from API JSON

Answer
jq -r '.data.users[] | select(.role == "admin") | "Admin: \(.name) <\(.email)>"' /tmp/api-response.json

Output: Admin: admin <admin@example.com>

String interpolation with \(...) for formatted output.

Key Takeaways

  1. Use the right tool for the job - grep for search, awk for fields, sed for transform

  2. ripgrep is faster - prefer rg for large codebases

  3. PCRE gives most features - use grep -P when needed

  4. Compile patterns in loops - Python and JavaScript benefit from reuse

  5. Named groups improve readability - use them for complex patterns

  6. Combine tools in pipelines - each tool does one thing well

  7. jq for JSON - don’t use regex for structured JSON data

Practice Path

  1. Master grep -E and grep -P on log files

  2. Learn awk for field extraction

  3. Use sed for batch replacements

  4. Build Python scripts for complex parsing

  5. Apply JavaScript regex in web projects

  6. Use jq for all JSON operations