Regex Reference - Pattern Matching Mastery
Master regex across all Linux tools.
Quick Reference
Basic Regex (BRE) - grep, sed
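| Pattern | Meaning | Example |
|---|---|---|
| `.` | Any single character | `a.c` matches `abc` and `a-c` |
| `*` | Zero or more of preceding | `ab*c` matches `ac`, `abc`, `abbc` |
| `^` | Start of line | `^Error` |
| `$` | End of line | `done$` |
| `[abc]` | Character class | `[aeiou]` |
| `[^abc]` | Negated class | `[^0-9]` |
| `\` | Escape special char | `\.` matches a literal dot |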
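A minimal sketch of the BRE metacharacters above, run against inline sample text. The sample lines and the `bre_*` function names are made up for illustration.

```shell
# Hypothetical sample lines, used only to exercise the patterns.
bre_sample() {
  printf '%s\n' 'abc' 'ac' 'abbc' 'a.c' 'Error: disk full' 'no error here'
}

bre_any_char()     { bre_sample | grep 'a.c'; }     # abc and a.c
bre_zero_or_more() { bre_sample | grep 'ab*c'; }    # abc, ac, abbc
bre_line_start()   { bre_sample | grep '^Error'; }  # Error: disk full
bre_line_end()     { bre_sample | grep 'here$'; }   # no error here
bre_literal_dot()  { bre_sample | grep 'a\.c'; }    # a.c only

bre_any_char
bre_literal_dot
```

The unescaped dot matches both `abc` and `a.c`; escaping it narrows the match to the literal string.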
Extended Regex (ERE) - grep -E, egrep
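| Pattern | Meaning | Example |
|---|---|---|
| `+` | One or more | `ab+c` matches `abc`, `abbc` |
| `?` | Zero or one | `colou?r` matches `color`, `colour` |
| `{n}` | Exactly n times | `[0-9]{4}` |
| `{n,m}` | Between n and m | `[0-9]{2,4}` |
| `\|` | Alternation (or) | `cat\|dog` |
| `()` | Grouping | `(ab)+` |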
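The ERE operators can be tried the same way. This is a sketch with invented sample words; the `ere_*` names are illustrative only.

```shell
# Hypothetical sample words.
ere_sample() {
  printf '%s\n' 'color' 'colour' 'colouur' '2024' '12' 'cat' 'dog' 'bird' 'abab'
}

ere_optional()    { ere_sample | grep -E '^colou?r$'; }    # color, colour
ere_one_or_more() { ere_sample | grep -E '^colou+r$'; }    # colour, colouur
ere_exact()       { ere_sample | grep -E '^[0-9]{4}$'; }   # 2024
ere_alternation() { ere_sample | grep -E '^(cat|dog)$'; }  # cat, dog
ere_grouping()    { ere_sample | grep -E '^(ab)+$'; }      # abab

ere_optional
ere_alternation
```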
Perl-Compatible Regex (PCRE) - grep -P, perl
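| Pattern | Meaning | Example |
|---|---|---|
| `\d` | Digit [0-9] | `\d{3}` |
| `\D` | Non-digit | `\D+` |
| `\w` | Word char [a-zA-Z0-9_] | `\w+` |
| `\W` | Non-word char | `\W+` |
| `\s` | Whitespace | `\s+` |
| `\S` | Non-whitespace | `\S+` |
| `\b` | Word boundary | `\bword\b` |
| `(?:...)` | Non-capturing group | `(?:ab)+` |
| `(?=...)` | Positive lookahead | `foo(?=bar)` |
| `(?!...)` | Negative lookahead | `foo(?!bar)` |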
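A short sketch of a few PCRE constructs from the table, assuming a GNU grep built with `-P` support. Sample lines and `pcre_*` names are hypothetical.

```shell
# Hypothetical sample lines.
pcre_sample() {
  printf '%s\n' 'cat' 'concatenate' 'foobar' 'foobaz' 'price: 100USD'
}

pcre_whole_word() { pcre_sample | grep -P '\bcat\b'; }      # cat (not concatenate)
pcre_lookahead()  { pcre_sample | grep -oP '\d+(?=USD)'; }  # 100, without the unit
pcre_neg_look()   { pcre_sample | grep -P 'foo(?!bar)'; }   # foobaz

pcre_whole_word
pcre_lookahead
pcre_neg_look
```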
Shell Globbing
Common Patterns
Tool-Specific Usage
grep
# Basic regex
grep 'pattern' file
# Extended regex
grep -E 'pattern|other' file
# Perl regex
grep -P '\d+\.\d+\.\d+\.\d+' file
# Case insensitive
grep -i 'error' file
# Invert match
grep -v 'exclude' file
# Only matching part
grep -o 'pattern' file
sed
# Basic substitution
sed 's/old/new/' file
# Global substitution
sed 's/old/new/g' file
# Extended regex
sed -E 's/(group)/\1_suffix/g' file
# In-place edit
sed -i 's/old/new/g' file
# Delete lines matching pattern
sed '/pattern/d' file
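The sed commands above can be checked without touching a file by piping sample text through them. A sketch with made-up input; the `sed_*` names are illustrative.

```shell
# Hypothetical three-line input.
sed_sample() { printf '%s\n' 'old old' '# comment' 'user=alice'; }

sed_first()  { sed_sample | sed 's/old/new/'; }     # first occurrence per line
sed_all()    { sed_sample | sed 's/old/new/g'; }    # every occurrence
sed_delete() { sed_sample | sed '/^#/d'; }          # drop comment lines
sed_swap()   { sed_sample | sed -E 's/^([a-z]+)=([a-z]+)$/\2=\1/'; }  # swap key and value

sed_first | head -n 1   # new old
sed_all   | head -n 1   # new new
sed_swap  | tail -n 1   # alice=user
```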
Perl Rename (prename)
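# Install (the command name differs by distro; substitute it for prename below if needed)
sudo pacman -S perl-rename # Arch (installed as perl-rename)
sudo apt install rename # Debian (installed as rename, also file-rename)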
# Lowercase all filenames
prename 'y/A-Z/a-z/' *
# Replace spaces with underscores
prename 's/ /_/g' *
# Add prefix
prename 's/^/prefix_/' *.txt
# Remove extension
prename 's/\.bak$//' *.bak
# Sequential numbering
prename 's/^/sprintf("%03d_", ++$n)/e' *.jpg
# Dry run (show changes without doing them)
prename -n 's/old/new/' *
Escape Characters
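| Character | Escape | When |
|---|---|---|
| `.` | `\.` | Match literal dot |
| `*` | `\*` | Match literal asterisk |
| `?` | `\?` | Match literal question mark (ERE/PCRE; a bare `?` is already literal in BRE) |
| `[` | `\[` | Match literal bracket |
| `(` | `\(` | Match literal paren (ERE/PCRE; in BRE a bare `(` is literal and `\(` opens a group) |
| `{` | `\{` | Match literal brace (ERE/PCRE; in BRE a bare `{` is literal and `\{` opens an interval) |
| `\|` | `\\|` | Match literal pipe (ERE/PCRE; a bare pipe is already literal in BRE) |
| `$` | `\$` | Match literal dollar |
| `^` | `\^` | Match literal caret (mid-pattern) |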
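A quick sketch of why the escapes matter, using invented sample strings. When the whole pattern is meant literally, `grep -F` avoids escaping altogether.

```shell
# Hypothetical sample strings containing regex metacharacters.
esc_sample() { printf '%s\n' '1.5' '105' 'a*b' 'aab' 'cost $5' 'f(x)'; }

esc_unescaped_dot() { esc_sample | grep -c '1.5'; }      # 2: matches 1.5 and 105
esc_dot()           { esc_sample | grep '1\.5'; }        # 1.5 only
esc_star()          { esc_sample | grep 'a\*b'; }        # a*b only
esc_dollar()        { esc_sample | grep 'cost \$5'; }    # cost $5
esc_paren()         { esc_sample | grep -E 'f\(x\)'; }   # f(x), ERE needs the escapes
esc_fixed()         { esc_sample | grep -F 'f(x)'; }     # f(x), -F takes the pattern literally

echo "unescaped dot matches $(esc_unescaped_dot) lines"
esc_dot
```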
Yoda Tier - Advanced Pattern Mastery
"Do or do not. There is no `.*`"
Lookahead & Lookbehind (Zero-Width Assertions)
# Password must have uppercase, lowercase, digit, special char, and at least 12 characters
grep -P '^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*]).{12,}$' passwords.txt
# Extract *_id names that do NOT start with "test_" or "debug_"
grep -oP '\b(?!test_|debug_)\w+_id\b' config.yaml
# Match "password" only if followed by "=" but don't capture "="
grep -oP 'password(?==)' file
# Find def lines with no "try:" later on the same line
# (grep works line by line, so this does not inspect the function body)
grep -P 'def \w+\([^)]*\):(?!.*try:)' *.py
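A small sketch of zero-width assertions on inline input, assuming GNU grep with `-P`. The sample strings and function names are hypothetical.

```shell
# Exit status 0 if the argument passes the password policy shown above.
strong_password() {
  printf '%s\n' "$1" |
    grep -qP '^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*]).{12,}$'
}

# Lookbehind: print the digits that follow "price=" without printing the key.
value_after_key() {
  printf '%s\n' 'price=42 qty=7' | grep -oP '(?<=price=)\d+'
}

# Negative lookahead at the word start: skip names beginning with test_ or debug_.
unprefixed_ids() {
  printf '%s\n' 'user_id test_user_id debug_id order_id' |
    grep -oP '\b(?!test_|debug_)\w+_id\b'
}

if strong_password 'Str0ng!Passw0rd'; then echo 'accepted'; fi
value_after_key
unprefixed_ids
```

The assertions consume no characters, which is why several lookaheads can all start from the same position at `^`.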
Named Capture Groups
# Parse Apache logs with named groups
perl -ne 'print "$+{ip} $+{status}\n" if /(?<ip>\d+\.\d+\.\d+\.\d+).*?"(?<method>\w+) (?<path>[^"]+)" (?<status>\d+)/' access.log
# Extract structured data from messy logs
grep -oP '(?<timestamp>\d{4}-\d{2}-\d{2}T[\d:]+).*?(?<level>ERROR|WARN|INFO).*?(?<msg>[^|]+)' app.log
# Reuse named groups with backreference (\047 is a single quote, which cannot appear inside a single-quoted shell string)
grep -P '(?<quote>["\047]).*?\k<quote>' file # Match quoted strings
Recursive Patterns (Balanced Matching)
# Match balanced parentheses (PCRE magic)
grep -oP '\((?:[^()]+|(?R))*\)' code.c
# Match balanced braces in JSON/code
grep -oP '\{(?:[^{}]+|(?R))*\}' file.json
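# Extract nested function calls (recurse into group 1, the parenthesized part,
# so bare nested parentheses such as f((a+b)) are handled too)
perl -ne 'print "$&\n" while /\w+(\((?:[^()]++|(?1))*\))/g' code.py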
Atomic Groups & Possessive Quantifiers
# Prevent catastrophic backtracking (atomic group)
grep -P '(?>\d+)\.' file # Never backtracks into the digits when no dot follows
# Possessive quantifier (never backtrack)
grep -P '\d++\.\d++' file # Match decimals efficiently
# Parse huge logs without regex DoS
grep -P '^\S++\s++\S++\s++\[.*?\]' huge.log
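The difference between greedy, possessive, and atomic matching shows up on a three-character input. A sketch; the function names are illustrative.

```shell
# Count matching lines for the input "aaa" under each quantifier style.
greedy_count()     { printf '%s\n' 'aaa' | grep -cP '^a+a$'; }      # 1: a+ gives one "a" back
possessive_count() { printf '%s\n' 'aaa' | grep -cP '^a++a$'; }     # 0: a++ keeps all three
atomic_count()     { printf '%s\n' 'aaa' | grep -cP '^(?>a+)a$'; }  # 0: same as possessive

echo "greedy: $(greedy_count) possessive: $(possessive_count) atomic: $(atomic_count)"
```

Because the possessive and atomic forms refuse to give characters back, they can make a pattern fail where the greedy form would match, so they are only safe when nothing after them needs those characters.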
Conditional Patterns
# Match different formats based on condition
# If group 1 matched "(", expect ")"; otherwise expect "-"
grep -P '(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}' phones.txt
# Matches: (555)123-4567 or 555-123-4567
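# Match protocol-specific patterns: allow a :port only when a scheme was matched
# (the scheme is optional here, so the named condition can actually be false)
grep -P '^(?:(?<proto>https?)://)?(?(<proto>)[\w.-]+(?::\d+)?|[\w.-]+)$' urls.txt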
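A sketch that exercises the phone-number conditional, anchored so that partial matches do not count. `phone_ok` and the sample numbers are hypothetical.

```shell
# Exit status 0 when the argument is (NNN)NNN-NNNN or NNN-NNN-NNNN.
phone_ok() {
  printf '%s\n' "$1" | grep -qP '^(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}$'
}

for p in '(555)123-4567' '555-123-4567' '(555-123-4567' '555)123-4567'; do
  if phone_ok "$p"; then echo "ok   $p"; else echo "bad  $p"; fi
done
```

The two mixed forms are rejected because the closing delimiter is chosen by whether group 1 captured an opening parenthesis.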
Multi-line Sorcery
# Match across lines (slurp mode); [^}]* stops at the first "}", so nested blocks are cut short
perl -0777 -ne 'print "$&\n" while /function\s+\w+\s*\([^)]*\)\s*\{[^}]*\}/g' code.js
# Extract multi-line SQL queries
perl -0777 -pe 's/--.*$//gm; s/\n/ /g' queries.sql | grep -oP 'SELECT.*?;'
# Find multi-line function definitions
grep -Pzo '(?s)def \w+\([^)]*\):.*?(?=\ndef |\Z)' script.py
# Match heredocs ((?m) makes ^ and $ anchor at line boundaries inside the single -z record)
grep -Pzo "(?sm)<<'?(\w+)'?.*?^\1\$" script.sh
Security & Forensics Patterns
# Detect potential SQL injection attempts
grep -iP "(union\s+(all\s+)?select|or\s+1\s*=\s*1|'\s*or\s*'|;\s*drop\s+table)" access.log
# Find hardcoded secrets (API keys, tokens)
grep -rP '(?i)(api[_-]?key|secret|token|password)\s*[:=]\s*["\047]?[a-zA-Z0-9_\-]{20,}' --include='*.py' --include='*.js' --include='*.yaml' --include='*.json' --include='*.env' .
# Detect base64-encoded payloads
grep -oP '[A-Za-z0-9+/]{40,}={0,2}' file | while IFS= read -r b; do printf '%s' "$b" | base64 -d 2>/dev/null; echo; done
# Find private keys in files (-e keeps the leading dashes from being parsed as options)
grep -rP -e '-----BEGIN (RSA |DSA |EC |OPENSSH )?PRIVATE KEY-----' .
# Detect command injection patterns
grep -P '\$\(.*\)|\`.*\`|;\s*(cat|curl|wget|nc|bash|sh|python|perl|ruby)' input.log
# Find JWT tokens
grep -oP 'eyJ[A-Za-z0-9_-]*\.eyJ[A-Za-z0-9_-]*\.[A-Za-z0-9_-]*' file
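A sketch that wraps three of the detection patterns as stdin filters so they can be tried on harmless sample strings. The function names are illustrative, and the token below is a made-up placeholder, not a real credential.

```shell
# Each function reads lines from stdin.
has_private_key() {
  grep -qP -e '-----BEGIN (RSA |DSA |EC |OPENSSH )?PRIVATE KEY-----'
}
find_jwt() {
  grep -oP 'eyJ[A-Za-z0-9_-]*\.eyJ[A-Za-z0-9_-]*\.[A-Za-z0-9_-]*'
}
count_sqli() {
  grep -ciP "(union\s+(all\s+)?select|or\s+1\s*=\s*1|'\s*or\s*'|;\s*drop\s+table)"
}

# Hypothetical inputs.
fake_jwt='eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxIn0.c2lnbmF0dXJl'
printf '%s\n' "Authorization: Bearer $fake_jwt" | find_jwt
printf '%s\n' 'GET /item?id=1 UNION SELECT password FROM users' 'GET /item?id=2' | count_sqli
```

Patterns like these flag candidates for review; they produce both false positives and false negatives.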
Log Parsing Dark Arts
# Parse nginx logs into CSV
perl -pe 's/^(\S+) - - \[(.*?)\] "(.*?)" (\d+) (\d+) "(.*?)" "(.*?)"$/$1,$2,$3,$4,$5/' access.log
# Extract failed SSH attempts with username
grep -oP 'Failed password for (?:invalid user )?(?<user>\S+) from (?<ip>\S+)' /var/log/auth.log
# Parse systemd journal for service crashes
journalctl -p err --no-pager | grep -oP '(?<=\]: ).*(?=\.)' | sort | uniq -c | sort -rn
# Extract timing data from logs (longer unit names come first so "seconds" is not cut to "s")
grep -oP 'completed in (?<time>\d+\.?\d*)\s*(?<unit>ms|seconds?|s)' app.log
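# Find log entries within time window (the range starts at the first line matching 10:00
# and ends at the first line matching 11:00; if no line matches 11:00 it runs to end of file)
awk '/2024-01-15T10:00/,/2024-01-15T11:00/' app.log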
Data Extraction Wizardry
# Extract all unique domains from text (\K drops the scheme and "www." from the output)
grep -oP '(?:https?://)?(?:www\.)?\K[\w-]+(?:\.[\w-]+)+' file | sort -u
# Parse JSON without jq (desperate times)
grep -oP '"name"\s*:\s*"\K[^"]+' data.json
# Extract markdown links: [text](url)
grep -oP '\[([^\]]+)\]\(([^)]+)\)' README.md
# Pull all environment variable references
grep -oP '\$\{?\w+\}?|\$[A-Z_][A-Z0-9_]*' script.sh | sort -u
# Extract version numbers (semver)
grep -oP '\bv?\d+\.\d+\.\d+(?:-[\w.]+)?(?:\+[\w.]+)?\b' CHANGELOG.md
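A sketch of two of the extraction patterns on inline text, assuming GNU grep with `-P`. The JSON and changelog strings are invented.

```shell
# Value of every "name" key, one per line.
json_names() {
  printf '%s\n' '{"name": "alpha", "id": 1}' '{"name":"beta"}' |
    grep -oP '"name"\s*:\s*"\K[^"]+'
}

# Semver-style versions; "1.2" is skipped because it lacks a patch number.
semver_versions() {
  printf '%s\n' 'Released v1.2.3 and 2.0.0-rc.1, not 1.2' |
    grep -oP '\bv?\d+\.\d+\.\d+(?:-[\w.]+)?(?:\+[\w.]+)?\b'
}

json_names
semver_versions
```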
One-Liner Legendry
# Validate IPv4 with proper range checking
grep -P '^(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)$' ips.txt
# Strip ANSI color codes
sed 's/\x1b\[[0-9;]*m//g' colored_output.txt
# Convert camelCase to snake_case
sed -E 's/([a-z])([A-Z])/\1_\L\2/g' file
# Remove duplicate lines preserving order (no sort)
awk '!seen[$0]++' file
# Extract code blocks from markdown
perl -0777 -ne 'print "$1\n" while /```\w*\n(.*?)```/gs' README.md
# Replace within matched context only
sed '/BEGIN/,/END/s/old/new/g' file
# Number non-empty lines sequentially (use NR in place of ++n to keep the original line numbers)
awk 'NF {print ++n": "$0}' file
# Transpose CSV columns to rows (assumes every row has the same number of fields)
awk -F',' -v OFS=',' '{for(i=1;i<=NF;i++) a[NR,i]=$i} END {for(j=1;j<=NF;j++) {for(i=1;i<=NR;i++) printf "%s%s", a[i,j], (i==NR?ORS:OFS)}}' file.csv
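A sketch that runs four of the one-liners on inline input. It assumes GNU sed, since `\x1b` and `\L` are GNU extensions. The function names and sample data are made up.

```shell
strip_ansi() {
  printf '\033[31mred\033[0m plain\n' | sed 's/\x1b\[[0-9;]*m//g'
}
to_snake() {
  printf '%s\n' 'maxRetryCount' | sed -E 's/([a-z])([A-Z])/\1_\L\2/g'
}
dedupe() {
  printf '%s\n' b a b c a | awk '!seen[$0]++'
}
transpose() {
  printf '%s\n' '1,2,3' '4,5,6' |
    awk -F',' -v OFS=',' '{for(i=1;i<=NF;i++) a[NR,i]=$i}
      END {for(j=1;j<=NF;j++) {for(i=1;i<=NR;i++) printf "%s%s", a[i,j], (i==NR?ORS:OFS)}}'
}

strip_ansi   # red plain
to_snake     # max_retry_count
dedupe       # b, a, c
transpose    # 1,4 / 2,5 / 3,6
```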
The Forbidden Techniques
# Self-matching pattern (match pattern that matches itself)
grep -P '(?R)?' file # Matches empty string infinitely... DON'T
# Regex quine (outputs itself) - for the curious only
perl -e '$_=q{$_=q{0};s/0/$_/;print};s/0/$_/;print'
# Match prime-length strings (yes, really)
perl -ne 'print if /^(?!(..+)\1+$)..+$/'
# Matches strings whose length is a prime number
Performance Tips
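| Technique | Why It Matters |
|---|---|
| Anchor patterns | `^` and `$` let the engine reject a line without trying every start position |
| Character classes over alternation | `[abc]` is tested in one step; `(a\|b\|c)` tries each branch in turn |
| Possessive quantifiers | `\d++` never gives characters back, so a failing match fails fast |
| Atomic groups | `(?>...)` discards the backtracking positions inside the group |
| Avoid `.*` | Use a negated class such as `[^"]*`, or use the lazy `.*?`, so the engine does not overshoot and backtrack |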
# The \K trick: reset match start
echo "error: something bad" | grep -oP 'error: \K.*'
# Output: something bad (without "error: ")
# Match between delimiters efficiently
grep -oP '<<\K[^>]+(?=>>)' file # Extract <<content>>
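A sketch comparing `\K` with the equivalent lookbehind on the same input. The two print the same text here. One practical difference is that `\K` may follow a variable-length prefix, while a PCRE lookbehind must have a fixed length. Function names are hypothetical.

```shell
with_k()          { printf '%s\n' 'error: something bad' | grep -oP 'error: \K.*'; }
with_lookbehind() { printf '%s\n' 'error: something bad' | grep -oP '(?<=error: ).*'; }

# Variable-length prefix: fine with \K, not expressible as a single lookbehind.
after_any_level() { printf '%s\n' 'warning:   low disk' | grep -oP '\w+:\s+\K.*'; }

# Negated class between delimiters instead of .*
between_markers() { printf '%s\n' 'x <<alpha>> y' | grep -oP '<<\K[^>]+(?=>>)'; }

with_k
with_lookbehind
after_any_level
between_markers
```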
See Also
- Advanced Search - Full search mastery
- Perl Rename - Batch file renaming
- sed Deep Dive - Stream editing
- Defensive Patterns - Production shell scripting