Regex Reference - Pattern Matching Mastery

Master regex across all Linux tools.

Quick Reference

Basic Regex (BRE) - grep, sed

Pattern  Meaning                    Example
.        Any single character       a.c matches abc, aXc
*        Zero or more of preceding  ab*c matches ac, abc, abbc
^        Start of line              ^Error matches lines starting with Error
$        End of line                \.log$ matches lines ending in .log
[]       Character class            [aeiou] matches any vowel
[^]      Negated class              [^0-9] matches non-digits
\        Escape special char        \. matches literal dot

Extended Regex (ERE) - grep -E, egrep

Pattern  Meaning                                  Example
+        One or more                              ab+c matches abc, abbc (not ac)
?        Zero or one                              colou?r matches color, colour
{n}      Exactly n times                          a{3} matches aaa
{n,m}    Between n and m times (BRE: \{n,m\})     a{2,4} matches aa, aaa, aaaa
|        Alternation (or)                         cat|dog matches cat or dog
()       Grouping                                 (ab)+ matches ab, abab
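
A minimal sketch of the same interval written in both dialects, assuming GNU grep; the sample words are made up for illustration. BRE wants the braces escaped, ERE takes them bare, and both should select the same lines.

```shell
# Hypothetical sample input: one word per line.
input=$(printf '%s\n' ac abc abbc abbbbc)

# BRE (plain grep): interval braces are escaped.
bre=$(printf '%s\n' "$input" | grep 'ab\{2,4\}c')

# ERE (grep -E): the same interval, unescaped.
ere=$(printf '%s\n' "$input" | grep -E 'ab{2,4}c')

# Both variables hold the lines abbc and abbbbc.
printf 'BRE:\n%s\nERE:\n%s\n' "$bre" "$ere"
```

Forgetting which dialect is active is the usual cause of an interval that silently matches nothing.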

Perl-Compatible Regex (PCRE) - grep -P, perl

Pattern  Meaning                 Example
\d       Digit [0-9]             \d{3} matches 123
\D       Non-digit               \D+ matches abc
\w       Word char [a-zA-Z0-9_]  \w+ matches hello_123
\W       Non-word char           \W matches @, #, space
\s       Whitespace              \s+ matches spaces/tabs
\S       Non-whitespace          \S+ matches words
\b       Word boundary           \bword\b matches whole word
(?:)     Non-capturing group     (?:ab)+ groups without capture
(?=)     Positive lookahead      foo(?=bar) matches foo before bar
(?!)     Negative lookahead      foo(?!bar) matches foo not before bar
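
A small sketch of the two lookaheads from the table, assuming a grep built with PCRE support (-P); the sample line is invented. With -o only the consumed text is printed, which shows that a lookahead checks what follows without including it in the match.

```shell
line='foobar foobaz'

# Positive lookahead: "foo" only where "bar" follows; "bar" is not consumed.
pos=$(printf '%s\n' "$line" | grep -oP 'foo(?=bar)')

# Negative lookahead: "foo" only where "bar" does NOT follow, then the rest of the word.
neg=$(printf '%s\n' "$line" | grep -oP 'foo(?!bar)\w+')

printf 'positive: %s\nnegative: %s\n' "$pos" "$neg"
# positive: foo
# negative: foobaz
```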


Shell Globbing

Basic Globs

Pattern  Meaning           Example
*        Any characters    *.txt matches all .txt files
?        Single character  file?.txt matches file1.txt
[]       Character class   file[123].txt matches file1.txt
{}       Brace expansion   file.{txt,md} expands to file.txt file.md

Extended Globs (shopt -s extglob)

shopt -s extglob
Pattern     Meaning       Example
?(pattern)  Zero or one   file?(s).txt
*(pattern)  Zero or more  *([0-9])
+(pattern)  One or more   +([a-z])
@(pattern)  Exactly one   @(foo|bar)
!(pattern)  Not matching  !(*.txt) all except .txt


Common Patterns

IP Addresses

# Basic IP match
grep -E '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' file
# Word-bounded IPv4 shape (1-3 digits per octet; values above 255 still match)
grep -P '\b(?:\d{1,3}\.){3}\d{1,3}\b' file
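
To illustrate the gap between matching the shape of an address and validating it, here is a sketch using only grep -E. The variable names (octet, strict) and the candidate addresses are hypothetical; the octet alternation is one common way to restrict values to 0-255.

```shell
# One octet, 0-255, spelled out as an ERE alternation.
octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
strict="^${octet}(\.${octet}){3}\$"

# Hypothetical candidates: two valid addresses, two with out-of-range octets.
candidates=$(printf '%s\n' 192.168.1.1 999.999.999.999 10.0.0.256 8.8.8.8)

# Shape-only check: counts every line made of four dot-separated digit groups.
loose_hits=$(printf '%s\n' "$candidates" | grep -Ec '^([0-9]{1,3}\.){3}[0-9]{1,3}$')

# Range-checked: keeps only addresses whose octets are all <= 255.
strict_hits=$(printf '%s\n' "$candidates" | grep -E "$strict")

printf 'loose count: %s\nstrict matches:\n%s\n' "$loose_hits" "$strict_hits"
# loose count: 4
# strict matches:
# 192.168.1.1
# 8.8.8.8
```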

Email Addresses

grep -E '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file

MAC Addresses

grep -iE '([0-9a-f]{2}:){5}[0-9a-f]{2}' file

URLs

# Inside a POSIX bracket expression \s is not a whitespace class; use [:space:]
grep -E 'https?://[^[:space:]]+' file
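
A short sketch of why the bracket spelling matters, assuming GNU grep; the sample sentence is invented. In an ERE bracket expression, [^\s] excludes a backslash and the letter s, not whitespace, so on GNU grep it tends to run past the space after the URL and stop at the next s.

```shell
text='see https://example.org/a/b for details'

# Portable form: stop at the first whitespace character.
posix=$(printf '%s\n' "$text" | grep -oE 'https?://[^[:space:]]+')

# Naive form: [^\s] here means "not a backslash and not the letter s".
naive=$(printf '%s\n' "$text" | grep -oE 'https?://[^\s]+' || true)

printf 'posix: %s\nnaive: %s\n' "$posix" "$naive"
# posix: https://example.org/a/b
```

With grep -P, \s inside a class does mean whitespace, so 'https?://[^\s]+' is fine there.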

Phone Numbers

# US format
grep -E '\(?[0-9]{3}\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}' file

Dates

# YYYY-MM-DD
grep -E '[0-9]{4}-[0-9]{2}-[0-9]{2}' file
# MM/DD/YYYY
grep -E '[0-9]{2}/[0-9]{2}/[0-9]{4}' file

Log Timestamps

# syslog format
grep -E '^[A-Z][a-z]{2} [0-9 ][0-9] [0-9]{2}:[0-9]{2}:[0-9]{2}' /var/log/syslog
# ISO 8601
grep -E '[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}' file

Tool-Specific Usage

grep

# Basic regex
grep 'pattern' file
# Extended regex
grep -E 'pattern|other' file
# Perl regex
grep -P '\d+\.\d+\.\d+\.\d+' file
# Case insensitive
grep -i 'error' file
# Invert match
grep -v 'exclude' file
# Only matching part
grep -o 'pattern' file

sed

# Basic substitution
sed 's/old/new/' file
# Global substitution
sed 's/old/new/g' file
# Extended regex
sed -E 's/(group)/\1_suffix/g' file
# In-place edit
sed -i 's/old/new/g' file
# Delete lines matching pattern
sed '/pattern/d' file

awk

# Match pattern
awk '/pattern/ {print}' file
# Field matching
awk '$1 ~ /pattern/' file
# Negated match
awk '$1 !~ /pattern/' file
# Regex in condition
awk '/^Error/ {print $0}' file

perl

# One-liner substitution
perl -pe 's/old/new/g' file
# In-place edit
perl -i -pe 's/old/new/g' file
# Complex pattern
perl -ne 'print if /\b\d{3}-\d{4}\b/' file

Perl Rename (prename)

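# Install (the command is typically installed as perl-rename on Arch and as
# rename or file-rename on current Debian; substitute that name for prename below)
sudo pacman -S perl-rename    # Arch
sudo apt install rename       # Debian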
# Lowercase all filenames
prename 'y/A-Z/a-z/' *
# Replace spaces with underscores
prename 's/ /_/g' *
# Add prefix
prename 's/^/prefix_/' *.txt
# Remove extension
prename 's/\.bak$//' *.bak
# Sequential numbering (our $n declares the counter in case the tool compiles the expression under strict)
prename 'our $n; s/^/sprintf("%03d_", ++$n)/e' *.jpg
# Dry run (show changes without doing them)
prename -n 's/old/new/' *

Escape Characters

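Character  Escape  When
.          \.      Match literal dot
*          \*      Match literal asterisk
?          \?      Match literal question mark (ERE/PCRE; in BRE a bare ? is already literal)
[          \[      Match literal bracket
(          \(      Match literal paren (ERE/PCRE; in BRE a bare ( is literal and \( opens a group)
{          \{      Match literal brace (ERE/PCRE; in BRE a bare { is literal and \{ opens an interval)
|          \|      Match literal pipe (ERE/PCRE; in BRE a bare | is already literal)
$          \$      Match literal dollar
^          \^      Match literal caret (mid-pattern)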
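
The paren rows are the ones most often mixed up, so here is a minimal sketch of the difference, assuming GNU grep; the sample string is made up. The same pattern text means different things in BRE and ERE.

```shell
s='call f(x) now'

# BRE: bare parentheses are literal characters.
bre_literal=$(printf '%s\n' "$s" | grep -o 'f(x)')

# ERE: parentheses must be escaped to be literal.
ere_literal=$(printf '%s\n' "$s" | grep -oE 'f\(x\)')

# ERE: unescaped parentheses form a group, so this looks for "fx" and finds nothing.
ere_group=$(printf '%s\n' "$s" | grep -oE 'f(x)' || true)

printf 'BRE literal: %s\nERE literal: %s\nERE group: [%s]\n' \
  "$bre_literal" "$ere_literal" "$ere_group"
# BRE literal: f(x)
# ERE literal: f(x)
# ERE group: []
```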


Yoda Tier - Advanced Pattern Mastery

"Do or do not. There is no `.*`"

Lookahead & Lookbehind (Zero-Width Assertions)

# Password must have uppercase, lowercase, digit, special char
grep -P '^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*]).{12,}$' passwords.txt
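# Extract *_id names that do NOT start with "test_" or "debug_"
# (a lookbehind placed before \b\w+ only inspects what precedes the whole word, so use a lookahead at the word start)
grep -oP '\b(?!test_|debug_)\w+_id\b' config.yaml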
# Match "password" only if followed by "=" but don't capture "="
grep -oP 'password(?==)' file
# Find def lines with no "try:" later on the same line (grep is line-based, so this cannot see the function body)
grep -P 'def \w+\([^)]*\):(?!.*try:)' *.py
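
A sketch comparing two ways of excluding names by prefix, assuming grep with PCRE support; the identifier list is hypothetical. A negative lookbehind placed before \b\w+ tests the text before the word, not the word's own prefix, while a negative lookahead at the word start tests the prefix itself.

```shell
names=$(printf '%s\n' user_id test_user_id debug_session_id order_id)

# Lookbehind before the word: nothing precedes each name, so every line still matches.
lookbehind_count=$(printf '%s\n' "$names" | grep -cP '(?<!test_)(?<!debug_)\b\w+_id\b')

# Lookahead at the word start: names beginning with test_ or debug_ are rejected.
kept=$(printf '%s\n' "$names" | grep -oP '\b(?!test_|debug_)\w+_id\b')

printf 'lookbehind matched %s lines\nlookahead kept:\n%s\n' "$lookbehind_count" "$kept"
# lookbehind matched 4 lines
# lookahead kept:
# user_id
# order_id
```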

Named Capture Groups

# Parse Apache logs with named groups
perl -ne 'print "$+{ip} $+{status}\n" if /(?<ip>\d+\.\d+\.\d+\.\d+).*?"(?<method>\w+) (?<path>[^"]+)" (?<status>\d+)/' access.log
# Extract structured data from messy logs
grep -oP '(?<timestamp>\d{4}-\d{2}-\d{2}T[\d:]+).*?(?<level>ERROR|WARN|INFO).*?(?<msg>[^|]+)' app.log
# Reuse named groups with backreference (\x27 is a single quote, which cannot appear inside a single-quoted shell string)
grep -P '(?<quote>["\x27]).*?\k<quote>' file    # Match quoted strings

Recursive Patterns (Balanced Matching)

# Match balanced parentheses (PCRE magic)
grep -oP '\((?:[^()]+|(?R))*\)' code.c
# Match balanced braces in JSON/code
grep -oP '\{(?:[^{}]+|(?R))*\}' file.json
# Extract nested function calls ((?1) recurses into the parenthesized group only, so bare nested parens also balance)
perl -ne 'print "$&\n" while /\w+(\((?:[^()]++|(?1))*\))/g' code.py

Atomic Groups & Possessive Quantifiers

# Atomic group: once \d+ has matched, the engine never backtracks into it
grep -P '(?>\d+)\.' file    # Fails fast when the digits are not followed by a dot
# Possessive quantifier (never backtrack)
grep -P '\d++\.\d++' file   # Match decimals efficiently
# Parse huge logs without regex DoS
grep -P '^\S++\s++\S++\s++\[.*?\]' huge.log

Conditional Patterns

# Match different formats based on condition
# If group 1 matched "(", expect ")"; otherwise expect "-"
grep -P '(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}' phones.txt
# Matches: (555)123-4567 or 555-123-4567
# Named-group condition syntax (<proto> is mandatory here, so the first branch is always the one taken)
grep -P '(?<proto>https?)://(?(<proto>)[\w.-]+(?::\d+)?|[\w.-]+)' urls.txt
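
A quick check of the phone-number conditional, assuming grep with PCRE support; the three sample numbers are invented. The pattern is anchored here so that a malformed entry cannot match on a substring.

```shell
numbers=$(printf '%s\n' '(555)123-4567' '555-123-4567' '(555-123-4567')

# (?(1)\)|-) means: if group 1 captured "(", require ")"; otherwise require "-".
valid=$(printf '%s\n' "$numbers" | grep -P '^(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}$')

printf '%s\n' "$valid"
# (555)123-4567
# 555-123-4567
```

Without the ^ and $ anchors, the third line would still match from its first digit onward, because the optional opening paren can simply be skipped.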

Multi-line Sorcery

# Match across lines (slurp mode)
perl -0777 -ne 'print "$&\n" while /function\s+\w+\s*\{[^}]*\}/gs' code.js
# Extract multi-line SQL queries
perl -0777 -pe 's/--.*$//gm; s/\n/ /g' queries.sql | grep -oP 'SELECT.*?;'
# Find multi-line function definitions
grep -Pzo '(?s)def \w+\([^)]*\):.*?(?=\ndef |\Z)' script.py
# Match heredocs ((?m) makes ^ and $ match at line breaks inside the NUL-delimited input)
grep -Pzo "(?sm)<<'?(\w+)'?.*?^\1$" script.sh

Security & Forensics Patterns

# Detect potential SQL injection attempts
grep -iP "(union\s+(all\s+)?select|or\s+1\s*=\s*1|'\s*or\s*'|;\s*drop\s+table)" access.log
# Find hardcoded secrets (API keys, tokens)
# The --include glob is left unquoted so bash brace expansion yields one --include per extension
grep -rP '(?i)(api[_-]?key|secret|token|password)\s*[:=]\s*["\047]?[a-zA-Z0-9_\-]{20,}' --include=*.{py,js,yaml,json,env} .
# Detect base64-encoded payloads
grep -oP '[A-Za-z0-9+/]{40,}={0,2}' file | while read -r b; do printf '%s\n' "$b" | base64 -d 2>/dev/null; done
# Find private keys in files (-e keeps the leading dashes from being parsed as options)
grep -rP -e '-----BEGIN (RSA |DSA |EC |OPENSSH )?PRIVATE KEY-----' .
# Detect command injection patterns
grep -P '\$\(.*\)|\`.*\`|;\s*(cat|curl|wget|nc|bash|sh|python|perl|ruby)' input.log
# Find JWT tokens
grep -oP 'eyJ[A-Za-z0-9_-]*\.eyJ[A-Za-z0-9_-]*\.[A-Za-z0-9_-]*' file

Log Parsing Dark Arts

# Parse nginx logs into CSV
perl -pe 's/^(\S+) - - \[(.*?)\] "(.*?)" (\d+) (\d+) "(.*?)" "(.*?)"$/$1,$2,$3,$4,$5/' access.log
# Extract failed SSH attempts with username
grep -oP 'Failed password for (?:invalid user )?(?<user>\S+) from (?<ip>\S+)' /var/log/auth.log
# Parse systemd journal for service crashes
journalctl -p err --no-pager | grep -oP '(?<=\]: ).*(?=\.)' | sort | uniq -c | sort -rn
# Extract timing data from logs
grep -oP 'completed in (?<time>\d+\.?\d*)\s*(?<unit>ms|s|seconds?)' app.log
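# Find log entries within time window (range form: needs a line matching each endpoint; if no line contains T11:00 the range runs to end of file)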
awk '/2024-01-15T10:00/,/2024-01-15T11:00/' app.log
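
Because ISO 8601 timestamps sort lexicographically, a window can also be selected by string comparison. The sketch below assumes the timestamp is the first whitespace-separated field; the log lines are made up, and LC_ALL=C is set so the comparison is byte-wise.

```shell
# Hypothetical log: no entry falls exactly on 10:00.
log=$(printf '%s\n' \
  '2024-01-15T09:59:58 boot' \
  '2024-01-15T10:02:11 start job' \
  '2024-01-15T10:47:03 job done' \
  '2024-01-15T11:00:00 next hour')

# Range pattern: never starts, because no line contains "2024-01-15T10:00".
range_hits=$(printf '%s\n' "$log" | awk '/2024-01-15T10:00/,/2024-01-15T11:00/')

# String comparison on field 1: start inclusive, end exclusive.
window=$(printf '%s\n' "$log" | LC_ALL=C awk '$1 >= "2024-01-15T10:00" && $1 < "2024-01-15T11:00"')

printf 'range form: [%s]\ncomparison form:\n%s\n' "$range_hits" "$window"
# range form: []
# comparison form:
# 2024-01-15T10:02:11 start job
# 2024-01-15T10:47:03 job done
```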

Data Extraction Wizardry

# Extract all unique domains from text (\K drops the optional scheme and www. prefix from the output)
grep -oP '(?:https?://)?(?:www\.)?\K[\w-]+(?:\.[\w-]+)+' file | sort -u
# Parse JSON without jq (desperate times)
grep -oP '"name"\s*:\s*"\K[^"]+' data.json
# Extract markdown links: [text](url)
grep -oP '\[([^\]]+)\]\(([^)]+)\)' README.md
# Pull all environment variable references
grep -oP '\$\{?\w+\}?|\$[A-Z_][A-Z0-9_]*' script.sh | sort -u
# Extract version numbers (semver)
grep -oP '\bv?\d+\.\d+\.\d+(?:-[\w.]+)?(?:\+[\w.]+)?\b' CHANGELOG.md

One-Liner Legendry

# Validate IPv4 with proper range checking
grep -P '^(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)$' ips.txt
# Strip ANSI color codes
sed 's/\x1b\[[0-9;]*m//g' colored_output.txt
# Convert camelCase to snake_case
sed -E 's/([a-z])([A-Z])/\1_\L\2/g' file
# Remove duplicate lines preserving order (no sort)
awk '!seen[$0]++' file
# Extract code blocks from markdown
perl -0777 -ne 'print "$1\n" while /```\w*\n(.*?)```/gs' README.md
# Replace within matched context only
sed '/BEGIN/,/END/s/old/new/g' file
# Print non-empty lines with their original line numbers (use ++n in place of NR to number them consecutively)
awk 'NF {print NR": "$0}' file
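# Transpose CSV columns to rows (OFS is set to a comma so the output stays CSV; assumes every row has the same number of fields)
awk -F',' -v OFS=',' '{for(i=1;i<=NF;i++) a[NR,i]=$i} END {for(j=1;j<=NF;j++) {for(i=1;i<=NR;i++) printf "%s%s", a[i,j], (i==NR?ORS:OFS)}}' file.csv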

The Forbidden Techniques

# Self-matching pattern (match pattern that matches itself)
grep -P '(?R)?' file    # Matches empty string infinitely... DON'T
# Regex quine (outputs itself) - for the curious only
perl -e '$_=q{$_=q{0};s/0/$_/;print};s/0/$_/;print'
# Match prime-length strings (yes, really)
perl -ne 'print if /^(?!(..+)\1+$)..+$/'
# Matches strings whose length is a prime number

Performance Tips

Technique                           Why It Matters
Anchor patterns                     ^ERROR is tried at one position per line; a leading .* (as in .*ERROR) adds scanning and backtracking
Character classes over alternation  [aeiou] beats a|e|i|o|u
Possessive quantifiers              \d++ never gives digits back, which avoids catastrophic backtracking
Atomic groups                       (?>pattern) is never backtracked into once it has matched
Avoid .{0,100}                      Use .{1,100}? (lazy) or specific chars
Use \K for variable lookbehind      foo.*\Kbar works where (?<=foo.*)bar is rejected as a variable-length lookbehind

# The \K trick: reset match start
echo "error: something bad" | grep -oP 'error: \K.*'
# Output: something bad (without "error: ")
# Match between delimiters efficiently
grep -oP '<<\K[^>]+(?=>>)' file    # Extract <<content>>

See Also