Regex Reference - Pattern Matching Mastery

Master regex across all Linux tools.

Quick Reference

Basic Regex (BRE) - grep, sed

Pattern	Meaning	Example
`.`	Any single character	`a.c` matches `abc`, `aXc`
`*`	Zero or more of preceding	`ab*c` matches `ac`, `abc`, `abbc`
`^`	Start of line	`^Error` matches lines starting with Error
`$`	End of line	`\.log$` matches lines ending in .log
`[]`	Character class	`[aeiou]` matches any vowel
`[^]`	Negated class	`[^0-9]` matches non-digits
`\`	Escape special char	`\.` matches literal dot
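
The rows above can be checked by piping a few sample words through `grep`. This is a small sketch: the word list and the variable names are made up for illustration, and it assumes a POSIX-style `grep`.

```shell
# Hypothetical sample input, one word per line
sample=$(printf '%s\n' 'ac' 'abc' 'abbc' 'aXc' 'a.c')

# '.' matches any single character: abc, aXc and the literal a.c
dot_any=$(printf '%s\n' "$sample" | grep -c '^a.c$')

# '\.' matches only a literal dot: a.c
dot_literal=$(printf '%s\n' "$sample" | grep -c '^a\.c$')

# 'b*' allows zero or more b: ac, abc, abbc
star=$(printf '%s\n' "$sample" | grep -c '^ab*c$')

# '[^0-9]' rejects digits, so every sample word made only of non-digits counts
no_digits=$(printf '%s\n' "$sample" | grep -c '^[^0-9]*$')

echo "$dot_any $dot_literal $star $no_digits"
```

`grep -c` prints the number of matching lines, which makes it easy to compare an unescaped pattern with its escaped form.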

Extended Regex (ERE) - grep -E, egrep

Pattern	Meaning	Example
`+`	One or more	`ab+c` matches `abc`, `abbc` (not `ac`)
`?`	Zero or one	`colou?r` matches `color`, `colour`
`{n}`	Exactly n times	`a{3}` matches `aaa`
`{n,m}`	Between n and m times	`a{2,4}` matches `aa`, `aaa`, `aaaa`
`|`	Alternation (or)	`cat|dog` matches `cat` or `dog`
`()`	Grouping	`(ab)+` matches `ab`, `abab`
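
A short sketch of the ERE operators in action. The word list is invented for the example, and the last line relies on GNU grep accepting `\|` and `\(...\)` in basic mode, which is an extension rather than POSIX behavior.

```shell
# Hypothetical sample input, one word per line
words=$(printf '%s\n' 'ac' 'abc' 'abbc' 'color' 'colour' 'cat' 'dog' 'abab' 'aa' 'aaaaa')

plus=$(printf '%s\n' "$words" | grep -cE '^ab+c$')          # abc, abbc
optional=$(printf '%s\n' "$words" | grep -cE '^colou?r$')   # color, colour
interval=$(printf '%s\n' "$words" | grep -cE '^a{2,4}$')    # aa (aaaaa is too long)
either=$(printf '%s\n' "$words" | grep -cE '^(cat|dog)$')   # cat, dog
group=$(printf '%s\n' "$words" | grep -cE '^(ab)+$')        # abab

# The same alternation in BRE needs backslashes (GNU extension)
bre_either=$(printf '%s\n' "$words" | grep -c '^\(cat\|dog\)$')

echo "$plus $optional $interval $either $group $bre_either"
```

The backslash rule flips between the two dialects: in ERE the bare character is the operator, while in GNU BRE the backslashed form is.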

Perl-Compatible Regex (PCRE) - grep -P, perl

Pattern	Meaning	Example
`\d`	Digit [0-9]	`\d{3}` matches `123`
`\D`	Non-digit	`\D+` matches `abc`
`\w`	Word char [a-zA-Z0-9_]	`\w+` matches `hello_123`
`\W`	Non-word char	`\W` matches `@`, `#`, space
`\s`	Whitespace	`\s+` matches spaces/tabs
`\S`	Non-whitespace	`\S+` matches words
`\b`	Word boundary	`\bword\b` matches whole word
`(?:)`	Non-capturing group	`(?:ab)+` groups without capture
`(?=)`	Positive lookahead	`foo(?=bar)` matches `foo` before `bar`
`(?!)`	Negative lookahead	`foo(?!bar)` matches `foo` not before `bar`
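
The lookahead and boundary rows are easier to follow on a concrete line. The sketch below assumes GNU grep built with PCRE support (`-P`); the sample text is made up, and `\w*` is appended only so the output shows which word was selected.

```shell
# Hypothetical sample line
line='foobar foobaz order 12345 word sword'

# Positive lookahead: only the "foo" that is followed by "bar" qualifies
ahead=$(printf '%s\n' "$line" | grep -oP 'foo(?=bar)\w*')

# Negative lookahead: only the "foo" that is NOT followed by "bar" qualifies
not_ahead=$(printf '%s\n' "$line" | grep -oP 'foo(?!bar)\w*')

# \d{3} takes the first three digits of 12345; the remaining "45" is too short
digits=$(printf '%s\n' "$line" | grep -oP '\d{3}')

# \b keeps "word" but skips the "word" inside "sword"
whole=$(printf '%s\n' "$line" | grep -oP '\bword\b' | wc -l | tr -d ' ')

echo "$ahead $not_ahead $digits $whole"
```

Because lookaheads are zero-width, the text they inspect is not consumed, which is why `\w*` can still pick up `bar` or `baz` afterwards.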

Shell Globbing

Basic Globs

Pattern	Meaning	Example
`*`	Any characters	`*.txt` matches all .txt files
`?`	Single character	`file?.txt` matches `file1.txt`
`[]`	Character class	`file[123].txt` matches `file1.txt`
`{}`	Brace expansion	`file.{txt,md}` expands to `file.txt file.md`
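
A minimal sketch of these globs, assuming bash and a scratch directory created with `mktemp -d`. The file names are made up for the demonstration.

```shell
# Throwaway directory so the globs only see the sample files
dir=$(mktemp -d)
cd "$dir" || exit 1
touch file1.txt file2.txt fileAB.txt notes.md

star=$(echo *.txt)             # every name ending in .txt
single=$(echo file?.txt)       # exactly one character between "file" and ".txt"
class=$(echo file[13].txt)     # that one character must be 1 or 3
braces=$(echo draft.{txt,md})  # brace expansion is textual: no files need to exist

printf '%s\n' "$star" "$single" "$class" "$braces"
```

Note the difference in the last line: globs only expand to names that exist, while brace expansion always produces every combination.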

Extended Globs (shopt -s extglob)

shopt -s extglob

Pattern	Meaning	Example
`?(pattern)`	Zero or one	`file?(s).txt`
`*(pattern)`	Zero or more	`*([0-9])`
`+(pattern)`	One or more	`+([a-z])`
`@(pattern)`	Exactly one of the alternatives	`@(foo|bar)`
`!(pattern)`	Not matching	`!(*.txt)` all except .txt
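
The sketch below tries each extended glob against a list of names. It assumes bash, and it uses string matching inside `[[ ... ]]` rather than filename expansion so that it runs without creating files; `matches` is a hypothetical helper and the names are invented.

```shell
shopt -s extglob   # enable extended patterns before they are used

# Hypothetical helper: succeeds when name $1 matches glob pattern $2
matches() { [[ $1 == $2 ]]; }

names='file.txt files.txt filess.txt 123 foo bar'

for pat in 'file?(s).txt' '+([0-9])' '@(foo|bar)' '!(*.txt)'; do
  hits=''
  for n in $names; do
    matches "$n" "$pat" && hits="$hits $n"
  done
  echo "$pat ->$hits"
done
```

The same patterns work unquoted on the command line for filename expansion, for example `ls !(*.txt)`, once `extglob` is on.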

Common Patterns

IP Addresses

# Basic IP match
grep -E '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' file

# Tighter IPv4 match with word boundaries (still accepts out-of-range octets such as 999)
grep -P '\b(?:\d{1,3}\.){3}\d{1,3}\b' file

Email Addresses

grep -E '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file

MAC Addresses

grep -iE '([0-9a-f]{2}:){5}[0-9a-f]{2}' file

URLs

grep -E 'https?://[^[:space:]]+' file

Phone Numbers

# US format
grep -E '\(?[0-9]{3}\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}' file

Dates

# YYYY-MM-DD
grep -E '[0-9]{4}-[0-9]{2}-[0-9]{2}' file

# MM/DD/YYYY
grep -E '[0-9]{2}/[0-9]{2}/[0-9]{4}' file

Log Timestamps

# syslog format
grep -E '^[A-Z][a-z]{2} [0-9 ][0-9] [0-9]{2}:[0-9]{2}:[0-9]{2}' /var/log/syslog

# ISO 8601
grep -E '[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}' file

Tool-Specific Usage

grep

# Basic regex
grep 'pattern' file

# Extended regex
grep -E 'pattern|other' file

# Perl regex
grep -P '\d+\.\d+\.\d+\.\d+' file

# Case insensitive
grep -i 'error' file

# Invert match
grep -v 'exclude' file

# Only matching part
grep -o 'pattern' file

sed

# Basic substitution
sed 's/old/new/' file

# Global substitution
sed 's/old/new/g' file

# Extended regex
sed -E 's/(group)/\1_suffix/g' file

# In-place edit
sed -i 's/old/new/g' file

# Delete lines matching pattern
sed '/pattern/d' file

awk

# Match pattern
awk '/pattern/ {print}' file

# Field matching
awk '$1 ~ /pattern/' file

# Negated match
awk '$1 !~ /pattern/' file

# Regex in condition
awk '/^Error/ {print $0}' file

perl

# One-liner substitution
perl -pe 's/old/new/g' file

# In-place edit
perl -i -pe 's/old/new/g' file

# Complex pattern
perl -ne 'print if /\b\d{3}-\d{4}\b/' file

Perl Rename (prename)

# Install
sudo pacman -S perl-rename    # Arch
sudo apt install rename       # Debian

# Lowercase all filenames
prename 'y/A-Z/a-z/' *

# Replace spaces with underscores
prename 's/ /_/g' *

# Add prefix
prename 's/^/prefix_/' *.txt

# Remove extension
prename 's/\.bak$//' *.bak

# Sequential numbering
prename 's/^/sprintf("%03d_", ++$n)/e' *.jpg

# Dry run (show changes without doing them)
prename -n 's/old/new/' *

Escape Characters

Character	Escape	When
`.`	`\.`	Match literal dot
`*`	`\*`	Match literal asterisk
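`?`	`\?`	Match literal question mark (ERE/PCRE; in BRE a bare `?` is already literal)
`[`	`\[`	Match literal bracket
`(`	`\(`	Match literal paren (ERE/PCRE; in BRE a bare `(` is literal and `\(` opens a group)
`{`	`\{`	Match literal brace (ERE/PCRE; in BRE a bare `{` is literal and `\{` opens an interval)
`|`	`\|`	Match literal pipe (ERE/PCRE; in BRE a bare `|` is already literal)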
`$`	`\$`	Match literal dollar
`^`	`\^`	Match literal caret (mid-pattern)
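
Which characters need a backslash depends on the dialect, so it helps to test against a line full of metacharacters. A small sketch with a made-up sample string; `grep -F` is shown as an alternative that sidesteps escaping entirely.

```shell
# Hypothetical sample line containing several metacharacters
text='price (USD) is $5.00 | a+b'

# ERE: parentheses are operators, so literal ones need a backslash
ere_paren=$(printf '%s\n' "$text" | grep -oE '\(USD\)')

# BRE: a bare parenthesis is already literal
bre_paren=$(printf '%s\n' "$text" | grep -o '(USD)')

# Literal dollar sign and literal dot
amount=$(printf '%s\n' "$text" | grep -oE '\$[0-9]+\.[0-9]+')

# Literal plus sign in ERE
plus=$(printf '%s\n' "$text" | grep -oE 'a\+b')

# -F treats the whole pattern as a fixed string, so nothing needs escaping
fixed=$(printf '%s\n' "$text" | grep -oF '$5.00 |')

printf '%s\n' "$ere_paren" "$bre_paren" "$amount" "$plus" "$fixed"
```

When the search text is purely literal, `grep -F` is usually the simplest and safest choice.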

Yoda Tier - Advanced Pattern Mastery

"Do or do not. There is no `.*`"

Lookahead & Lookbehind (Zero-Width Assertions)

# Password must be at least 12 characters with an uppercase letter, a lowercase letter, a digit, and a special character
grep -P '^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*]).{12,}$' passwords.txt

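# Extract *_id names that do NOT start with "test_" or "debug_"
# (a lookbehind placed before \b never sees the prefix, so a lookahead after \b is used instead)
grep -oP '\b(?!test_|debug_)\w+_id\b' config.yaml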

# Match "password" only if followed by "=" but don't capture "="
grep -oP 'password(?==)' file

# Find one-line function definitions with no "try:" on the same line
# (grep matches line by line, so this cannot inspect the function body)
grep -P 'def \w+\([^)]*\):(?!.*try:)' *.py

Named Capture Groups

# Parse Apache logs with named groups
perl -ne 'print "$+{ip} $+{status}\n" if /(?<ip>\d+\.\d+\.\d+\.\d+).*?"(?<method>\w+) (?<path>[^"]+)" (?<status>\d+)/' access.log

# Extract structured data from messy logs
grep -oP '(?<timestamp>\d{4}-\d{2}-\d{2}T[\d:]+).*?(?<level>ERROR|WARN|INFO).*?(?<msg>[^|]+)' app.log

# Reuse named groups with backreference
grep -P '(?<quote>["\047]).*?\k<quote>' file    # Match quoted strings (\047 is the octal escape for a single quote)

Recursive Patterns (Balanced Matching)

# Match balanced parentheses (PCRE magic)
grep -oP '\((?:[^()]+|(?R))*\)' code.c

# Match balanced braces in JSON/code
grep -oP '\{(?:[^{}]+|(?R))*\}' file.json

# Extract nested function calls
perl -ne 'print "$&\n" while /\w+\((?:[^()]+|(?R))*\)/g' code.py

Atomic Groups & Possessive Quantifiers

# Prevent catastrophic backtracking (atomic group)
grep -P '(?>\d+)\.' file    # Faster than \d+\.

# Possessive quantifier (never backtrack)
grep -P '\d++\.\d++' file   # Match decimals efficiently

# Parse huge logs without regex DoS
grep -P '^\S++\s++\S++\s++\[.*?\]' huge.log

Conditional Patterns

# Match different formats based on condition
# If group 1 matched "(", expect ")", else expect nothing
grep -P '(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}' phones.txt
# Matches: (555)123-4567 or 555-123-4567

# Match protocol-specific patterns
grep -P '(?<proto>https?)://(?(<proto>)[\w.-]+(?::\d+)?|[\w.-]+)' urls.txt
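
The phone-number conditional is worth testing with mismatched parentheses. In this sketch the pattern is anchored with `^` and `$`, because the unanchored form can still match the valid tail of a malformed number; the sample numbers are made up, and GNU grep with `-P` is assumed.

```shell
# Hypothetical inputs: two well-formed numbers and two with a stray parenthesis
numbers=$(printf '%s\n' '(555)123-4567' '555-123-4567' '(555-123-4567' '555)123-4567')

# If group 1 captured "(", the conditional requires ")"; otherwise it requires "-"
valid=$(printf '%s\n' "$numbers" | grep -P '^(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}$')

# Count how many lines the unanchored pattern lets through for comparison
loose=$(printf '%s\n' "$numbers" | grep -cP '(\()?\d{3}(?(1)\)|-)\d{3}-\d{4}')

printf '%s\n' "$valid"
echo "unanchored matches: $loose"
```

The unanchored count is higher because `(555-123-4567` still contains the valid substring `555-123-4567`.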

Multi-line Sorcery

# Match across lines (slurp mode)
perl -0777 -ne 'print "$&\n" while /function\s+\w+\s*\{[^}]*\}/gs' code.js

# Extract multi-line SQL queries
perl -0777 -pe 's/--.*$//gm; s/\n/ /g' queries.sql | grep -oP 'SELECT.*?;'

# Find multi-line function definitions
grep -Pzo '(?s)def \w+\([^)]*\):.*?(?=\ndef |\Z)' script.py

# Match heredocs
grep -Pzo "(?s)<<'?(\w+)'?.*?^\1$" script.sh

Security & Forensics Patterns

# Detect potential SQL injection attempts
grep -iP "(union\s+(all\s+)?select|or\s+1\s*=\s*1|'\s*or\s*'|;\s*drop\s+table)" access.log

# Find hardcoded secrets (API keys, tokens)
# The braces are left unquoted so bash expands them into one --include per extension
grep -rP '(?i)(api[_-]?key|secret|token|password)\s*[:=]\s*["\047]?[a-zA-Z0-9_\-]{20,}' --include=\*.{py,js,yaml,json,env} .

# Detect base64-encoded payloads
grep -oP '[A-Za-z0-9+/]{40,}={0,2}' file | while read b; do echo "$b" | base64 -d 2>/dev/null; done

# Find private keys in files
grep -rP -e '-----BEGIN (RSA |DSA |EC |OPENSSH )?PRIVATE KEY-----' .    # -e stops grep reading the leading dashes as options

# Detect command injection patterns
grep -P '\$\(.*\)|\`.*\`|;\s*(cat|curl|wget|nc|bash|sh|python|perl|ruby)' input.log

# Find JWT tokens
grep -oP 'eyJ[A-Za-z0-9_-]*\.eyJ[A-Za-z0-9_-]*\.[A-Za-z0-9_-]*' file

Log Parsing Dark Arts

# Parse nginx logs into CSV
perl -pe 's/^(\S+) - - \[(.*?)\] "(.*?)" (\d+) (\d+) "(.*?)" "(.*?)"$/$1,$2,$3,$4,$5/' access.log

# Extract failed SSH attempts with username
grep -oP 'Failed password for (?:invalid user )?(?<user>\S+) from (?<ip>\S+)' /var/log/auth.log

# Parse systemd journal for service crashes
journalctl -p err --no-pager | grep -oP '(?<=\]: ).*(?=\.)' | sort | uniq -c | sort -rn

# Extract timing data from logs
grep -oP 'completed in (?<time>\d+\.?\d*)\s*(?<unit>ms|s|seconds?)' app.log

# Find log entries within time window
awk '/2024-01-15T10:00/,/2024-01-15T11:00/' app.log

Data Extraction Wizardry

# Extract all unique domains from text
grep -oP '(?:https?://)?(?:www\.)?(?<domain>[\w-]+(?:\.[\w-]+)+)' file | sort -u

# Parse JSON without jq (desperate times)
grep -oP '"name"\s*:\s*"\K[^"]+' data.json

# Extract markdown links: [text](url)
grep -oP '\[([^\]]+)\]\(([^)]+)\)' README.md

# Pull all environment variable references
grep -oP '\$\{?\w+\}?|\$[A-Z_][A-Z0-9_]*' script.sh | sort -u

# Extract version numbers (semver)
grep -oP '\bv?\d+\.\d+\.\d+(?:-[\w.]+)?(?:\+[\w.]+)?\b' CHANGELOG.md

One-Liner Legendry

# Validate IPv4 with proper range checking
grep -P '^(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)$' ips.txt

# Strip ANSI color codes
sed 's/\x1b\[[0-9;]*m//g' colored_output.txt

# Convert camelCase to snake_case
sed -E 's/([a-z])([A-Z])/\1_\L\2/g' file

# Remove duplicate lines preserving order (no sort)
awk '!seen[$0]++' file

# Extract code blocks from markdown
perl -0777 -ne 'print "$1\n" while /```\w*\n(.*?)```/gs' README.md

# Replace within matched context only
sed '/BEGIN/,/END/s/old/new/g' file

# Print non-empty lines, each prefixed with its original line number
awk 'NF {print NR": "$0}' file

# Transpose CSV columns to rows (OFS is set so the output stays comma-separated)
awk -F',' -v OFS=',' '{for(i=1;i<=NF;i++) a[NR,i]=$i} END {for(j=1;j<=NF;j++) {for(i=1;i<=NR;i++) printf "%s%s", a[i,j], (i==NR?ORS:OFS)}}' file.csv

The Forbidden Techniques

# Self-matching pattern (match pattern that matches itself)
grep -P '(?R)?' file    # Matches empty string infinitely... DON'T

# Regex quine (outputs itself) - for the curious only
perl -e '$_=q{$_=q{0};s/0/$_/;print};s/0/$_/;print'

# Match prime-length strings (yes, really)
perl -ne 'print if /^(?!(..+)\1+$)..+$/'
# Matches strings whose length is a prime number

Performance Tips

Technique	Why It Matters
Anchor patterns	`^ERROR` can be far faster than `.*ERROR`, which forces a backtracking engine to retry at every position
Character classes over alternation	`[aeiou]` beats `a|e|i|o|u`
Possessive quantifiers	`\d++` prevents catastrophic backtracking
Atomic groups	`(?>pattern)` never backtracks into the group once it has matched
Avoid `.{0,100}`	Use `.{1,100}?` (lazy) or specific chars
Use `\K` for variable lookbehind	`foo.\Kbar` - faster than `(?<=foo.)bar`

# The \K trick: reset match start
echo "error: something bad" | grep -oP 'error: \K.*'
# Output: something bad (without "error: ")

# Match between delimiters efficiently
grep -oP '<<\K[^>]+(?=>>)' file    # Extract <<content>>
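
Where `\K` helps most is a prefix of unknown length, which a classic lookbehind cannot express with an unbounded quantifier. A sketch with made-up input, assuming GNU grep with `-P`:

```shell
# The prefix "id:" plus any amount of whitespace is matched, then dropped by \K
id=$(echo 'id:    42' | grep -oP 'id:\s*\K\d+')

# A fixed-length lookbehind gives the same result only when the gap is known in advance
id_fixed=$(echo 'id: 42' | grep -oP '(?<=id: )\d+')

echo "$id $id_fixed"
```

Both commands print only the digits; the `\K` form keeps working however many spaces follow the colon.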
