Regex Flavors
Regex engines vary significantly in syntax and features. Understanding these differences is essential when switching between tools. This reference covers the major flavors you’ll encounter in infrastructure work.
Flavor Overview
| Flavor | Tools | Key Characteristics |
|---|---|---|
| BRE (Basic Regular Expressions) | grep, sed | Metacharacters require escaping |
| ERE (Extended Regular Expressions) | grep -E, sed -E, awk | Modern syntax, no escaping for metacharacters |
| PCRE (Perl Compatible) | grep -P, rg -P | Full-featured: lookaround, non-greedy, \d |
| Python | re module | PCRE-like with some differences |
| JavaScript | RegExp | PCRE subset, ES2018 added lookbehind |
| Vim | vim | Unique syntax, magic/nomagic modes |
BRE vs ERE
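The fundamental difference: in BRE, the characters ( ) { } + ? and | are literal unless preceded by a backslash, while in ERE they are special unless escaped. Strictly, POSIX BRE defines only \( \) and \{ \}; the forms \+, \? and \| are GNU extensions, so they may not work in non-GNU grep or sed.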
Escaping Requirements
| Feature | BRE | ERE | Example |
|---|---|---|---|
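| Grouping | \(…\) | (…) | \(ab\)* vs (ab)* |
| Alternation | backslash + pipe | pipe | GNU extension in BRE; see Challenges 3 and 4 |
| One or more | \+ | + | [0-9]\+ vs [0-9]+ |
| Zero or one | \? | ? | https\? vs https? |
| Repetition | \{n,m\} | {n,m} | [0-9]\{1,3\} vs [0-9]{1,3} |
| Dot | . | . | Same in both |
| Asterisk | * | * | Same in both |
| Anchors | ^ $ | ^ $ | Same in both |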
Practical Comparison
# Match one or more digits
# BRE (basic grep, sed)
grep '[0-9]\+' file.txt
sed -n '/[0-9]\+/p' file.txt
# ERE (grep -E, awk)
grep -E '[0-9]+' file.txt
awk '/[0-9]+/' file.txt
# Match IP address
# BRE
grep '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' file.txt
# ERE
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file.txt
When in doubt, use grep -E or sed -E for ERE mode.
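For readers coming from Python, a quick check shows that Python's re module follows the ERE convention here: an unescaped + is a quantifier and \+ is a literal plus sign, the reverse of BRE. This is a minimal sketch; the sample strings are made up for illustration.

```python
import re

# Sample strings are illustrative, not taken from a real file.
line = "Port: 443"
literal_plus = "4+4"

# Unescaped + is a quantifier in Python (as in ERE).
ere_style = re.search(r"[0-9]+", line)

# Escaped \+ is a literal plus in Python, so the BRE spelling of
# "one or more digits" finds nothing in a line without a "+".
bre_style = re.search(r"[0-9]\+", line)
bre_on_plus = re.search(r"[0-9]\+", literal_plus)

print(ere_style.group())    # 443
print(bre_style)            # None
print(bre_on_plus.group())  # 4+
```

Copying a BRE pattern into Python (or into grep -E) without removing the backslashes therefore changes its meaning silently rather than raising an error.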
PCRE Features
PCRE adds powerful features not available in BRE/ERE.
Shorthand Character Classes
| Shorthand | Meaning | PCRE | BRE/ERE |
|---|---|---|---|
| \d | Digit [0-9] | Yes | No |
| \D | Non-digit | Yes | No |
| \w | Word char [A-Za-z0-9_] | Yes | GNU extension only |
| \W | Non-word char | Yes | GNU extension only |
| \s | Whitespace | Yes | GNU extension only |
| \S | Non-whitespace | Yes | GNU extension only |
# PCRE - works
grep -P '\d+\.\d+\.\d+\.\d+' file.txt
# BRE/ERE - must use explicit classes
grep -E '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' file.txt
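One difference worth knowing when carrying \d over to Python: for str patterns, Python's \d matches any Unicode decimal digit, not only [0-9], unless re.ASCII is set. A small sketch with an illustrative input string:

```python
import re

# Illustrative input: ASCII digits followed by three Arabic-Indic digits.
text = "443 \u0664\u0664\u0663"

unicode_digits = re.findall(r"\d+", text)          # Unicode-aware by default
ascii_digits = re.findall(r"\d+", text, re.ASCII)  # restrict \d to [0-9]
explicit_class = re.findall(r"[0-9]+", text)       # always ASCII only

print(len(unicode_digits))  # 2
print(ascii_digits)         # ['443']
print(explicit_class)       # ['443']
```

When validating input such as ports or IP octets, the explicit class or re.ASCII is usually the safer choice.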
Non-Greedy Quantifiers
| Greedy | Non-Greedy | Support |
|---|---|---|
| * | *? | PCRE, Python, JavaScript |
| + | +? | PCRE, Python, JavaScript |
| ? | ?? | PCRE, Python, JavaScript |
| {n,m} | {n,m}? | PCRE, Python, JavaScript |
# Extract each quoted string
# Greedy (wrong): .* runs to the last quote on the line
echo '"first" and "second"' | grep -oP '".*"'
# Output: "first" and "second"
# Non-greedy (correct): requires -P, not available in BRE/ERE
echo '"first" and "second"' | grep -oP '".*?"'
# Output (grep -o prints each match on its own line):
# "first"
# "second"
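The same greedy versus non-greedy behavior can be checked in Python. The sketch below also shows a negated character class, which gives the same result without a lazy quantifier and therefore also works in BRE/ERE.

```python
import re

text = '"first" and "second"'

greedy = re.findall(r'".*"', text)      # one match spanning both strings
lazy = re.findall(r'".*?"', text)       # each quoted string separately
negated = re.findall(r'"[^"]*"', text)  # same result, no lazy quantifier

print(greedy)   # ['"first" and "second"']
print(lazy)     # ['"first"', '"second"']
print(negated)  # ['"first"', '"second"']
```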
Lookaround
Available in PCRE, Python, and JavaScript (lookahead in all versions, lookbehind since ES2018). Not available in BRE/ERE.
# Extract value after "port="
# PCRE
grep -oP '(?<=port=)\d+' config.txt
# BRE/ERE alternative (capture whole thing)
grep -oE 'port=[0-9]+' config.txt | cut -d= -f2
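A Python version of the same extraction, as a sketch. The config text is hypothetical. It also shows one restriction of Python's re module: lookbehind must be fixed-width.

```python
import re

# Hypothetical config text for illustration.
config = "host=web01 port=8080\nhost=db01 port=5432"

# Lookbehind: assert "port=" precedes the digits without consuming it.
with_lookbehind = re.findall(r"(?<=port=)\d+", config)

# Capture-group alternative that avoids lookaround.
with_group = re.findall(r"port=(\d+)", config)

print(with_lookbehind)  # ['8080', '5432']
print(with_group)       # ['8080', '5432']

# Python's re requires fixed-width lookbehind; a variable-width
# one is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=port=\s*)\d+")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(variable_width_ok)  # False
```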
Named Groups
| Flavor | Syntax |
|---|---|
| PCRE | (?<name>…) or (?P<name>…) |
| Python | (?P<name>…) |
| JavaScript | (?<name>…) |
| BRE/ERE | Not supported |
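A short sketch of Python's named-group syntax, using a date string as illustrative input. Groups are read back by name, and \g<name> refers to them in a replacement string.

```python
import re

pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
source = "Date: 2026-03-15"  # illustrative input

match = pattern.search(source)
year = match.group("year")
parts = match.groupdict()

# Named backreference in the replacement string: \g<name>
reordered = pattern.sub(r"\g<day>/\g<month>/\g<year>", source)

print(year)       # 2026
print(parts)      # {'year': '2026', 'month': '03', 'day': '15'}
print(reordered)  # Date: 15/03/2026
```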
Python re Module
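Python’s re module closely follows PCRE, with some differences: named groups are written (?P<name>…), lookbehind must be fixed-width, and \K is not supported.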
Key Functions
import re
# Search (first match)
match = re.search(r'\d+', 'Port: 443')
if match:
    print(match.group())  # 443
# Match (from start only)
match = re.match(r'\d+', '443/tcp')
if match:
    print(match.group())  # 443
# Find all
matches = re.findall(r'\d+', '192.168.1.100')
print(matches) # ['192', '168', '1', '100']
# Substitute
result = re.sub(r'\d+', 'X', '192.168.1.100')
print(result) # X.X.X.X
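The difference between search, match, and fullmatch is easiest to see when the digits are not at the start of the string. A minimal sketch with illustrative inputs:

```python
import re

text = "Port: 443"

found = re.search(r"\d+", text)             # scans the whole string
at_start = re.match(r"\d+", text)           # None: the string starts with 'P'
whole = re.fullmatch(r"\d+", "443")         # matches: the whole string is digits
partial = re.fullmatch(r"\d+", "443/tcp")   # None: '/tcp' is left over

print(found.group())  # 443
print(at_start)       # None
print(whole.group())  # 443
print(partial)        # None

# finditer yields match objects, so positions are available too.
spans = [(m.group(), m.start()) for m in re.finditer(r"\d+", "192.168.1.100")]
print(spans)  # [('192', 0), ('168', 4), ('1', 8), ('100', 10)]
```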
Flags
import re
text = 'Error: connection refused\nERROR: TIMEOUT'  # sample input for the calls below
# Case insensitive
re.search(r'error', text, re.IGNORECASE)
re.search(r'(?i)error', text) # Inline
# Multiline (^ and $ match line boundaries)
re.search(r'^ERROR', text, re.MULTILINE)
re.search(r'(?m)^ERROR', text) # Inline
# Dotall (. matches newline)
re.search(r'.+', text, re.DOTALL)
re.search(r'(?s).+', text) # Inline
# Verbose (allow comments and whitespace)
pattern = re.compile(r'''
\d{4} # Year
-
\d{2} # Month
-
\d{2} # Day
''', re.VERBOSE)
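A sketch of how these flags behave in use. The capture groups and the sample log text are additions for illustration and are not part of the pattern above. Flags are combined with the | operator.

```python
import re

# Verbose pattern with capture groups added for illustration.
date_pattern = re.compile(r"""
    (\d{4})   # Year
    -
    (\d{2})   # Month
    -
    (\d{2})   # Day
""", re.VERBOSE)

# Illustrative multi-line input.
log = "Date: 2026-03-15\nerror: connection refused\nERROR: TIMEOUT"

date_parts = date_pattern.search(log).groups()

# ^ and $ match at every line boundary, and case is ignored.
errors = re.findall(r"^error: (.+)$", log, re.IGNORECASE | re.MULTILINE)

print(date_parts)  # ('2026', '03', '15')
print(errors)      # ['connection refused', 'TIMEOUT']
```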
Raw Strings
Always use raw strings (r'…') for regex in Python:
# Wrong - \b is interpreted as backspace
re.search('\bword\b', text)
# Correct - r prefix makes it raw
re.search(r'\bword\b', text)
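To see what actually reaches the regex engine in each case, compare the two string literals directly. A minimal sketch with an illustrative input:

```python
import re

plain = "\bword\b"   # \b becomes the backspace character (0x08)
raw = r"\bword\b"    # backslash and 'b' are kept as two characters

print(len(plain), len(raw))  # 6 8
print(plain[0] == "\x08")    # True

text = "a word here"
plain_result = re.search(plain, text)  # looks for literal backspace characters
raw_result = re.search(raw, text)      # looks for word boundaries

print(plain_result)        # None
print(raw_result.group())  # word
```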
JavaScript RegExp
JavaScript regex has evolved significantly, especially with ES2018.
Creating Patterns
// Literal syntax
const pattern1 = /\d+/g;
// Constructor syntax (useful for dynamic patterns)
const pattern2 = new RegExp('\\d+', 'g');
Flags
/pattern/g // Global (find all)
/pattern/i // Case insensitive
/pattern/m // Multiline
/pattern/s // Dotall (ES2018)
/pattern/u // Unicode
/pattern/y // Sticky (match at lastIndex)
Methods
const text = 'Port: 443, Other: 8080';
// test - returns boolean
/\d+/.test(text); // true
// match - returns array
text.match(/\d+/g); // ['443', '8080']
// exec - returns match object (with groups)
const pattern = /Port: (\d+)/;
const result = pattern.exec(text);
// result[0] = 'Port: 443'
// result[1] = '443'
// replace
text.replace(/\d+/g, 'X'); // 'Port: X, Other: X'
// split
'a,b;c'.split(/[,;]/); // ['a', 'b', 'c']
Named Groups (ES2018)
const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = '2026-03-15'.match(pattern);
console.log(match.groups.year); // 2026
console.log(match.groups.month); // 03
console.log(match.groups.day); // 15
Lookbehind (ES2018)
// Positive lookbehind
const pricePattern = /(?<=\$)\d+/;
'$100'.match(pricePattern); // ['100']
// Negative lookbehind
const notDollar = /(?<!\$)\d+/;
'€50'.match(notDollar); // ['50']
Vim Regex
Vim uses a unique regex dialect with "magic" levels.
Magic Modes
| Mode | Setting | Effect |
|---|---|---|
| nomagic | \M | Almost everything is literal |
| magic | \m | Default - some chars are special |
| very magic | \v | Almost everything is special (like ERE) |
| very nomagic | \V | Only \ is special |
Common Vim Patterns
" Default magic mode
/[0-9]\+ " One or more digits
/\(foo\|bar\) " foo or bar
/\<word\> " Word boundaries
" Very magic mode (recommended)
/\v[0-9]+ " One or more digits
/\v(foo|bar) " foo or bar
/\v<word> " Word boundaries
" Substitution
:%s/old/new/g " Replace all
:%s/\v(\w+)/[\1]/g " Wrap words in brackets
:%s/\v<(\w)(\w+)/\u\1\L\2/g " Title case (\u uppercases one char, \L lowercases the rest)
Vim-Specific Features
" Atom specifiers
\_. " Any character INCLUDING newline
\_s " Whitespace including newline
\%^ " Start of file
\%$ " End of file
" Zero-width
\zs " Start match here (like \K in PCRE)
\ze " End match here
" Collections
\a " Alphabetic [A-Za-z]
\d " Digit [0-9]
\w " Word character [A-Za-z0-9_]
\s " Whitespace
Tool-Specific Syntax
sed
# BRE mode (default)
sed 's/[0-9]\+/X/g' file.txt
# ERE mode (-E or -r)
sed -E 's/[0-9]+/X/g' file.txt
# Delimiter can be any character
sed 's|/path/to/file|/new/path|g' file.txt
sed 's#http://##g' urls.txt
# Backreferences (\w is a GNU sed extension)
sed -E 's/(\w+) (\w+)/\2 \1/g' file.txt # Swap words
awk
# ERE always (no BRE mode)
awk '/[0-9]+/ { print }' file.txt
# Field matching
awk '$1 ~ /^[0-9]+$/ { print $2 }' file.txt
# gsub for replacement
awk '{ gsub(/old/, "new"); print }' file.txt
# Case insensitive (IGNORECASE is gawk-only)
awk 'BEGIN{IGNORECASE=1} /error/' file.txt
ripgrep (rg)
# Rust regex engine by default (PCRE2 only with -P)
rg '\d+\.\d+\.\d+\.\d+'
# Case insensitive
rg -i 'error'
# Word boundary
rg -w 'log'
# Fixed string (no regex)
rg -F '192.168.1.1'
# PCRE2 features
rg -P '(?<=port=)\d+'
Feature Comparison Matrix
| Feature | BRE | ERE | PCRE | Python | JavaScript |
|---|---|---|---|---|---|
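| One or more | \+ | + | + | + | + |
| Zero or one | \? | ? | ? | ? | ? |
| Repetition | \{n,m\} | {n,m} | {n,m} | {n,m} | {n,m} |
| Grouping | \(…\) | (…) | (…) | (…) | (…) |
| Alternation | backslash + pipe | pipe | pipe | pipe | pipe |
| \d shorthand | No | No | Yes | Yes | Yes |
| Non-greedy *? | No | No | Yes | Yes | Yes |
| Non-capturing (?:…) | No | No | Yes | Yes | Yes |
| Lookahead (?=…) | No | No | Yes | Yes | Yes |
| Lookbehind (?<=…) | No | No | Yes | Yes | ES2018+ |
| Named groups | No | No | Yes | Yes | Yes |
| \K (reset match start) | No | No | Yes | No | No |
| Inline flag (?i) | No | No | Yes | Yes | /i flag |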
Choosing the Right Flavor
| Situation | Recommendation |
|---|---|
| Simple pattern, maximum compatibility | ERE (grep -E) |
| Need lookaround or \d | PCRE (grep -P) |
| Speed and large files | ripgrep (rg) |
| Interactive replacement | sed with ERE (sed -E) |
| Complex text processing | Python |
| Browser/Node.js | JavaScript RegExp |
| Vim editing | Vim with \v (very magic) |
Self-Test Exercises
Try each challenge FIRST. Only expand the answer after you’ve attempted it.
Setup Test Data
cat << 'EOF' > /tmp/flavors.txt
Port: 443
Port: 8080
IP: 192.168.1.100
Date: 2026-03-15
Error: connection refused
ERROR: TIMEOUT
http://example.com
https://secure.example.com
server=prod-01
count=42
EOF
Challenge 1: BRE One-or-More
Goal: Match one or more digits using BRE (basic grep without -E)
Answer
grep '[0-9]\+' /tmp/flavors.txt
BRE requires \+ for one-or-more (a GNU extension). Without the backslash, + is literal.
Challenge 2: ERE One-or-More
Goal: Match one or more digits using ERE (grep -E)
Answer
grep -E '[0-9]+' /tmp/flavors.txt
ERE uses + without escaping. This is why -E is preferred.
Challenge 3: BRE Grouping
Goal: Match "http" or "https" using BRE grouping
Answer
grep 'https\?' /tmp/flavors.txt
# Or with alternation:
grep 'http\|https' /tmp/flavors.txt
BRE uses \? for optional. Alternation uses \|.
Challenge 4: ERE Grouping
Goal: Match "http" or "https" using ERE
Answer
grep -E 'https?' /tmp/flavors.txt
# Or:
grep -E '(http|https)' /tmp/flavors.txt
ERE uses ? and | without escaping.
Challenge 5: PCRE Shorthand
Goal: Extract all numbers using PCRE \d shorthand
Answer
grep -oP '\d+' /tmp/flavors.txt
\d only works with -P (PCRE). ERE equivalent is [0-9].
Challenge 6: Case Insensitive
Goal: Find both "Error" and "ERROR" case-insensitively
Answer
# Using flag
grep -i 'error' /tmp/flavors.txt
# Using PCRE inline modifier
grep -P '(?i)error' /tmp/flavors.txt
-i flag or (?i) modifier both work.
Challenge 7: Word Boundary PCRE
Goal: Match "server" as a whole word using PCRE \b
Answer
grep -P '\bserver\b' /tmp/flavors.txt
# ERE equivalent
grep -E '\<server\>' /tmp/flavors.txt
\b comes from PCRE, but GNU grep also accepts it in BRE/ERE as an extension. \< and \> are GNU extensions that work in both BRE and ERE.
Challenge 8: Non-Greedy (PCRE Only)
Goal: Extract value after "=" using non-greedy quantifier
Answer
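grep -oP '=.+?$' /tmp/flavors.txt
+? is non-greedy (match minimum), so it needs the trailing $ to extend to the end of the value; without it the match stops after one character. Requires -P.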
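A lazy quantifier only extends as far as the rest of the pattern forces it to. The Python sketch below runs against the two key=value lines from the test data and compares an unanchored lazy match, an anchored one, and a capture group that drops the "=".

```python
import re

lines = ["server=prod-01", "count=42"]

# Unanchored lazy quantifier: stops after the minimum, one character.
unanchored = [re.search(r"=.+?", s).group() for s in lines]

# Anchoring with $ forces the lazy match to extend to end of line.
anchored = [re.search(r"=.+?$", s).group() for s in lines]

# A capture group drops the "=" from the result.
values = [re.search(r"=(.+?)$", s).group(1) for s in lines]

print(unanchored)  # ['=p', '=4']
print(anchored)    # ['=prod-01', '=42']
print(values)      # ['prod-01', '42']
```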
Challenge 9: Lookbehind (PCRE Only)
Goal: Extract value after "server=" without including "server="
Answer
grep -oP '(?<=server=)\S+' /tmp/flavors.txt
(?<=…) lookbehind requires -P (PCRE) in grep. Output: prod-01
Challenge 10: BRE Repetition
Goal: Match exactly 4 digits using BRE
Answer
grep '[0-9]\{4\}' /tmp/flavors.txt
BRE uses \{n\}, with escaped braces.
Challenge 11: ERE Repetition
Goal: Match exactly 4 digits using ERE
Answer
grep -E '[0-9]{4}' /tmp/flavors.txt
ERE uses {n} without escaping.
Challenge 12: ripgrep Equivalent
Goal: Find all IPs using ripgrep (rg)
Answer
rg -o '\d+\.\d+\.\d+\.\d+' /tmp/flavors.txt
# Or with -P for PCRE2
rg -oP '\d+\.\d+\.\d+\.\d+' /tmp/flavors.txt
ripgrep uses Rust regex by default, PCRE2 with -P.
Key Takeaways
- BRE requires escaping: \+, \?, \{\}, \(\)
- ERE is more readable: use grep -E, sed -E
- PCRE has the most features: lookaround, \d, non-greedy
- Use raw strings in Python: r'\d+' not '\d+'
- JavaScript ES2018 added lookbehind: (?<=…) and (?<!…)
- Vim’s very magic (\v) is closest to ERE and easier to read
- ripgrep is fast: prefer it for large files
Next Module
Infrastructure Patterns - Production-ready patterns for IPs, MACs, logs, and configs.