Regex Flavors

Regex engines vary significantly in syntax and features. Understanding these differences is essential when switching between tools. This reference covers the major flavors you’ll encounter in infrastructure work.

Flavor Overview

Flavor	Tools	Key Characteristics
BRE (Basic Regular Expressions)	`grep`, `sed`, `ed`	`()` and `{}` (and, in GNU tools, `+`, `?`, `|`) are special only when backslash-escaped
ERE (Extended Regular Expressions)	`grep -E`, `egrep` (deprecated), `awk`	`+`, `?`, `|`, `()`, `{}` are special without escaping
PCRE (Perl Compatible)	`grep -P`, `rg -P`, Perl	Full-featured: lookaround, non-greedy, `\d`, `\w`
Python	`re` module	PCRE-like with some differences
JavaScript	`RegExp`, `/pattern/`	Perl-style syntax; ES2018 added lookbehind and named groups
Vim	`:s/pattern/replacement/`	Unique syntax, magic/nomagic modes

BRE vs ERE

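The fundamental difference: in BRE, characters such as + ? | ( ) { } are literal unless escaped with a backslash; in ERE, the same characters are special unless escaped. Note that \+, \? and \| are GNU extensions: strict POSIX BRE has no one-or-more, optional, or alternation operator.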

Escaping Requirements

Feature	BRE	ERE	Example
Grouping	`\(\)`	`()`	`\(abc\)` vs `(abc)`
Alternation	`\|`	`|`	`cat\|dog` vs `cat|dog`
One or more	`\+`	`+`	`a\+` vs `a+`
Zero or one	`\?`	`?`	`a\?` vs `a?`
Repetition	`\{n,m\}`	`{n,m}`	`a\{2,3\}` vs `a{2,3}`
Dot	`.`	`.`	Same in both
Asterisk	`*`	`*`	Same in both
Anchors	`^` `$`	`^` `$`	Same in both

Practical Comparison

# Match one or more digits

# BRE (basic grep, sed)
grep '[0-9]\+' file.txt
sed -n '/[0-9]\+/p' file.txt

# ERE (grep -E, awk)
grep -E '[0-9]+' file.txt
awk '/[0-9]+/' file.txt

# Match IP address

# BRE
grep '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' file.txt

# ERE
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file.txt

When in doubt, use grep -E or sed -E for ERE mode.
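The IP pattern above checks shape, not value: `[0-9]{1,3}` accepts `999` just as happily as `192`. A minimal Python sketch of the difference, assuming you are validating one candidate string at a time (hence `re.fullmatch` rather than grep's match-anywhere behavior). Python's `re` is not an ERE engine, but this particular pattern uses only syntax the two share; the helper names `looks_like_ipv4` and `is_ipv4` are illustrative.

```python
import re

# Same pattern as the grep -E example; Python's re reads it the same way.
IP_SHAPE = re.compile(r'[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')


def looks_like_ipv4(s: str) -> bool:
    """Shape only: four dot-separated groups of 1-3 digits."""
    return IP_SHAPE.fullmatch(s) is not None


def is_ipv4(s: str) -> bool:
    """Shape plus a numeric check that every octet is in 0-255."""
    return looks_like_ipv4(s) and all(int(octet) <= 255 for octet in s.split('.'))


for candidate in ['192.168.1.100', '999.999.999.999']:
    print(candidate, looks_like_ipv4(candidate), is_ipv4(candidate))
# 192.168.1.100 True True
# 999.999.999.999 True False
```

Doing the range check in code keeps the regex readable. A pure-regex octet such as `(25[0-5]|2[0-4][0-9]|1?[0-9]?[0-9])` also works in ERE, but it is harder to audit.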

PCRE Features

PCRE adds powerful features not available in BRE/ERE.

Shorthand Character Classes

Shorthand	Meaning	PCRE	BRE/ERE
`\d`	Digit [0-9]	Yes	No
`\D`	Non-digit	Yes	No
`\w`	Word char [A-Za-z0-9_]	Yes	GNU extension only
`\W`	Non-word char	Yes	GNU extension only
`\s`	Whitespace	Yes	GNU extension only
`\S`	Non-whitespace	Yes	GNU extension only

# PCRE - works
grep -P '\d+\.\d+\.\d+\.\d+' file.txt

# BRE/ERE - must use explicit classes
grep -E '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' file.txt
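The table glosses `\d` as `[0-9]`. That is accurate for ASCII text, but not every engine keeps `\d` that narrow. In Python 3, `\d` in a `str` pattern matches any Unicode decimal digit unless the `re.ASCII` flag is set. This matters when input must contain only 0-9. A short check, using two Arabic-Indic digits as an arbitrary non-ASCII sample:

```python
import re

sample = '\u0663\u0664'  # ARABIC-INDIC DIGIT THREE and FOUR (Unicode category Nd)

# In a Python 3 str pattern, \d matches any Unicode decimal digit...
print(bool(re.fullmatch(r'\d+', sample)))            # True
# ...whereas an explicit [0-9] class, or the re.ASCII flag, sticks to 0-9.
print(bool(re.fullmatch(r'[0-9]+', sample)))         # False
print(bool(re.fullmatch(r'\d+', sample, re.ASCII)))  # False
```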

Non-Greedy Quantifiers

Greedy	Non-Greedy	Support
`*`	`*?`	PCRE, Python, JavaScript
`+`	`+?`	PCRE, Python, JavaScript
`?`	`??`	PCRE, Python, JavaScript
`{n,m}`	`{n,m}?`	PCRE, Python, JavaScript

# Extract first quoted string

# Greedy (wrong): .* runs on to the LAST quote on the line
echo '"first" and "second"' | grep -oP '".*"'
# Output: "first" and "second"

# Non-greedy (correct) - PCRE only
echo '"first" and "second"' | grep -oP '".*?"'
# Output: "first"
#         "second"
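BRE/ERE have no `*?`, so the usual portable workaround is a negated character class. `"[^"]*"` cannot cross a quote, so it stops at the first closing quote without needing laziness. The Python sketch below compares all three patterns on the example line. The negated-class pattern is plain ERE syntax, so `grep -oE '"[^"]*"'` should give the same two matches.

```python
import re

line = '"first" and "second"'

greedy = re.findall(r'".*"', line)      # .* runs to the LAST quote
lazy = re.findall(r'".*?"', line)       # .*? stops at the FIRST closing quote
negated = re.findall(r'"[^"]*"', line)  # no lazy quantifier needed; valid ERE too

print(greedy)   # ['"first" and "second"']
print(lazy)     # ['"first"', '"second"']
print(negated)  # ['"first"', '"second"']
```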

Lookaround

Lookahead works in PCRE, Python, and JavaScript; lookbehind also works in all three (in JavaScript since ES2018). BRE/ERE support neither.

# Extract value after "port="
# PCRE
grep -oP '(?<=port=)\d+' config.txt

# BRE/ERE alternative (capture whole thing)
grep -oE 'port=[0-9]+' config.txt | cut -d= -f2
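One limit worth knowing: in Python, a lookbehind must have a fixed width. As a result, a pattern that tolerates optional spaces around `=` cannot sit inside `(?<=…)`. A sketch with made-up config lines; the `compiles` helper exists only for illustration. The fallback follows the same idea as the `cut` pipeline above: match the prefix normally and keep only a capture group.

```python
import re

# Made-up config lines: one without spaces around '=', one with.
config = 'port=8080\nport = 443\n'


def compiles(pattern: str) -> bool:
    """Illustrative helper: True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Fixed-width lookbehind is fine, but only catches the unspaced form.
print(re.findall(r'(?<=port=)\d+', config))     # ['8080']

# A lookbehind that allows optional spaces is variable-width: rejected.
print(compiles(r'(?<=port\s*=\s*)\d+'))         # False

# Fallback: match the prefix normally and return only the capture group.
print(re.findall(r'port\s*=\s*(\d+)', config))  # ['8080', '443']
```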

Named Groups

Flavor	Syntax
PCRE	`(?<name>…)` or `(?P<name>…)`
Python	`(?P<name>…)`
JavaScript	`(?<name>…)`
BRE/ERE	Not supported
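In Python the P is mandatory: the `re` module rejects the `(?<name>…)` form that PCRE and JavaScript accept. A short sketch using a date string like the one in the test data. It also shows `(?P=name)`, Python's way to refer back to a named group inside the pattern.

```python
import re

date_re = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')
m = date_re.search('Date: 2026-03-15')
print(m.group('year'), m['month'])  # 2026 03
print(m.groupdict())                # {'year': '2026', 'month': '03', 'day': '15'}

# Python refers back to a named group inside the pattern with (?P=name).
print(bool(re.fullmatch(r'(?P<word>\w+) (?P=word)', 'the the')))  # True

# The (?<name>...) spelling used by PCRE and JavaScript raises re.error here.
try:
    re.compile(r'(?<year>\d{4})')
    js_style_accepted = True
except re.error:
    js_style_accepted = False
print(js_style_accepted)  # False
```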

Python `re` Module

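Python’s re module uses Perl-style syntax, but it is its own engine, not PCRE. Notable differences: named groups are written (?P<name>…), lookbehind must be fixed-width, \K is not supported, and atomic groups and possessive quantifiers only arrived in Python 3.11.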

Key Functions

import re

# Search (first match)
match = re.search(r'\d+', 'Port: 443')
if match:
    print(match.group())  # 443

# Match (from start only)
match = re.match(r'\d+', '443/tcp')
if match:
    print(match.group())  # 443

# Find all
matches = re.findall(r'\d+', '192.168.1.100')
print(matches)  # ['192', '168', '1', '100']

# Substitute
result = re.sub(r'\d+', 'X', '192.168.1.100')
print(result)  # X.X.X.X
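`re.match` anchors only at the start of the string, which often causes surprises when validating input. For whole-string checks, `re.fullmatch` anchors at both ends. A quick comparison reusing the strings from the examples above:

```python
import re

text = '443/tcp'

print(re.search(r'\d+', 'Port: 443').group())  # 443: search() scans the whole string
print(re.match(r'\d+', 'Port: 443'))           # None: match() is anchored at the start
print(re.match(r'\d+', text).group())          # 443: start-anchored, but not end-anchored
print(re.fullmatch(r'\d+', text))              # None: fullmatch() must consume everything
print(re.fullmatch(r'\d+/\w+', text).group())  # 443/tcp
```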

Flags

import re

text = 'Error: disk full\nERROR: timeout'  # sample input for the calls below

# Case insensitive
re.search(r'error', text, re.IGNORECASE)
re.search(r'(?i)error', text)  # Inline

# Multiline (^ and $ match line boundaries)
re.search(r'^ERROR', text, re.MULTILINE)
re.search(r'(?m)^ERROR', text)  # Inline

# Dotall (. matches newline)
re.search(r'.+', text, re.DOTALL)
re.search(r'(?s).+', text)  # Inline

# Verbose (allow comments and whitespace)
pattern = re.compile(r'''
    \d{4}    # Year
    -
    \d{2}    # Month
    -
    \d{2}    # Day
''', re.VERBOSE)
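The flags only show their effect on multi-line input. A small sketch with a made-up three-line log shows what ^ and . do with and without MULTILINE and DOTALL. It also shows that flags combine with |, like the inline (?im) form:

```python
import re

text = 'INFO: start\nERROR: disk full\nINFO: done'

# Without MULTILINE, ^ only matches at the very start of the string.
print(re.search(r'^ERROR', text))                          # None
print(re.search(r'^ERROR.*', text, re.MULTILINE).group())  # ERROR: disk full

# Without DOTALL, . stops at a newline; with it, .+ spans all three lines.
print(re.search(r'.+', text).group())                      # INFO: start
print(re.search(r'.+', text, re.DOTALL).group() == text)   # True

# Flags combine with |, equivalent to the inline form (?im).
print(bool(re.search(r'^error', text, re.IGNORECASE | re.MULTILINE)))  # True
print(bool(re.search(r'(?im)^error', text)))                           # True
```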

Raw Strings

Always use raw strings (r'…') for regex in Python, so backslashes reach the regex engine unchanged:

# Wrong - '\b' becomes a backspace character before re ever sees it
re.search('\bword\b', text)

# Correct - r prefix makes it raw
re.search(r'\bword\b', text)
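The problem is easy to see: in a normal string literal, `'\b'` is already converted to the backspace character, so the regex engine searches for literal backspaces. Escapes Python does not recognize, such as `'\d'`, happen to pass through unchanged, but recent versions warn about them (a `SyntaxWarning` since Python 3.12). Raw strings are the safe default.

```python
import re

text = 'a word here'

print('\b' == '\x08')                        # True: '\b' is the backspace character
print(r'\b' == '\\b')                        # True: raw string keeps backslash + b
print(re.search('\bword\b', text))           # None: looks for literal backspaces
print(re.search(r'\bword\b', text).group())  # word
```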

JavaScript RegExp

JavaScript regex has evolved significantly. ES2018 added lookbehind, named capture groups, the s (dotAll) flag, and Unicode property escapes (\p{…}, with the u flag).

Creating Patterns

// Literal syntax
const pattern1 = /\d+/g;

// Constructor syntax (useful for dynamic patterns)
const pattern2 = new RegExp('\\d+', 'g');

Flags

/pattern/g    // Global (find all)
/pattern/i    // Case insensitive
/pattern/m    // Multiline (^ and $ match at line breaks)
/pattern/s    // Dotall: . matches newline (ES2018)
/pattern/u    // Unicode
/pattern/y    // Sticky (match at lastIndex)

Methods

const text = 'Port: 443, Other: 8080';

// test - returns boolean
/\d+/.test(text);  // true

// match - returns array
text.match(/\d+/g);  // ['443', '8080']

// exec - returns match object (with groups)
const pattern = /Port: (\d+)/;
const result = pattern.exec(text);
// result[0] = 'Port: 443'
// result[1] = '443'

// replace
text.replace(/\d+/g, 'X');  // 'Port: X, Other: X'

// split
'a,b;c'.split(/[,;]/);  // ['a', 'b', 'c']

Named Groups (ES2018)

const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = '2026-03-15'.match(pattern);

console.log(match.groups.year);   // 2026
console.log(match.groups.month);  // 03
console.log(match.groups.day);    // 15

Lookbehind (ES2018)

// Positive lookbehind
const pricePattern = /(?<=\$)\d+/;
'$100'.match(pricePattern);  // ['100']

// Negative lookbehind
const notDollar = /(?<!\$)\d+/;
'€50'.match(notDollar);  // ['50']
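Negative lookbehind only inspects the character just before the point where a match starts. As a result, `(?<!\$)\d+` can still match the tail of a dollar amount. A Python sketch of the pitfall follows; Python's `re` evaluates this pattern the same way JavaScript does. It includes one possible fix that also rejects a preceding digit:

```python
import re

not_dollar = re.compile(r'(?<!\$)\d+')
print(not_dollar.findall('€50'))   # ['50']
print(not_dollar.findall('$100'))  # ['00']: this match starts after the '1'

# Also forbid a digit before the match, so it cannot start mid-number.
not_dollar_amount = re.compile(r'(?<![$\d])\d+')
print(not_dollar_amount.findall('$100'))          # []
print(not_dollar_amount.findall('€50 and $100'))  # ['50']
```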

Vim Regex

Vim uses a unique regex dialect with "magic" levels.

Magic Modes

Mode	Setting	Effect
nomagic	`:set nomagic` or `\M`	Almost everything is literal
magic	`:set magic` or `\m`	Default - some chars are special
very magic	`\v`	Almost everything is special (like ERE)
very nomagic	`\V`	Only `\` is special (literal search)

Common Vim Patterns

" Default magic mode
/[0-9]\+          " One or more digits
/\(foo\|bar\)     " foo or bar
/\<word\>         " Word boundaries

" Very magic mode (recommended)
/\v[0-9]+         " One or more digits
/\v(foo|bar)      " foo or bar
/\v<word>         " Word boundaries

" Substitution
:%s/old/new/g               " Replace all
:%s/\v(\w+)/[\1]/g          " Wrap words in brackets
:%s/\v<(\w)(\w*)/\u\1\L\2/g " Title case
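Vim's replacement-side case escapes (\u, \l, \U, \L) have no counterpart in Python's `re.sub` replacement strings. The usual approach there is a replacement function. A sketch of the same title-case substitution; the `title_case` name is just for illustration, and `\b` plays the role of Vim's `<` word-start anchor:

```python
import re


def title_case(text: str) -> str:
    """Uppercase the first letter of each word and lowercase the rest."""
    # Python replacement strings have no \u or \L, so use a function instead.
    return re.sub(r'\b(\w)(\w*)',
                  lambda m: m.group(1).upper() + m.group(2).lower(),
                  text)


print(title_case('hELLO wORLD from vim'))  # Hello World From Vim
```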

Vim-Specific Features

" Atom specifiers
\_.     " Any character INCLUDING newline
\_s     " Whitespace including newline
\%^     " Start of file
\%$     " End of file

" Zero-width
\zs     " Start match here (like \K in PCRE)
\ze     " End match here

" Collections
\a      " Alphabetic [A-Za-z]
\d      " Digit [0-9]
\w      " Word character [A-Za-z0-9_]
\s      " Space or Tab only (use \_s to include newline)

Tool-Specific Syntax

sed

# BRE mode (default)
sed 's/[0-9]\+/X/g' file.txt

# ERE mode (-E; GNU sed also accepts -r)
sed -E 's/[0-9]+/X/g' file.txt

# Delimiter can be any character
sed 's|/path/to/file|/new/path|g' file.txt
sed 's#http://##g' urls.txt

# Backreferences (\w is a GNU sed extension)
sed -E 's/(\w+) (\w+)/\2 \1/g' file.txt  # Swap each pair of words
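For comparison, here is the same swap with Python's `re.sub`. The replacement uses the same `\1` and `\2` backreferences, and every match is replaced by default, so there is no g flag (the `count` argument limits replacements). `\g<n>` is available when a literal digit must follow a group reference, because Python would read `\10` as group 10.

```python
import re

line = 'one two three four'

# Equivalent of: sed -E 's/(\w+) (\w+)/\2 \1/g'
swapped = re.sub(r'(\w+) (\w+)', r'\2 \1', line)
print(swapped)  # two one four three

# \g<1> followed by a literal 0; r'\10' would mean group 10 instead
print(re.sub(r'(\w+)', r'\g<1>0', 'port 8'))  # port0 80
```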

awk

# ERE always (no BRE mode)
awk '/[0-9]+/ { print }' file.txt

# Field matching
awk '$1 ~ /^[0-9]+$/ { print $2 }' file.txt

# gsub for replacement
awk '{ gsub(/old/, "new"); print }' file.txt

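# Case insensitive: IGNORECASE works only in gawk; other awks ignore it
awk 'BEGIN{IGNORECASE=1} /error/' file.txt
# Portable alternative
awk 'tolower($0) ~ /error/' file.txt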
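The awk field test above translates to Python as splitting each line and applying the regex to one field. A sketch with made-up sample lines: `str.split()` with no argument approximates awk's default field splitting (runs of blanks, leading blanks ignored), though the exact whitespace rules differ slightly. `fullmatch` stands in for the `^…$` anchors.

```python
import re

# Made-up input lines standing in for file.txt
lines = ['100 web-01', 'abc db-01', '  42\tcache-01']

numeric = re.compile(r'[0-9]+')  # fullmatch below plays the role of ^...$

# Rough equivalent of: awk '$1 ~ /^[0-9]+$/ { print $2 }'
selected = []
for line in lines:
    fields = line.split()  # split on runs of whitespace, ignoring leading blanks
    if fields and numeric.fullmatch(fields[0]):
        selected.append(fields[1] if len(fields) > 1 else '')

print(selected)  # ['web-01', 'cache-01']
```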

ripgrep (rg)

# Rust regex engine by default (supports \d, but no lookaround or backreferences)
rg '\d+\.\d+\.\d+\.\d+'

# Case insensitive
rg -i 'error'

# Word boundary
rg -w 'log'

# Fixed string (no regex)
rg -F '192.168.1.1'

# PCRE2 features such as lookbehind need -P (rg must be built with PCRE2 support)
rg -P '(?<=port=)\d+'

Feature Comparison Matrix

Feature	BRE	ERE	PCRE	Python	JavaScript
`+` one or more	`\+`	`+`	`+`	`+`	`+`
`?` zero or one	`\?`	`?`	`?`	`?`	`?`
`{n,m}` repetition	`\{n,m\}`	`{n,m}`	`{n,m}`	`{n,m}`	`{n,m}`
`()` grouping	`\(\)`	`()`	`()`	`()`	`()`
`|` alternation	`\|`	`|`	`|`	`|`	`|`
`\d` digit	No	No	Yes	Yes	Yes
`\w` word char	GNU only	GNU only	Yes	Yes	Yes
`(?:)` non-capturing	No	No	Yes	Yes	Yes
`(?=)` lookahead	No	No	Yes	Yes	Yes
`(?<=)` lookbehind	No	No	Yes	Yes (fixed-width)	ES2018+
`*?` non-greedy	No	No	Yes	Yes	Yes
`\K` match reset	No	No	Yes	No	No
Named group	No	No	`(?<name>)` or `(?P<name>)`	`(?P<name>)`	`(?<name>)` (ES2018+)
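One way to sanity-check a column of this matrix is to ask the engine itself. The small sketch below does this for the Python column. It only tests whether `re.compile` accepts each construct, not whether the construct behaves exactly as in PCRE. The sample patterns and the `supported` helper are arbitrary illustrations.

```python
import re


def supported(pattern: str) -> bool:
    """Return True if Python's re module compiles the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


checks = {
    'non-capturing (?:)': r'(?:ab)+',
    'lookahead (?=)': r'foo(?=bar)',
    'lookbehind (?<=)': r'(?<=\$)\d+',
    'non-greedy *?': r'a*?',
    'named group (?P<name>)': r'(?P<n>\d+)',
    'match reset \\K': r'port=\K\d+',
}

# Every construct prints True except \K, which Python rejects as a bad escape.
for feature, pattern in checks.items():
    print(f'{feature:26} {supported(pattern)}')
```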

Choosing the Right Flavor

Situation	Recommendation
Simple pattern, maximum compatibility	ERE (`grep -E`, `awk`)
Need `\d`, `\w`, or lookaround	PCRE (`grep -P`, `rg -P`)
Speed and large files	ripgrep (`rg`)
Stream or in-place file edits	sed with ERE (`sed -E`)
Complex text processing	Python `re` module
Browser/Node.js	JavaScript RegExp
Vim editing	Vim with `\v` (very magic)

Self-Test Exercises

Try each challenge first, then compare your attempt with the answer.

Setup Test Data

cat << 'EOF' > /tmp/flavors.txt
Port: 443
Port: 8080
IP: 192.168.1.100
Date: 2026-03-15
Error: connection refused
ERROR: TIMEOUT
http://example.com
https://secure.example.com
server=prod-01
count=42
EOF

Challenge 1: BRE One-or-More

Goal: Match one or more digits using BRE (basic grep without -E)

Answer

grep '[0-9]\+' /tmp/flavors.txt

BRE requires \+ for one-or-more (a GNU extension; portable POSIX BRE writes [0-9][0-9]*). Without the backslash, + is literal.

Challenge 2: ERE One-or-More

Goal: Match one or more digits using ERE (grep -E)

Answer

grep -E '[0-9]+' /tmp/flavors.txt

ERE uses + without escaping. This is why -E is preferred.

Challenge 3: BRE Grouping

Goal: Match "http" or "https" using BRE grouping

Answer

grep 'https\?' /tmp/flavors.txt
# With BRE grouping and alternation:
grep '\(http\|https\)://' /tmp/flavors.txt

BRE uses \( \) for grouping, \? for optional, and \| for alternation.

Challenge 4: ERE Grouping

Goal: Match "http" or "https" using ERE

Answer

grep -E 'https?' /tmp/flavors.txt
# Or:
grep -E '(http|https)' /tmp/flavors.txt

ERE uses ? and | without escaping.

Challenge 5: PCRE Shorthand

Goal: Extract all numbers using PCRE \d shorthand

Answer

grep -oP '\d+' /tmp/flavors.txt

\d only works with -P (PCRE). ERE equivalent is [0-9].

Challenge 6: Case Insensitive

Goal: Find both "Error" and "ERROR" case-insensitively

Answer

# Using flag
grep -i 'error' /tmp/flavors.txt

# Using PCRE inline modifier
grep -P '(?i)error' /tmp/flavors.txt

-i flag or (?i) modifier both work.

Challenge 7: Word Boundary PCRE

Goal: Match "server" as a whole word using PCRE \b

Answer

grep -P '\bserver\b' /tmp/flavors.txt

# ERE equivalent
grep -E '\<server\>' /tmp/flavors.txt

\b works with -P, and GNU grep also accepts it in BRE/ERE. \< and \> (word start and word end) are GNU extensions for BRE/ERE.

Challenge 8: Non-Greedy (PCRE Only)

Goal: Extract value after "=" using non-greedy quantifier

Answer

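grep -oP '=.+?$' /tmp/flavors.txt
# Output: =prod-01
#         =42

+? is non-greedy (it matches the minimum) and requires PCRE. A lazy quantifier needs something after it to stop at: '=.+?' on its own returns only one character after the = ('=p', '=4'). Here $ is the stopping point.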

Challenge 9: Lookbehind (PCRE Only)

Goal: Extract value after "server=" without including "server="

Answer

grep -oP '(?<=server=)\S+' /tmp/flavors.txt

(?<=…) lookbehind requires -P (PCRE) in grep. Output: prod-01

Challenge 10: BRE Repetition

Goal: Match exactly 4 digits using BRE

Answer

grep '[0-9]\{4\}' /tmp/flavors.txt

BRE uses \{n\} (escaped braces).

Challenge 11: ERE Repetition

Goal: Match exactly 4 digits using ERE

Answer

grep -E '[0-9]{4}' /tmp/flavors.txt

ERE uses {n} without escaping.

Challenge 12: ripgrep Equivalent

Goal: Find all IPs using ripgrep (rg)

Answer

rg -o '\d+\.\d+\.\d+\.\d+' /tmp/flavors.txt

# Or with -P for PCRE2
rg -oP '\d+\.\d+\.\d+\.\d+' /tmp/flavors.txt

ripgrep uses Rust regex by default, PCRE2 with -P.

Key Takeaways

BRE requires escaping - \+, \?, \|, \(\), \{\}
ERE is more readable - use grep -E, sed -E
PCRE has the most features - lookaround, \d, non-greedy
Use raw strings in Python - r'\d+' not '\d+'
JavaScript ES2018 added lookbehind - (?<=…) and (?<!…)
Vim’s very magic (\v) is closest to ERE - easier to read
ripgrep is fast - prefer for large files

Next Module

Infrastructure Patterns - Production-ready patterns for IPs, MACs, logs, and configs.