Regex Flavors

Regex engines vary significantly in syntax and features. Understanding these differences is essential when switching between tools. This reference covers the major flavors you’ll encounter in infrastructure work.

Flavor Overview

Flavor                              Tools                     Key Characteristics
BRE (Basic Regular Expressions)     grep, sed, ed             Metacharacters require escaping
ERE (Extended Regular Expressions)  grep -E, egrep, awk       Modern syntax, no escaping for metacharacters
PCRE (Perl Compatible)              grep -P, rg -P, Perl      Full-featured: lookaround, non-greedy, \d, \w
Python                              re module                 PCRE-like with some differences
JavaScript                          RegExp, /pattern/         PCRE subset, ES2018 added lookbehind
Vim                                 :s/pattern/replacement/   Unique syntax, magic/nomagic modes

BRE vs ERE

The fundamental difference: in BRE, the metacharacters ( ) { } + ? | are literal unless escaped with a backslash; in ERE they are special as written.

Escaping Requirements

Feature       BRE        ERE      Example
Grouping      \(\)       ()       \(abc\) vs (abc)
Alternation   \|         |        cat\|dog vs cat|dog
One or more   \+         +        a\+ vs a+
Zero or one   \?         ?        a\? vs a?
Repetition    \{n,m\}    {n,m}    a\{2,3\} vs a{2,3}
Dot           .          .        Same in both
Asterisk      *          *        Same in both
Anchors       ^ $        ^ $      Same in both

Note: \|, \+ and \? in BRE are GNU extensions. POSIX BRE defines only \(\) and \{n,m\}.

Practical Comparison

# Match one or more digits

# BRE (basic grep, sed)
grep '[0-9]\+' file.txt
sed -n '/[0-9]\+/p' file.txt

# ERE (grep -E, awk)
grep -E '[0-9]+' file.txt
awk '/[0-9]+/' file.txt

# Match IP address

# BRE
grep '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' file.txt

# ERE
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file.txt

When in doubt, use grep -E or sed -E for ERE mode.
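
The ERE pattern for IP addresses also works unchanged in Python's re module, which makes its main limitation easy to see: it accepts out-of-range octets such as 999. A minimal sketch, assuming a hypothetical helper named find_ipv4 (not part of any tool discussed here), that reuses the same pattern and then range-checks each octet:

```python
import re

# Same shape as the ERE pattern: four groups of 1-3 digits separated by dots.
IPV4_CANDIDATE = re.compile(r'[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')


def find_ipv4(text):
    """Return dotted quads in text whose octets are all <= 255 (hypothetical helper)."""
    found = []
    for candidate in IPV4_CANDIDATE.findall(text):
        if all(int(octet) <= 255 for octet in candidate.split('.')):
            found.append(candidate)
    return found


print(find_ipv4('IP: 192.168.1.100, bogus: 999.1.1.1'))  # ['192.168.1.100']
```

The regex only finds candidates; the numeric check is done in code because expressing 0-255 purely in a pattern is much harder to read.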

PCRE Features

PCRE adds powerful features not available in BRE/ERE.

Shorthand Character Classes

Shorthand  Meaning                  PCRE  BRE/ERE
\d         Digit [0-9]              Yes   No
\D         Non-digit                Yes   No
\w         Word char [A-Za-z0-9_]   Yes   GNU only
\W         Non-word char            Yes   GNU only
\s         Whitespace               Yes   GNU only
\S         Non-whitespace           Yes   GNU only

# PCRE - works
grep -P '\d+\.\d+\.\d+\.\d+' file.txt

# BRE/ERE - must use explicit classes
grep -E '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' file.txt
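
One difference to keep in mind when moving a \d pattern into Python: on str patterns in Python 3, \d matches any Unicode decimal digit, not only [0-9] as the table's definition suggests. A minimal sketch of the difference, using re.ASCII to restore the narrower behaviour (the sample strings are illustrative):

```python
import re

ascii_digits = '443'
arabic_indic_digits = '\u0664\u0664\u0663'  # Arabic-Indic digits, not ASCII

# Default str patterns: \d is Unicode-aware.
unicode_match = re.fullmatch(r'\d+', arabic_indic_digits)

# re.ASCII (or an explicit [0-9]) restricts matching to ASCII digits.
ascii_only_match = re.fullmatch(r'\d+', arabic_indic_digits, re.ASCII)
explicit_class_match = re.fullmatch(r'[0-9]+', arabic_indic_digits)
ascii_input_match = re.fullmatch(r'\d+', ascii_digits, re.ASCII)

print(bool(unicode_match))         # True
print(bool(ascii_only_match))      # False
print(bool(explicit_class_match))  # False
print(bool(ascii_input_match))     # True
```

When validating ports, octets, or other machine-generated numbers, [0-9] or re.ASCII is usually the safer choice.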

Non-Greedy Quantifiers

Greedy   Non-Greedy   Support
*        *?           PCRE, Python, JavaScript
+        +?           PCRE, Python, JavaScript
?        ??           PCRE, Python, JavaScript
{n,m}    {n,m}?       PCRE, Python, JavaScript

# Extract first quoted string

# Greedy (wrong) - .* runs to the last quote on the line
echo '"first" and "second"' | grep -oP '".*"'
# Output: "first" and "second"

# Non-greedy (correct) - needs PCRE; BRE/ERE have no non-greedy quantifiers
echo '"first" and "second"' | grep -oP '".*?"'
# Output: "first"
#         "second"
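
The same greedy versus non-greedy behaviour can be reproduced in Python. The sketch below also shows a negated character class, which gives the same result without a non-greedy quantifier and therefore also works in BRE/ERE (for example grep -oE '"[^"]*"'):

```python
import re

line = '"first" and "second"'

greedy = re.findall(r'".*"', line)      # one match spanning both strings
lazy = re.findall(r'".*?"', line)       # stops at the nearest closing quote
negated = re.findall(r'"[^"]*"', line)  # same result, no lazy quantifier

print(greedy)   # ['"first" and "second"']
print(lazy)     # ['"first"', '"second"']
print(negated)  # ['"first"', '"second"']
```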

Lookaround

Lookahead and lookbehind are available in PCRE and Python. JavaScript has long supported lookahead; lookbehind arrived in ES2018.

# Extract value after "port="
# PCRE
grep -oP '(?<=port=)\d+' config.txt

# BRE/ERE alternative (capture whole thing)
grep -oE 'port=[0-9]+' config.txt | cut -d= -f2
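
In Python the same extraction can be written with a lookbehind or with a capturing group. A minimal sketch (the config string is made up for illustration) that also shows Python's restriction that a lookbehind must have a fixed width:

```python
import re

config = 'host=web01 port=8080 timeout=30'

# Lookbehind: the match itself is only the digits.
via_lookbehind = re.findall(r'(?<=port=)\d+', config)

# Capturing group: match "port=" too, but return only the group.
via_group = re.findall(r'port=(\d+)', config)

# A variable-width lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r'(?<=port=\s*)\d+')
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(via_lookbehind)     # ['8080']
print(via_group)          # ['8080']
print(variable_width_ok)  # False
```

The capturing-group form is often the more portable choice, since it works in any flavor that supports groups.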

Named Groups

Flavor      Syntax
PCRE        (?<name>...) or (?P<name>...)
Python      (?P<name>...)
JavaScript  (?<name>...)
BRE/ERE     Not supported
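
As an illustration of the Python syntax, here is a minimal sketch that parses a date with named groups and reuses the names in a replacement. The pattern name DATE and the sample line are assumptions made for the example:

```python
import re

DATE = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')

match = DATE.search('Date: 2026-03-15')
if match:
    print(match.group('year'))  # 2026
    print(match.groupdict())    # {'year': '2026', 'month': '03', 'day': '15'}

# Named groups can be referenced in the replacement with \g<name>.
reordered = DATE.sub(r'\g<day>/\g<month>/\g<year>', 'Date: 2026-03-15')
print(reordered)  # Date: 15/03/2026
```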

Python re Module

Python’s re module closely follows PCRE, with some differences: named groups use (?P<name>...), lookbehind must be fixed-width, and \K is not supported.

Key Functions

import re

# Search (first match)
match = re.search(r'\d+', 'Port: 443')
if match:
    print(match.group())  # 443

# Match (from start only)
match = re.match(r'\d+', '443/tcp')
if match:
    print(match.group())  # 443

# Find all
matches = re.findall(r'\d+', '192.168.1.100')
print(matches)  # ['192', '168', '1', '100']

# Substitute
result = re.sub(r'\d+', 'X', '192.168.1.100')
print(result)  # X.X.X.X

Flags

import re

# Case insensitive
re.search(r'error', text, re.IGNORECASE)
re.search(r'(?i)error', text)  # Inline

# Multiline (^ and $ match line boundaries)
re.search(r'^ERROR', text, re.MULTILINE)
re.search(r'(?m)^ERROR', text)  # Inline

# Dotall (. matches newline)
re.search(r'.+', text, re.DOTALL)
re.search(r'(?s).+', text)  # Inline

# Verbose (allow comments and whitespace)
pattern = re.compile(r'''
    \d{4}    # Year
    -
    \d{2}    # Month
    -
    \d{2}    # Day
''', re.VERBOSE)
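
To see what the flags actually change, the following sketch runs the same anchored pattern with and without re.MULTILINE and combines flags both ways. The log text is made up for the example:

```python
import re

log = 'INFO: started\nERROR: TIMEOUT\nerror: connection refused\n'

# Without MULTILINE, ^ only matches at the very start of the string.
without_flag = re.findall(r'^ERROR', log)

# With MULTILINE, ^ also matches after each newline.
with_multiline = re.findall(r'^ERROR', log, re.MULTILINE)

# Flags combine with | or as inline letters at the start of the pattern.
combined = re.findall(r'^error', log, re.MULTILINE | re.IGNORECASE)
inline = re.findall(r'(?mi)^error', log)

print(without_flag)    # []
print(with_multiline)  # ['ERROR']
print(combined)        # ['ERROR', 'error']
print(inline)          # ['ERROR', 'error']
```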

Raw Strings

Always use raw strings (r'...') for regex in Python:

# Wrong - \b is interpreted as backspace
re.search('\bword\b', text)

# Correct - r prefix makes it raw
re.search(r'\bword\b', text)
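
A quick way to convince yourself of the difference is to inspect the strings themselves. In this sketch (the sample text is illustrative), the non-raw '\b' is a single backspace character, so the pattern never matches:

```python
import re

text = 'a word here'

plain_len = len('\b')   # 1: a single backspace character (\x08)
raw_len = len(r'\b')    # 2: a backslash followed by the letter b

plain_result = re.search('\bword\b', text)   # looks for backspace + word + backspace
raw_result = re.search(r'\bword\b', text)    # looks for word boundaries

print(plain_len, raw_len)   # 1 2
print(plain_result)         # None
print(raw_result.group())   # word
```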

JavaScript RegExp

JavaScript regex has evolved significantly, especially with ES2018, which added the s (dotall) flag, named groups, and lookbehind.

Creating Patterns

// Literal syntax
const pattern1 = /\d+/g;

// Constructor syntax (useful for dynamic patterns)
const pattern2 = new RegExp('\\d+', 'g');

Flags

/pattern/g    // Global (find all)
/pattern/i    // Case insensitive
/pattern/m    // Multiline
/pattern/s    // Dotall (ES2018)
/pattern/u    // Unicode
/pattern/y    // Sticky (match at lastIndex)

Methods

const text = 'Port: 443, Other: 8080';

// test - returns boolean
/\d+/.test(text);  // true

// match - returns array
text.match(/\d+/g);  // ['443', '8080']

// exec - returns match object (with groups)
const pattern = /Port: (\d+)/;
const result = pattern.exec(text);
// result[0] = 'Port: 443'
// result[1] = '443'

// replace
text.replace(/\d+/g, 'X');  // 'Port: X, Other: X'

// split
'a,b;c'.split(/[,;]/);  // ['a', 'b', 'c']

Named Groups (ES2018)

const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = '2026-03-15'.match(pattern);

console.log(match.groups.year);   // 2026
console.log(match.groups.month);  // 03
console.log(match.groups.day);    // 15

Lookbehind (ES2018)

// Positive lookbehind
const pricePattern = /(?<=\$)\d+/;
'$100'.match(pricePattern);  // ['100']

// Negative lookbehind
const notDollar = /(?<!\$)\d+/;
'€50'.match(notDollar);  // ['50']

Vim Regex

Vim uses a unique regex dialect with "magic" levels.

Magic Modes

Mode          Setting              Effect
nomagic       :set nomagic or \M   Almost everything is literal
magic         :set magic or \m     Default - some chars are special
very magic    \v                   Almost everything is special (like ERE)
very nomagic  \V                   Only \ is special (literal search)

Common Vim Patterns

" Default magic mode
/[0-9]\+          " One or more digits
/\(foo\|bar\)     " foo or bar
/\<word\>         " Word boundaries

" Very magic mode (recommended)
/\v[0-9]+         " One or more digits
/\v(foo|bar)      " foo or bar
/\v<word>         " Word boundaries

" Substitution
:%s/old/new/g               " Replace all
:%s/\v(\w+)/[\1]/g          " Wrap words in brackets
:%s/\v<(\w)(\w*)/\u\1\L\2/g " Title case

Vim-Specific Features

" Atom specifiers
\_.     " Any character INCLUDING newline
\_s     " Whitespace including newline
\%^     " Start of file
\%$     " End of file

" Zero-width
\zs     " Start match here (like \K in PCRE)
\ze     " End match here

" Character classes
\a      " Alphabetic [A-Za-z]
\d      " Digit [0-9]
\w      " Word character [A-Za-z0-9_]
\s      " Whitespace

Tool-Specific Syntax

sed

# BRE mode (default)
sed 's/[0-9]\+/X/g' file.txt

# ERE mode (-E or -r)
sed -E 's/[0-9]+/X/g' file.txt

# Delimiter can be any character
sed 's|/path/to/file|/new/path|g' file.txt
sed 's#http://##g' urls.txt

# Backreferences (\w is a GNU extension; use [[:alnum:]_] for portability)
sed -E 's/(\w+) (\w+)/\2 \1/g' file.txt  # Swap words

awk

# ERE always (no BRE mode)
awk '/[0-9]+/ { print }' file.txt

# Field matching
awk '$1 ~ /^[0-9]+$/ { print $2 }' file.txt

# gsub for replacement
awk '{ gsub(/old/, "new"); print }' file.txt

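# Case insensitive (IGNORECASE is gawk-only)
awk 'BEGIN{IGNORECASE=1} /error/' file.txt

# Portable alternative for other awk implementations
awk 'tolower($0) ~ /error/' file.txt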

ripgrep (rg)

# Rust regex engine by default (no lookaround or backreferences)
rg '\d+\.\d+\.\d+\.\d+'

# Case insensitive
rg -i 'error'

# Word boundary
rg -w 'log'

# Fixed string (no regex)
rg -F '192.168.1.1'

# PCRE2 features such as lookaround require -P
rg -P '(?<=port=)\d+'

Feature Comparison Matrix

Feature                  BRE        ERE        PCRE     Python              JavaScript
+ one or more            \+         +          +        +                   +
? zero or one            \?         ?          ?        ?                   ?
{n,m} repetition         \{n,m\}    {n,m}      {n,m}    {n,m}               {n,m}
() grouping              \(\)       ()         ()       ()                  ()
| alternation            \|         |          |        |                   |
\d digit                 No         No         Yes      Yes                 Yes
\w word char             GNU only   GNU only   Yes      Yes                 Yes
(?:) non-capturing       No         No         Yes      Yes                 Yes
(?=) lookahead           No         No         Yes      Yes                 Yes
(?<=) lookbehind         No         No         Yes      Yes (fixed width)   ES2018+
*? non-greedy            No         No         Yes      Yes                 Yes
\K match reset           No         No         Yes      No                  No
(?P<name>) named group   No         No         Yes      Yes                 (?<name>)
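
The \K row is the one most likely to bite when porting a grep -P one-liner to Python. A minimal sketch, assuming a current Python 3 where the re module rejects unknown letter escapes, showing the failure and the usual capturing-group workaround (the sample line is illustrative):

```python
import re

line = 'server=prod-01'

# \K is not a recognised escape in Python's re module, so compiling fails.
try:
    re.compile(r'server=\K\S+')
    supports_match_reset = True
except re.error:
    supports_match_reset = False

# Workaround: capture the part you want and read the group.
value = re.search(r'server=(\S+)', line).group(1)

print(supports_match_reset)  # False
print(value)                 # prod-01
```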

Choosing the Right Flavor

Situation                               Recommendation
Simple pattern, maximum compatibility   ERE (grep -E, awk)
Need \d, \w, or lookaround              PCRE (grep -P, rg -P)
Speed and large files                   ripgrep (rg)
Scripted replacement                    sed with ERE (sed -E)
Complex text processing                 Python re module
Browser/Node.js                         JavaScript RegExp
Vim editing                             Vim with \v (very magic)

Self-Test Exercises

Try each challenge first. Read the answer only after you’ve attempted it.

Setup Test Data

cat << 'EOF' > /tmp/flavors.txt
Port: 443
Port: 8080
IP: 192.168.1.100
Date: 2026-03-15
Error: connection refused
ERROR: TIMEOUT
http://example.com
https://secure.example.com
server=prod-01
count=42
EOF

Challenge 1: BRE One-or-More

Goal: Match one or more digits using BRE (basic grep without -E)

Answer
grep '[0-9]\+' /tmp/flavors.txt

BRE requires \+ for one-or-more (a GNU extension). Without the backslash, + is literal.


Challenge 2: ERE One-or-More

Goal: Match one or more digits using ERE (grep -E)

Answer
grep -E '[0-9]+' /tmp/flavors.txt

ERE uses + without escaping. This is why -E is preferred.


Challenge 3: BRE Grouping

Goal: Match "http" or "https" using BRE grouping

Answer
grep 'https\?' /tmp/flavors.txt
# Or with grouping and alternation:
grep '\(http\|https\)' /tmp/flavors.txt

BRE uses \? for optional, \(\) for grouping, and \| for alternation.


Challenge 4: ERE Grouping

Goal: Match "http" or "https" using ERE

Answer
grep -E 'https?' /tmp/flavors.txt
# Or:
grep -E '(http|https)' /tmp/flavors.txt

ERE uses ? and | without escaping.


Challenge 5: PCRE Shorthand

Goal: Extract all numbers using PCRE \d shorthand

Answer
grep -oP '\d+' /tmp/flavors.txt

\d only works with -P (PCRE). ERE equivalent is [0-9].


Challenge 6: Case Insensitive

Goal: Find both "Error" and "ERROR" case-insensitively

Answer
# Using flag
grep -i 'error' /tmp/flavors.txt

# Using PCRE inline modifier
grep -P '(?i)error' /tmp/flavors.txt

-i flag or (?i) modifier both work.


Challenge 7: Word Boundary PCRE

Goal: Match "server" as a whole word using PCRE \b

Answer
grep -P '\bserver\b' /tmp/flavors.txt

# GNU BRE/ERE equivalent
grep -E '\<server\>' /tmp/flavors.txt

\b is native to PCRE. \< and \> are GNU extensions that work in BRE/ERE; GNU grep also accepts \b there.


Challenge 8: Non-Greedy (PCRE Only)

Goal: Extract value after "=" using non-greedy quantifier

Answer
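grep -oP '=.+?$' /tmp/flavors.txt

+? is non-greedy (match minimum), so it needs something after it, here $, to make it extend; a bare '=.+?' stops after one character. PCRE only.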


Challenge 9: Lookbehind (PCRE Only)

Goal: Extract value after "server=" without including "server="

Answer
grep -oP '(?<=server=)\S+' /tmp/flavors.txt

(?<=...) lookbehind requires -P in grep. Output: prod-01


Challenge 10: BRE Repetition

Goal: Match exactly 4 digits using BRE

Answer
grep '[0-9]\{4\}' /tmp/flavors.txt

BRE uses \{n\} with escaped braces.


Challenge 11: ERE Repetition

Goal: Match exactly 4 digits using ERE

Answer
grep -E '[0-9]{4}' /tmp/flavors.txt

ERE uses {n} without escaping.


Challenge 12: ripgrep Equivalent

Goal: Find all IPs using ripgrep (rg)

Answer
rg -o '\d+\.\d+\.\d+\.\d+' /tmp/flavors.txt

# Or with -P for PCRE2
rg -oP '\d+\.\d+\.\d+\.\d+' /tmp/flavors.txt

ripgrep uses Rust regex by default, PCRE2 with -P.

Key Takeaways

  1. BRE requires escaping - \+, \?, \{\}, \(\)

  2. ERE is more readable - use grep -E, sed -E

  3. PCRE has the most features - lookaround, \d, non-greedy

  4. Use raw strings in Python - r'\d+' not '\d+'

  5. JavaScript ES2018 added lookbehind - (?<=...) and (?<!...)

  6. Vim’s very magic (\v) is closest to ERE - easier to read

  7. ripgrep is fast - prefer for large files

Next Module

Infrastructure Patterns - Production-ready patterns for IPs, MACs, logs, and configs.