Regex Session 02: Quantifiers, Flavors & System Practice

This session covers the full set of quantifiers, how regex flavors differ across tools (grep, Python, JavaScript), and how to practice on your own system rather than only on websites.

Regex Flavors: The Reality

Different tools use different regex "engines" with slightly different syntax:

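Flavor          Tools                          Key Differences
--------------  -----------------------------  ----------------------------------------------
BRE (Basic)     grep, sed                      +, ?, {}, () need escaping: \+, \?, \{\}, \(\)
ERE (Extended)  grep -E, awk, egrep            +, ?, {}, () work without escaping
PCRE (Perl)     grep -P, ripgrep (rg -P), PHP  Lookahead, lookbehind, \d, \w, \s
JavaScript      Browsers, Node.js, regexr.com  Similar to PCRE; named groups are (?<name>), no possessive quantifiers
Vim             Vim/Neovim                     \v for "very magic" (ERE-like), unique escaping
Python          re module                      PCRE-like, named groups (?P<name>)

Note: ripgrep's default engine supports \d, \w, \s and lazy quantifiers but not lookaround or backreferences; rg -P switches it to PCRE2. In BRE, \+ and \? are GNU extensions rather than POSIX.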

Practice Methods Comparison

Method        Pros                                           Cons
------------  ---------------------------------------------  ---------------------------------------
regexr.com    Visual, instant feedback, explains patterns    JavaScript only, not your actual tools
regex101.com  Multi-flavor (PCRE, JS, Python, Go), debugger  Still a website
grep/ripgrep  Real tool you’ll use, fast                     No pattern explanation (highlighting only via --color)
Python REPL   Interactive, PCRE-like re module, scriptable   More typing

Recommendation: Learn on regexr.com for visualization, then IMMEDIATELY practice with grep/ripgrep on real files.

Test File Setup

Create a test file with infrastructure data:

cat << 'EOF' > /tmp/regex-practice.txt
# Network Infrastructure Log
2026-03-15T10:30:45 INFO  Server started on 192.168.1.100:443
2026-03-15T10:30:46 INFO  VLAN 100 configured on Gi1/0/24
2026-03-15T10:31:00 WARN  Connection slow to 10.50.1.20
2026-03-15T10:31:15 ERROR Connection refused from 10.50.1.50:8080
2026-03-15T10:32:00 INFO  User evanusmodestus authenticated via 802.1X
2026-03-15T10:32:01 INFO  MAC AA:BB:CC:DD:EE:FF assigned to VLAN 100
2026-03-15T10:33:00 ERROR Authentication failed for user admin
2026-03-15T10:33:30 WARN  Certificate expires in 30 days
2026-03-15T10:34:00 INFO  Backup completed: 1.2GB transferred
2026-03-15T10:35:00 DEBUG Query took 145ms for endpoint /api/v1/users
IP Range: 10.50.1.0/24
Gateway: 10.50.1.1
DNS: 10.50.1.90, 10.50.1.91
Ports: 22, 80, 443, 8080, 8443
MAC Table:
  00:1A:2B:3C:4D:5E -> VLAN 10
  AA:BB:CC:DD:EE:FF -> VLAN 100
  11:22:33:44:55:66 -> VLAN 200
EOF

The Complete Quantifier Set

Quantifier  Meaning          Example  Matches
----------  ---------------  -------  --------------------------------------
*           0 or more        ab*c     "ac", "abc", "abbc", "abbbc"
+           1 or more        ab+c     "abc", "abbc", "abbbc" (NOT "ac")
?           0 or 1           colou?r  "color", "colour"
{n}         Exactly n        \d{3}    "192", "168", "100" (exactly 3 digits)
{n,}        n or more        \d{2,}   Numbers with 2 or more digits
{n,m}       Between n and m  \d{1,3}  Numbers with 1 to 3 digits
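
As a quick check of the quantifier table, the following sketch runs each quantifier through Python's re module. re.fullmatch is used so that the whole string has to match; the sample strings and the helper name check are illustrative choices, not part of the test file.

```python
import re

# (pattern, strings that should match fully, strings that should not)
cases = [
    (r"ab*c",    ["ac", "abc", "abbbc"], ["a"]),
    (r"ab+c",    ["abc", "abbc"],        ["ac"]),
    (r"colou?r", ["color", "colour"],    ["colouur"]),
    (r"\d{3}",   ["192", "100"],         ["19", "1920"]),
    (r"\d{2,}",  ["10", "8080"],         ["7"]),
    (r"\d{1,3}", ["1", "50", "192"],     ["8080"]),
]

def check(pattern, good, bad):
    """True when every 'good' string fully matches and no 'bad' string does."""
    ok_good = all(re.fullmatch(pattern, s) for s in good)
    ok_bad = not any(re.fullmatch(pattern, s) for s in bad)
    return ok_good and ok_bad

results = {pattern: check(pattern, good, bad) for pattern, good, bad in cases}
for pattern, passed in results.items():
    print(f"{pattern:10} {'ok' if passed else 'MISMATCH'}")
```

Swapping re.fullmatch for re.search shows why anchoring matters: \d{3} then also "matches" 1920, because three of its four digits satisfy the pattern.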

BRE vs ERE vs PCRE - Side by Side

Match one or more digits

# BRE (basic grep) - must escape +
grep '[0-9]\+' /tmp/regex-practice.txt

# ERE (extended grep -E) - no escaping
grep -E '[0-9]+' /tmp/regex-practice.txt

# PCRE (grep -P or ripgrep) - shorthand \d works
grep -P '\d+' /tmp/regex-practice.txt
rg '\d+' /tmp/regex-practice.txt

All three flavors produce the same result: every line that contains at least one digit.

Match IP addresses

# ERE - verbose
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /tmp/regex-practice.txt

# PCRE - cleaner with \d
grep -P '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}' /tmp/regex-practice.txt

# ripgrep - same as PCRE
rg '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}' /tmp/regex-practice.txt
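
Note that \d{1,3} accepts any run of one to three digits, so a string such as 999.1.1.1 also matches these patterns. A minimal sketch of one way to tighten this in Python, assuming it is acceptable to extract candidates with the regex and then validate them with the standard-library ipaddress module (find_valid_ipv4 is a made-up helper name and the sample string is invented for the example):

```python
import ipaddress
import re

IPV4_CANDIDATE = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")

def find_valid_ipv4(text):
    """Extract dotted-quad candidates and keep only those ipaddress accepts."""
    valid = []
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # rejected, e.g. an octet above 255
        valid.append(candidate)
    return valid

sample = "Server 192.168.1.100:443, gateway 10.50.1.1, bogus 999.1.1.1"
print(find_valid_ipv4(sample))
```

Keeping the regex loose and validating afterwards tends to be easier to read than encoding the 0-255 range in the pattern itself.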

Match MAC addresses

# ERE
grep -E '([A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}' /tmp/regex-practice.txt

# PCRE
grep -P '([A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}' /tmp/regex-practice.txt

Quantifier Practice - Exact Count {n}

# Match lines containing 4 consecutive digits (years, ports); longer digit runs match too
grep -E '[0-9]{4}' /tmp/regex-practice.txt

# Match lines containing 2 consecutive hex chars
grep -E '[A-Fa-f0-9]{2}' /tmp/regex-practice.txt

Quantifier Practice - Range {n,m}

# Match 1-3 digits (IP octets)
grep -oE '[0-9]{1,3}' /tmp/regex-practice.txt | head -20

# Match 2-4 digit numbers
grep -oE '\b[0-9]{2,4}\b' /tmp/regex-practice.txt

The -o flag shows ONLY the matched text, not the whole line.
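
The difference between plain grep and grep -o can be mirrored in Python: filtering lines with search corresponds to printing whole matching lines, while findall corresponds to -o. This is only an analogy, sketched on a few lines copied from the test file.

```python
import re

lines = [
    "Gateway: 10.50.1.1",
    "Ports: 22, 80, 443, 8080, 8443",
    "MAC Table:",
]
pattern = re.compile(r"\b[0-9]{2,4}\b")

# Like `grep -E`: keep whole lines that contain at least one match
matching_lines = [line for line in lines if pattern.search(line)]

# Like `grep -oE`: keep only the matched text, one entry per match
matches_only = [m for line in lines for m in pattern.findall(line)]

print(matching_lines)
print(matches_only)
```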

Quantifier Practice - Open-ended {n,}

# Match 3 or more digits
grep -oE '[0-9]{3,}' /tmp/regex-practice.txt

Greedy vs Lazy (CRITICAL CONCEPT)

The Problem: Quantifiers are GREEDY by default - they match as MUCH as possible.

# Create test
echo 'Log: "error: disk full" and "warning: low memory"' > /tmp/greedy-test.txt

# Greedy (works in every flavor; grep -P is used here to compare with the lazy form)
grep -oP '".*"' /tmp/greedy-test.txt
# Output: "error: disk full" and "warning: low memory"  (one match)

# Lazy (PCRE only)
grep -oP '".*?"' /tmp/greedy-test.txt
# Output:
# "error: disk full"
# "warning: low memory"

Lazy Quantifier Reference

Add ? after a quantifier to make it LAZY (match as LITTLE as possible):

Greedy  Lazy    Behavior
------  ------  ---------------------------
*       *?      Match minimum (0 preferred)
+       +?      Match minimum (1 preferred)
{n,m}   {n,m}?  Match minimum (n preferred)

Lazy quantifiers (*?, +?) are not available in BRE or ERE; use grep -P or rg (ripgrep's default engine supports them as well).
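
When lazy quantifiers are not an option (for example with grep -E), a negated character class often gives the same result: "[^"]*" matches a quote, then any characters that are not a quote, then the closing quote. A small Python sketch comparing the three forms on the string from the greedy test:

```python
import re

text = 'Log: "error: disk full" and "warning: low memory"'

greedy = re.findall(r'".*"', text)      # runs to the LAST quote on the line
lazy = re.findall(r'".*?"', text)       # stops at the first closing quote
negated = re.findall(r'"[^"]*"', text)  # portable: no lazy quantifier needed

print(greedy)
print(lazy)
print(negated)
```

The negated-class form also works with ERE, for example grep -oE '"[^"]*"' /tmp/greedy-test.txt.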

Shorthand Character Classes

These save typing:

Shorthand  Equivalent       Meaning
---------  ---------------  ------------------
\d         [0-9]            Digit
\D         [^0-9]           NOT a digit
\w         [A-Za-z0-9_]     Word character
\W         [^A-Za-z0-9_]    NOT word character
\s         [ \t\n\r\f\v]    Whitespace
\S         [^ \t\n\r\f\v]   NOT whitespace

# Match IP with shorthand
grep -P '\d+\.\d+\.\d+\.\d+' /tmp/regex-practice.txt

# Match all words
grep -oP '\w+' /tmp/regex-practice.txt | head -20
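
One caveat when moving between flavors: in Python 3, \d, \w and \s are Unicode-aware for str patterns, so \d matches more than [0-9] unless the re.ASCII flag is set. A short sketch (the sample string is made up for illustration):

```python
import re

# The second number is written with Arabic-Indic digits (U+0664, U+0664, U+0663)
sample = "port 443 and \u0664\u0664\u0663"

unicode_digits = re.findall(r"\d+", sample)                # Unicode-aware by default
ascii_digits = re.findall(r"\d+", sample, flags=re.ASCII)  # restricted to 0-9
explicit_class = re.findall(r"[0-9]+", sample)             # always 0-9 only

print(len(unicode_digits), ascii_digits, explicit_class)
```

For log parsing where only ASCII digits are expected, either re.ASCII or an explicit [0-9] class keeps Python's behavior in line with the equivalence table.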

Anchors (Position Matching)

Anchor  Meaning
------  -------------
^       Start of line
$       End of line
\b      Word boundary

# Match digits at START of line (\d needs -P; with -E use [0-9] instead)
grep -P '^\d+' /tmp/regex-practice.txt

# Match digits at END of line
grep -oP '\d+$' /tmp/regex-practice.txt

# Match VLAN as whole word
grep -E '\bVLAN\b' /tmp/regex-practice.txt
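
In Python, ^ and $ match only at the start and end of the whole string unless re.MULTILINE is set, which makes them behave per line the way grep does. A sketch on three lines from the test file (the VLANs/XVLAN string is invented to show the word boundary):

```python
import re

text = (
    "2026-03-15T10:33:30 WARN  Certificate expires in 30 days\n"
    "Gateway: 10.50.1.1\n"
    "  AA:BB:CC:DD:EE:FF -> VLAN 100\n"
)

# ^ anchored to each line start: only the timestamp line begins with digits
starts = re.findall(r"^\d+", text, flags=re.MULTILINE)

# $ anchored to each line end: trailing digits of each line, if any
ends = re.findall(r"\d+$", text, flags=re.MULTILINE)

# \b keeps VLAN from matching inside a longer word
whole_word = re.findall(r"\bVLAN\b", "VLAN 100, VLANs, XVLAN")

print(starts, ends, whole_word)
```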

Python REPL Practice

For interactive exploration with Python's PCRE-like re module:

python3
import re

text = """
2026-03-15T10:30:45 INFO Server started on 192.168.1.100:443
MAC: AA:BB:CC:DD:EE:FF assigned to VLAN 100
"""

# Find all IPs
re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', text)
# ['192.168.1.100']

# Find all MACs (non-capturing group)
re.findall(r'(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}', text)
# ['AA:BB:CC:DD:EE:FF']

# Find timestamps
re.findall(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}', text)
# ['2026-03-15T10:30:45']

# Greedy vs lazy
text2 = '"first" and "second"'
re.findall(r'".*"', text2)   # Greedy: ['"first" and "second"']
re.findall(r'".*?"', text2)  # Lazy: ['"first"', '"second"']

JavaScript Practice (Browser Console)

Open browser DevTools (F12) → Console:

const text = `
2026-03-15T10:30:45 INFO Server started on 192.168.1.100:443
MAC: AA:BB:CC:DD:EE:FF assigned to VLAN 100
`;

// Find all IPs
text.match(/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/g)
// ['192.168.1.100']

// Find all MACs
text.match(/([A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}/g)
// ['AA:BB:CC:DD:EE:FF']

// Greedy vs lazy
const text2 = '"first" and "second"';
text2.match(/".*"/g)   // Greedy: ['"first" and "second"']
text2.match(/".*?"/g)  // Lazy: ['"first"', '"second"']

Flavor Compatibility Cheat Sheet

Feature             BRE      ERE      PCRE/JS/Python
------------------  -------  -------  --------------
+ (one or more)     \+       +        +
? (zero or one)     \?       ?        ?
{n,m} (range)       \{n,m\}  {n,m}    {n,m}
() (group)          \(\)     ()       ()
\d (digit)          NO       NO       YES
\w (word)           NO*      NO*      YES
\s (space)          NO*      NO*      YES
*? (lazy)           NO       NO       YES
(?=) (lookahead)    NO       NO       YES
(?<=) (lookbehind)  NO       NO       YES

* Not part of POSIX BRE/ERE, but GNU grep accepts \w and \s as extensions.

Comprehensive Exercise Set

Run these on your system:

# 1. Find all ERROR lines
grep -E 'ERROR' /tmp/regex-practice.txt

# 2. Find all IP addresses (extract only)
grep -oE '\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b' /tmp/regex-practice.txt

# 3. Find all MAC addresses
grep -oE '([A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}' /tmp/regex-practice.txt

# 4. Find all VLAN numbers
grep -oE 'VLAN [0-9]+' /tmp/regex-practice.txt

# 5. Find timestamps
grep -oP '\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}' /tmp/regex-practice.txt

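# 6. Find port numbers (digits after IP and colon; \K drops the IP from the output,
#    and requiring the IP avoids matching timestamp fields such as :30)
grep -oP '\d+\.\d+\.\d+\.\d+:\K\d{2,5}\b' /tmp/regex-practice.txt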

# 7. Find all usernames (word after "user" or "User")
grep -oiP '(?<=user )\w+' /tmp/regex-practice.txt

# 8. Find lines with warnings or errors (case insensitive)
grep -iE 'warn|error' /tmp/regex-practice.txt

# 9. Find durations in milliseconds
grep -oP '\d+ms' /tmp/regex-practice.txt

# 10. Find data sizes (like 1.2GB)
grep -oP '\d+\.?\d*[KMGT]B' /tmp/regex-practice.txt
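
To cross-check a few of these exercises without grep, the sketch below runs the same patterns through Python's re module on lines copied from the test file. Python's re accepts the fixed-width lookbehind used in exercise 7.

```python
import re

log = """\
2026-03-15T10:31:15 ERROR Connection refused from 10.50.1.50:8080
2026-03-15T10:32:00 INFO  User evanusmodestus authenticated via 802.1X
2026-03-15T10:33:00 ERROR Authentication failed for user admin
2026-03-15T10:34:00 INFO  Backup completed: 1.2GB transferred
2026-03-15T10:35:00 DEBUG Query took 145ms for endpoint /api/v1/users
"""

# Exercise 1: lines containing ERROR
error_lines = [line for line in log.splitlines() if re.search(r"ERROR", line)]
# Exercise 7: word after "user " (case-insensitive, like grep -i)
usernames = re.findall(r"(?<=user )\w+", log, flags=re.IGNORECASE)
# Exercise 9: durations in milliseconds
durations = re.findall(r"\d+ms", log)
# Exercise 10: data sizes such as 1.2GB
sizes = re.findall(r"\d+\.?\d*[KMGT]B", log)

print(len(error_lines), usernames, durations, sizes)
```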

Infrastructure Power Patterns

Pattern                              Use Case
-----------------------------------  ----------------------------------------------------------
\b\d{1,5}\b                          Port number candidate (1-5 digits; 1-65535 is not enforced)
\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}  ISO timestamp
VLAN\s+\d{1,4}                       VLAN with number
([A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}   MAC address
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}   IPv4 address
(?<=user )\w+                        Username after "user " (lookbehind)
".*?"                                Quoted strings (lazy)
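
A regex alone does not easily enforce a numeric range, so \b\d{1,5}\b also accepts values such as 0 or 99999. A minimal sketch of a two-step approach in Python, assuming it is acceptable to match candidates first and range-check them as integers afterwards (valid_ports is a hypothetical helper name, and the trailing 0 and 70000 are added to the sample to show rejection):

```python
import re

PORT_CANDIDATE = re.compile(r"\b\d{1,5}\b")

def valid_ports(text):
    """Return the integers found in text that fall within 1-65535."""
    candidates = (int(token) for token in PORT_CANDIDATE.findall(text))
    return [port for port in candidates if 1 <= port <= 65535]

print(valid_ports("Ports: 22, 80, 443, 8080, 8443, 0, 70000"))
```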

Next Concepts

Once quantifiers feel solid:

  1. Groups & Capturing - () to extract parts of matches

  2. Non-capturing groups - (?:) for grouping without capturing

  3. Alternation - | for OR logic (\| in GNU BRE)

  4. Lookahead - (?=) and (?!) for conditional matching

  5. Lookbehind - (?<=) and (?<!) for matching based on what precedes

  6. Named groups - (?P<name>) in Python for readable extractions
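
As a small preview of items 1 and 6, named groups let a single match be unpacked into labelled fields. A sketch on one line from the test file (the group names ts, level and msg are chosen for this example only):

```python
import re

LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+"  # ISO timestamp
    r"(?P<level>[A-Z]+)\s+"                             # log level
    r"(?P<msg>.*)"                                      # rest of the line
)

m = LINE.match("2026-03-15T10:33:00 ERROR Authentication failed for user admin")
fields = m.groupdict() if m else {}
print(fields)
```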

Session Reflection

What clicked:

  • <Write what made sense>

What’s still fuzzy:

  • <Write what needs more practice>

Connection to work:

  • <How will you use this?>

Session Log

Timestamp   Notes
----------  -------------------------------------------------------------------
2026-03-15  Completed quantifiers deep dive, flavor comparison, system practice