Quantifiers

Quantifiers specify how many times the preceding element should match. Understanding the difference between greedy, lazy, and possessive quantifiers is essential for writing efficient, accurate patterns.

Basic Quantifiers

Quantifier  Meaning       Pattern Example  Matches
*           Zero or more  ab*c             "ac", "abc", "abbc", "abbbc"
+           One or more   ab+c             "abc", "abbc", "abbbc" (not "ac")
?           Zero or one   colou?r          "color", "colour"

The Asterisk *

Matches zero or more of the preceding element.

Pattern: go*gle
Matches: ggle, gogle, google, gooogle, goooogle
         ^     ^      ^^      ^^^      ^^^^
         0 o's 1 o    2 o's   3 o's    4 o's

Infrastructure Example:

Pattern: [0-9]*
Text:    Port: 443
Matches: "443" and also "" (empty) at other positions
* can match ZERO occurrences, so it also succeeds on an empty string, which often causes unexpected behavior.

The Plus +

Matches one or more of the preceding element.

Pattern: go+gle
Matches: gogle, google, gooogle
         ^      ^^      ^^^
Does NOT match: ggle (needs at least one 'o')

Infrastructure Example:

Pattern: [0-9]+
Text:    Port: 443
Matches: "443" only (requires at least one digit)
Prefer + over * when you need at least one match.
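
The difference is easy to see by counting matching lines. The sketch below uses a hypothetical two-line sample (one line with a port number, one without) and assumes a grep that supports -E; the variable names are illustrative only.

```shell
# Hypothetical sample input: one line with a port number, one without.
sample=$(printf 'Port: 443\nno digits here\n')

# [0-9]* can match the empty string, so every line counts as a match.
star_count=$(printf '%s\n' "$sample" | grep -cE '[0-9]*')

# [0-9]+ needs at least one digit, so only the first line matches.
plus_count=$(printf '%s\n' "$sample" | grep -cE '[0-9]+')

echo "star: $star_count, plus: $plus_count"   # prints "star: 2, plus: 1"
```

With *, grep reports a match on the line that contains no digits at all, which is exactly the surprise described above.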

The Question Mark ?

Matches zero or one of the preceding element (makes it optional).

Pattern: https?://
Matches: http://, https://

Infrastructure Example:

Pattern: VLAN ?[0-9]+
Matches: VLAN100, VLAN 100
         (space is optional)
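
A quick way to confirm that ? allows at most one occurrence is to feed the pattern a label with two spaces. This is a minimal sketch with hypothetical labels; the ^ and $ anchors are added here so the whole line has to fit the pattern.

```shell
# Hypothetical interface labels: no space, one space, two spaces.
labels=$(printf 'VLAN100\nVLAN 100\nVLAN  100\n')

# " ?" allows zero or one space, so the two-space form is rejected.
vlan_hits=$(printf '%s\n' "$labels" | grep -E '^VLAN ?[0-9]+$')

printf '%s\n' "$vlan_hits"
```

Only the first two labels are printed.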

Specific Repetition {n,m}

Syntax  Meaning                Example
{n}     Exactly n times        \d{4} - exactly 4 digits
{n,}    n or more times        \d{2,} - 2+ digits
{n,m}   Between n and m times  \d{1,3} - 1 to 3 digits
{0,n}   Up to n times          \d{0,3} - 0 to 3 digits

Exact Count {n}

Pattern: [0-9]{4}
Matches: 2026, 1234, 9999 (exactly 4 digits)
Does NOT match: 123 (only 3 digits)
Partial match:  12345 (unanchored, the pattern matches the first 4 digits, 1234)
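
The partial-match behavior can be checked directly. The sketch below assumes GNU grep, where \b is available as a word-boundary extension in -E mode, and uses a hypothetical input string.

```shell
# Hypothetical input containing a 5-digit number.
input='id 12345'

# Unanchored: {4} is satisfied by the first four digits of 12345.
unbounded=$(printf '%s\n' "$input" | grep -oE '[0-9]{4}' || true)

# With \b on both sides, a 5-digit run no longer qualifies.
bounded=$(printf '%s\n' "$input" | grep -oE '\b[0-9]{4}\b' || true)

echo "unbounded: '$unbounded', bounded: '$bounded'"
```

The "|| true" only keeps the script going when grep finds nothing and exits with status 1.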

Infrastructure Examples:

# Year
\d{4}

# MAC octet
[A-Fa-f0-9]{2}

# IPv4 octet (not validated)
[0-9]{1,3}

# MD5 hash
[a-f0-9]{32}

# UUID
[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}

Range {n,m}

Pattern: [0-9]{2,4}
Matches: 22, 443, 8080 (2-4 digits)

Why this matters for IP addresses:

# Each octet is 1-3 digits
[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}

# Matches: 1.2.3.4, 192.168.1.100, 255.255.255.0
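
Because {1,3} only counts digits, the pattern checks structure, not value. The following sketch, with hypothetical candidate strings, shows that an out-of-range address passes while a three-octet string does not.

```shell
# The structural IPv4 pattern from above, stored in a variable for reuse.
ipv4='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'

# Hypothetical candidates: valid address, out-of-range octets, too few octets.
candidates=$(printf '192.168.1.100\n999.999.999.999\n1.2.3\n')

# Anchoring with ^ and $ makes the whole line fit the structure.
accepted=$(printf '%s\n' "$candidates" | grep -E "^${ipv4}\$")

printf '%s\n' "$accepted"
```

Both 192.168.1.100 and 999.999.999.999 are printed; range checking of each octet needs a more specific pattern or a separate numeric test.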

Open-ended {n,}

Pattern: ERROR.{10,}
Meaning: ERROR followed by at least 10 characters
Use: Find error messages with details
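
As a small illustration, the sketch below filters two hypothetical log lines so that only the one carrying at least 10 characters of detail after "ERROR" survives.

```shell
# Hypothetical log lines: a terse error and a detailed one.
log=$(printf 'ERROR: disk\nERROR: connection refused by peer\n')

# .{10,} demands at least 10 more characters after "ERROR".
detailed=$(printf '%s\n' "$log" | grep -E 'ERROR.{10,}')

printf '%s\n' "$detailed"
```

The first line has only 6 characters after "ERROR" (": disk"), so it is dropped.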

Greedy vs Lazy Matching

The Greedy Problem

By default, quantifiers are greedy - they match as MUCH as possible.

Text:    <div>Hello</div><div>World</div>
Pattern: <.*>
Matches: <div>Hello</div><div>World</div>
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         (entire string - greedy grabbed everything)

The Lazy Solution

Add ? after a quantifier to make it lazy - match as LITTLE as possible.

Greedy  Lazy    Behavior
*       *?      Match minimum (prefers 0)
+       +?      Match minimum (prefers 1)
?       ??      Match minimum (prefers 0)
{n,m}   {n,m}?  Match minimum (prefers n)

Text:    <div>Hello</div><div>World</div>
Pattern: <.*?>
Matches: <div>  </div>  <div>  </div>
         ^^^^^  ^^^^^^  ^^^^^  ^^^^^^
         (minimal matches)

Practical Examples

Greedy (wrong for this use case):

Text:    "error: disk full" and "warning: low memory"
Pattern: ".*"
Matches: "error: disk full" and "warning: low memory"
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Lazy (correct):

Text:    "error: disk full" and "warning: low memory"
Pattern: ".*?"
Matches: "error: disk full"
         ^^^^^^^^^^^^^^^^^^
         "warning: low memory"
         ^^^^^^^^^^^^^^^^^^^^^
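
Where lazy quantifiers are unavailable (for example grep -E), a negated character class gives the same separation. This is a swapped-in technique, not the lazy quantifier itself: the sketch below compares the greedy pattern with "[^"]*" on the sample text used above.

```shell
line='"error: disk full" and "warning: low memory"'

# Greedy: .* runs to the last quote on the line, giving one long match.
greedy=$(printf '%s\n' "$line" | grep -oE '".*"')

# Negated class: [^"]* cannot cross a quote, so each string matches separately.
separate=$(printf '%s\n' "$line" | grep -oE '"[^"]*"')

printf 'greedy:\n%s\nseparate:\n%s\n' "$greedy" "$separate"
```

The greedy result is the whole line, while the negated class yields the two quoted strings on separate lines.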

When to Use Each

Use Greedy                     Use Lazy
Consuming entire tokens        Extracting quoted strings
When you want maximum match    When you want minimal match
Performance (slightly faster)  When nested/repeated delimiters exist

Possessive Quantifiers (Advanced)

Possessive quantifiers don’t give back what they’ve matched (no backtracking).

Greedy  Possessive  Behavior
*       *+          Match max, don’t backtrack
+       ++          Match max, don’t backtrack
?       ?+          Match max, don’t backtrack

When to use: Performance optimization when you know backtracking won’t help.

# Greedy (backtracks)
Pattern: ".*"foo
Text:    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab"
Result:  Backtracks many times, then fails

# Possessive (no backtracking)
Pattern: ".*+"foo
Text:    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab"
Result:  Fails immediately (faster)
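Note that .*+ also consumes the closing quote and never gives it back, so ".*+" can never match on its own; in practice a possessive quantifier is paired with a negated class, as in "[^"]*+".
Possessive quantifiers are not part of POSIX BRE/ERE. With grep they require PCRE mode (-P); Perl, Java, and Python 3.11+ also support them.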
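
The no-backtracking rule has a visible consequence for matching, not just speed. The sketch below assumes a grep built with PCRE support (-P) and uses a hypothetical sample string; it contrasts a possessive .* with a possessive negated class.

```shell
# Hypothetical text with one quoted word.
text='say "hello" now'

# .*+ swallows the closing quote and never gives it back, so nothing matches.
never=$(printf '%s\n' "$text" | grep -oP '".*+"' || true)

# [^"]*+ stops before the closing quote, so the possessive form still matches.
quoted=$(printf '%s\n' "$text" | grep -oP '"[^"]*+"' || true)

echo "never: '$never', quoted: '$quoted'"
```

With PCRE available, the first result is empty and the second is "hello" including its quotes.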

Applying to Character Classes

Quantifiers apply to the element immediately preceding them:

[0-9]+    → One or more digits
[a-z]{3}  → Exactly 3 lowercase letters
[A-F]+?   → One or more uppercase hex letters A-F (lazy)

Common Patterns

Pattern                              Matches
[0-9]+                               Integer (one or more digits)
[0-9]*\.[0-9]+                       Decimal number
[a-zA-Z_][a-zA-Z0-9_]*               Identifier (C-style)
[A-Fa-f0-9]{2}(:[A-Fa-f0-9]{2}){5}   MAC address
[0-9]{1,3}(\.[0-9]{1,3}){3}          IPv4 address (structure, not validated)

BRE vs ERE Syntax

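BRE (basic grep, sed): Interval braces must be escaped as \{n,m\}. * needs no escape, and \+ and \? are GNU extensions rather than POSIX.

# BRE - escape required
grep '[0-9]\{3\}' file.txt
grep '[0-9]\+' file.txt    # GNU extension; portable BRE is [0-9][0-9]*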

# ERE - no escape needed
grep -E '[0-9]{3}' file.txt
grep -E '[0-9]+' file.txt
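
To see the two syntaxes agree, and what happens when the BRE escape is forgotten, the following sketch runs all three forms against a hypothetical two-line sample. It assumes a grep that supports -o (GNU and BSD grep do).

```shell
# Hypothetical sample: one 3-digit number, one 2-digit number.
sample=$(printf 'abc 123 de\nxy 45\n')

# BRE with escaped braces and ERE with bare braces find the same match.
bre=$(printf '%s\n' "$sample" | grep -o '[0-9]\{3\}')
ere=$(printf '%s\n' "$sample" | grep -oE '[0-9]{3}')

# Unescaped braces in BRE are literal characters, so no line matches.
literal_count=$(printf '%s\n' "$sample" | grep -c '[0-9]{3}' || true)

echo "bre: $bre, ere: $ere, literal matches: $literal_count"
```

The unescaped BRE form looks for a digit followed by the literal text {3}, which the sample does not contain.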

Self-Test Exercises

Try each challenge FIRST. Only read the answer after you’ve attempted it.

Setup Test Data

cat << 'EOF' > /tmp/quantifiers.txt
Port 22
Port 443
Port 8080
Port 65535
Year: 2026
Year: 99
IP: 192.168.1.1
IP: 10.50.1.100
IP: 8.8.8.8
Log: "error: connection refused" and "error: timeout"
Tag: <div>content</div><div>more</div>
Phone: 555-1234
Phone: (555) 867-5309
MAC: AA:BB:CC:DD:EE:FF
Hash: 5d41402abc4b2a76b9719d911017c592
EOF

Challenge 1: Match 4-Digit Numbers

Goal: Extract only 4-digit numbers (ports like 8080, year 2026)

Answer
grep -oE '\b[0-9]{4}\b' /tmp/quantifiers.txt

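{4} means exactly 4 digits, and \b anchors each end at a word boundary. On this test data the command also returns 1234 and 5309 from the phone lines, because the hyphen is a non-word character.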


Challenge 2: Match 2-Digit Port

Goal: Extract the 2-digit port number (22)

Answer
grep -oE '\b[0-9]{2}\b' /tmp/quantifiers.txt

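{2} means exactly 2 digits. On this test data the command also returns 99 (from "Year: 99") and 10 and 50 (from "IP: 10.50.1.100"). To print only the port line, anchor on the label: grep -E '^Port [0-9]{2}$'.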


Challenge 3: Match IP Addresses

Goal: Extract all IP addresses (192.168.1.1, 10.50.1.100, 8.8.8.8)

Answer
grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /tmp/quantifiers.txt

{1,3} means 1 to 3 digits per octet. Don’t forget to escape the dots!


Challenge 4: Match One or More Digits

Goal: Extract ALL numbers from the file using +

Answer
grep -oE '[0-9]+' /tmp/quantifiers.txt

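+ means one or more. This extracts every run of consecutive digits, so IP octets, phone-number groups, and digit runs inside the hash each come out as separate matches.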


Challenge 5: Extract Quoted Strings (Greedy Trap)

Goal: Extract BOTH quoted error messages separately (not as one match)

Answer
# Wrong - greedy grabs too much
grep -oP '".*"' /tmp/quantifiers.txt
# Output: "error: connection refused" and "error: timeout"

# Correct - lazy matches minimum
grep -oP '".*?"' /tmp/quantifiers.txt
# Output: "error: connection refused"
#         "error: timeout"

*? is lazy (non-greedy). POSIX BRE/ERE have no lazy quantifiers, so with grep this requires PCRE mode (-P).


Challenge 6: Match MAC Address

Goal: Extract the MAC address (AA:BB:CC:DD:EE:FF)

Answer
grep -oE '([A-F0-9]{2}:){5}[A-F0-9]{2}' /tmp/quantifiers.txt

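Pattern: 5 pairs of hex digits, each followed by a colon, then a final pair. The class is uppercase-only; use [A-Fa-f0-9] to accept lowercase MAC addresses as well.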


Challenge 7: Match MD5 Hash

Goal: Extract the MD5 hash (32 hex characters)

Answer
grep -oE '[a-f0-9]{32}' /tmp/quantifiers.txt

An MD5 digest is conventionally printed as exactly 32 hex characters; this pattern assumes lowercase.


Challenge 8: Extract Each HTML Tag

Goal: Extract <div> and </div> tags separately (not the whole line)

Answer
# Greedy (wrong) - captures too much
grep -oE '<.*>' /tmp/quantifiers.txt
# Output: <div>content</div><div>more</div>

# Negated class (works in ERE)
grep -oE '<[^>]+>' /tmp/quantifiers.txt
# Output: <div>
#         </div>
#         <div>
#         </div>

# Lazy (PCRE)
grep -oP '<.*?>' /tmp/quantifiers.txt

[^>]+ means one or more characters that are NOT >.


Challenge 9: Optional Quantifier

Goal: Match port lines written either as "Port 22" or "Port: 443" (the colon may or may not be there; the test data only contains the form without a colon)

Answer
grep -E 'Port:? [0-9]+' /tmp/quantifiers.txt

? makes the colon optional (zero or one).


Challenge 10: Port Range 1-4 Digits

Goal: Match port numbers that are 1-4 digits (22, 443, 8080) but NOT 5 digits

Answer
grep -oE '\b[0-9]{1,4}\b' /tmp/quantifiers.txt

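{1,4} means 1 to 4 digits. The \b ensures we don’t match part of 65535. The command also returns every other 1-4 digit number in the file (years, IP octets, phone groups); to restrict it to port lines, use grep -E '^Port [0-9]{1,4}$'.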

Common Mistakes

Mistake 1: Using * When You Mean +

# Wrong - also matches the empty string wherever there is no digit
[0-9]*

# Correct - requires at least one digit
[0-9]+

Mistake 2: Forgetting Greedy Behavior

# Wrong - captures too much
".*"

# Correct - minimal match
".*?"

Mistake 3: Not Escaping in BRE

# Wrong in basic grep
grep '[0-9]{3}' file.txt  # Literal {3}

# Correct in basic grep
grep '[0-9]\{3\}' file.txt

Key Takeaways

  1. * = zero or more - can match nothing

  2. + = one or more - requires at least one

  3. ? = optional - zero or one

  4. {n,m} = specific range - precise control

  5. Greedy is default - matches maximum possible

  6. Lazy ? suffix - matches minimum possible

  7. BRE requires escaping - \+, \{n\}

Next Module

Anchors & Boundaries - Position matching without consuming characters.