Quantifiers
Quantifiers specify how many times the preceding element should match. Understanding the difference between greedy, lazy, and possessive quantifiers is essential for writing efficient, accurate patterns.
Basic Quantifiers
| Quantifier | Meaning | Pattern Example | Matches |
|---|---|---|---|
| * | Zero or more | ab*c | "ac", "abc", "abbc", "abbbc" |
| + | One or more | ab+c | "abc", "abbc", "abbbc" (not "ac") |
| ? | Zero or one | colou?r | "color", "colour" |
The Asterisk *
Matches zero or more of the preceding element.
Pattern: go*gle
Matches: ggle (0 o's), gogle (1 o), google (2 o's), gooogle (3 o's), goooogle (4 o's)
Infrastructure Example:
Pattern: [0-9]*
Text: Port: 443
Matches: "443" and also "" (empty) at other positions
| * also matches ZERO occurrences, which often causes unexpected behavior. |
The Plus +
Matches one or more of the preceding element.
Pattern: go+gle
Matches: gogle (1 o), google (2 o's), gooogle (3 o's)
Does NOT match: ggle (needs at least one 'o')
Infrastructure Example:
Pattern: [0-9]+
Text: Port: 443
Matches: "443" only (requires at least one digit)
| Prefer + over * when you need at least one match. |
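The difference between * and + is easiest to see by listing every match. The following is a minimal sketch in Python's standard re module, used here only for illustration; the variable names are not from the original text.

```python
import re

text = "Port: 443"

# * accepts zero digits, so an empty match is reported wherever no digit is found.
star_matches = re.findall(r"[0-9]*", text)

# + needs at least one digit, so only the real number is reported.
plus_matches = re.findall(r"[0-9]+", text)

print(star_matches)  # contains '443' plus several empty strings
print(plus_matches)  # ['443']
```

Filtering out the empty strings afterwards works, but switching to + avoids producing them in the first place.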
The Question Mark ?
Matches zero or one of the preceding element (makes it optional).
Pattern: https?://
Matches: http://, https://
Infrastructure Example:
Pattern: VLAN ?[0-9]+
Matches: VLAN100, VLAN 100
(space is optional)
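As a quick check of the two optional-element patterns above, here is a small Python sketch. The example.com URLs and the double-space label are invented test inputs.

```python
import re

scheme = re.compile(r"https?://")    # the "s" may appear zero or one time
vlan = re.compile(r"VLAN ?[0-9]+")   # the space may appear zero or one time

for url in ("http://example.com", "https://example.com", "ftp://example.com"):
    print(url, bool(scheme.match(url)))

# fullmatch requires the whole label to fit the pattern.
for label in ("VLAN100", "VLAN 100", "VLAN  100"):
    print(repr(label), bool(vlan.fullmatch(label)))
```

The label with two spaces is rejected, because ? allows at most one occurrence of the preceding element.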
Specific Repetition {n,m}
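| Syntax | Meaning | Example |
|---|---|---|
| {n} | Exactly n times | [0-9]{4} |
| {n,} | n or more times | ERROR.{10,} |
| {n,m} | Between n and m times | [0-9]{2,4} |
| {,n} | Up to n times (not supported by every engine; {0,n} is the portable form) | [0-9]{,3} |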
Exact Count {n}
Pattern: [0-9]{4}
Matches: 2026, 1234, 9999 (exactly 4 digits)
Does NOT match: 123 (only 3 digits)
Note: in 12345 the pattern still matches the first four digits (1234); add \b or anchors to reject longer runs.
Infrastructure Examples:
# Year
\d{4}
# MAC octet
[A-Fa-f0-9]{2}
# IPv4 octet (not validated)
[0-9]{1,3}
# MD5 hash
[a-f0-9]{32}
# UUID
[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}
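An exact count limits what the pattern consumes, not what surrounds the match. This Python sketch shows the difference a word boundary makes and checks the UUID pattern against a sample value. The input string and the sample UUID are made up for illustration.

```python
import re

text = "123 12345 2026"

# Without boundaries, {4} happily takes the first four digits of a longer run.
unbounded = re.findall(r"[0-9]{4}", text)

# With \b on both sides, only standalone 4-digit numbers qualify.
bounded = re.findall(r"\b[0-9]{4}\b", text)

print(unbounded)  # ['1234', '2026']
print(bounded)    # ['2026']

# Chained exact counts describe fixed-width formats such as a UUID.
uuid_re = re.compile(r"[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}")
sample = "123e4567-e89b-12d3-a456-426614174000"
print(bool(uuid_re.fullmatch(sample)))  # True
```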
Range {n,m}
Pattern: [0-9]{2,4}
Matches: 22, 443, 8080 (2-4 digits)
Why this matters for IP addresses:
# Each octet is 1-3 digits
[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}
# Matches: 1.2.3.4, 192.168.1.100, 255.255.255.0
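The {1,3} pattern checks only the shape of an address, so 999.1.1.1 passes too. One way to close that gap, sketched below in Python, is to let the regex find candidates and then hand each one to the standard ipaddress module for range checking. The helper name find_ipv4 and the sample text are assumptions made for this example.

```python
import ipaddress
import re

# Shape only: four groups of 1-3 digits separated by literal dots.
IPV4_SHAPE = re.compile(r"\b[0-9]{1,3}(?:\.[0-9]{1,3}){3}\b")

def find_ipv4(text):
    """Return shape matches that are also valid IPv4 addresses."""
    valid = []
    for candidate in IPV4_SHAPE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # rejects octets above 255
        except ValueError:
            continue
        valid.append(candidate)
    return valid

sample = "src 192.168.1.100, bogus 999.1.1.1, dns 8.8.8.8"
shapes = IPV4_SHAPE.findall(sample)
print(shapes)             # all three candidates
print(find_ipv4(sample))  # ['192.168.1.100', '8.8.8.8']
```

Splitting the work this way keeps the regex readable; encoding the 0-255 range in the pattern itself is possible but much longer.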
Open-ended {n,}
Pattern: ERROR.{10,}
Meaning: ERROR followed by at least 10 characters
Use: Find error messages with details
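A short Python sketch of that use: keep only the lines where ERROR is followed by at least ten more characters. The log lines are invented sample data.

```python
import re

detailed_error = re.compile(r"ERROR.{10,}")

lines = [
    "ERROR: disk full on /var",             # plenty of detail after ERROR
    "ERROR: x",                             # only 3 characters follow
    "INFO: all good, nothing to see here",  # no ERROR at all
]

detailed = [line for line in lines if detailed_error.search(line)]
print(detailed)  # ['ERROR: disk full on /var']
```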
Greedy vs Lazy Matching
The Greedy Problem
By default, quantifiers are greedy - they match as MUCH as possible.
Text: <div>Hello</div><div>World</div>
Pattern: <.*>
Matches: <div>Hello</div><div>World</div>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(entire string - greedy grabbed everything)
The Lazy Solution
Add ? after a quantifier to make it lazy - match as LITTLE as possible.
| Greedy | Lazy | Behavior |
|---|---|---|
| * | *? | Match minimum (prefers 0) |
| + | +? | Match minimum (prefers 1) |
| ? | ?? | Match minimum (prefers 0) |
| {n,m} | {n,m}? | Match minimum (prefers n) |
Text: <div>Hello</div><div>World</div>
Pattern: <.*?>
Matches: <div>, </div>, <div>, </div>
(four minimal matches)
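The same comparison can be reproduced in a few lines of Python, using the text from the example above.

```python
import re

html = "<div>Hello</div><div>World</div>"

# Greedy: .* runs to the end of the line, then backs up to the last ">".
greedy = re.findall(r"<.*>", html)

# Lazy: .*? stops at the first ">" it can reach.
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<div>Hello</div><div>World</div>']
print(lazy)    # ['<div>', '</div>', '<div>', '</div>']
```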
Practical Examples
Greedy (wrong for this use case):
Text: "error: disk full" and "warning: low memory"
Pattern: ".*"
Matches: "error: disk full" and "warning: low memory"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Lazy (correct):
Text: "error: disk full" and "warning: low memory"
Pattern: ".*?"
Matches: "error: disk full"
^^^^^^^^^^^^^^^^^^
"warning: low memory"
^^^^^^^^^^^^^^^^^^^^^
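Where lazy quantifiers are unavailable (POSIX ERE, for instance), a negated character class usually gives the same result. The Python sketch below compares the two on the text above and adds a capturing group to drop the quotes. The group is an addition for illustration, not part of the original patterns.

```python
import re

log = '"error: disk full" and "warning: low memory"'

# Lazy quantifier: stop at the first closing quote.
lazy = re.findall(r'".*?"', log)

# Negated class: the body may contain anything except a quote.
negated = re.findall(r'"[^"]*"', log)

# With a capturing group, findall returns only the text between the quotes.
inner = re.findall(r'"([^"]*)"', log)

print(lazy == negated)  # True
print(inner)            # ['error: disk full', 'warning: low memory']
```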
When to Use Each
| Use Greedy | Use Lazy |
|---|---|
| Consuming entire tokens | Extracting quoted strings |
| When you want maximum match | When you want minimal match |
| Performance (slightly faster) | When nested/repeated delimiters exist |
Possessive Quantifiers (Advanced)
Possessive quantifiers don’t give back what they’ve matched (no backtracking).
| Greedy | Possessive | Behavior |
|---|---|---|
| * | *+ | Match max, don’t backtrack |
| + | ++ | Match max, don’t backtrack |
| ? | ?+ | Match max, don’t backtrack |
When to use: Performance optimization when you know backtracking won’t help.
# Greedy (backtracks)
Pattern: ".*"foo
Text: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab"
Result: Backtracks many times, then fails
# Possessive (no backtracking)
Pattern: ".*+"foo
Text: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab"
Result: Fails immediately (faster)
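| Possessive quantifiers are not part of POSIX BRE/ERE; with grep they require PCRE mode (-P). Outside grep, engines such as Java, Perl, and Python 3.11+ also support them. |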
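A hedged Python sketch of the same comparison follows. Python's re module accepts possessive quantifiers only from version 3.11, so the possessive patterns are guarded by a version check. The guard and all variable names are assumptions of this example.

```python
import re
import sys

# Possessive quantifiers were added to Python's re module in version 3.11.
HAS_POSSESSIVE = sys.version_info >= (3, 11)

text = '"' + "a" * 40 + 'b"'

# Greedy: .* has to give characters back one at a time before the match fails.
greedy = re.search(r'".*"foo', text)

# Possessive: .*+ keeps everything it consumed, so each attempt fails
# without retrying shorter matches.
possessive = re.search(r'".*+"foo', text) if HAS_POSSESSIVE else None

# Side effect: .*+ also swallows the closing quote, so ".*+" can never match.
# A possessive negated class stops before the quote and still avoids backtracking.
never_matches = re.search(r'".*+"', text) if HAS_POSSESSIVE else None
works = re.search(r'"[^"]*+"', text) if HAS_POSSESSIVE else None

print(greedy, possessive, never_matches)
print(works.group() == text if works else "possessive quantifiers unavailable")
```

The last two lines illustrate why a possessive quantifier is usually paired with a class that cannot consume the delimiter.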
Applying to Character Classes
Quantifiers apply to the element immediately preceding them:
[0-9]+ → One or more digits
[a-z]{3} → Exactly 3 lowercase letters
[A-F]+? → One or more hex letters A-F (lazy)
Common Patterns
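| Pattern | Matches |
|---|---|
| [0-9]+ | Integer (one or more digits) |
| [0-9]+\.[0-9]+ | Decimal number |
| [A-Za-z_][A-Za-z0-9_]* | Identifier (C-style) |
| ([A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2} | MAC address |
| [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3} | IPv4 address (structure, not validated) |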
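To try the shapes named in the table, the Python sketch below gives one plausible spelling of each and reports which of them match a whole token. The dictionary keys, the classify helper, and the exact expressions are assumptions for illustration.

```python
import re

# One plausible expression per row; each combines a class with a quantifier.
PATTERNS = {
    "integer": r"[0-9]+",
    "decimal": r"[0-9]+\.[0-9]+",
    "identifier": r"[A-Za-z_][A-Za-z0-9_]*",
    "mac": r"([A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}",
    "ipv4_shape": r"[0-9]{1,3}(\.[0-9]{1,3}){3}",
}

def classify(token):
    """Return the names of all patterns that match the entire token."""
    return [name for name, pattern in PATTERNS.items() if re.fullmatch(pattern, token)]

for token in ("443", "3.14", "_tmp1", "AA:BB:CC:DD:EE:FF", "10.50.1.100"):
    print(token, classify(token))
```

Using fullmatch matters here: with search, "integer" would also fire on every token that merely contains a digit.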
BRE vs ERE Syntax
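BRE (basic grep, sed): braces must be escaped as \{ and \}. + and ? are not BRE operators; GNU tools accept \+ and \? as extensions. * needs no escaping. ERE (grep -E, sed -E) uses all of them unescaped.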
# BRE - escape required
grep '[0-9]\{3\}' file.txt
grep '[0-9]\+' file.txt
# ERE - no escape needed
grep -E '[0-9]{3}' file.txt
grep -E '[0-9]+' file.txt
Self-Test Exercises
| Try each challenge FIRST. Only expand the answer after you’ve attempted it. |
Setup Test Data
cat << 'EOF' > /tmp/quantifiers.txt
Port 22
Port 443
Port 8080
Port 65535
Year: 2026
Year: 99
IP: 192.168.1.1
IP: 10.50.1.100
IP: 8.8.8.8
Log: "error: connection refused" and "error: timeout"
Tag: <div>content</div><div>more</div>
Phone: 555-1234
Phone: (555) 867-5309
MAC: AA:BB:CC:DD:EE:FF
Hash: 5d41402abc4b2a76b9719d911017c592
EOF
Challenge 1: Match 4-Digit Numbers
Goal: Extract only 4-digit numbers (ports like 8080, year 2026)
Answer
grep -oE '\b[0-9]{4}\b' /tmp/quantifiers.txt
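{4} means exactly 4 digits. \b ensures a word boundary on each side. On this data the output also includes 1234 and 5309 from the phone numbers, because the hyphen counts as a boundary.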
Challenge 2: Match 2-Digit Port
Goal: Extract the 2-digit port number (22)
Answer
grep -oE '\b[0-9]{2}\b' /tmp/quantifiers.txt
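{2} means exactly 2 digits. On this data the pattern also returns 99 (from Year: 99) and the octets 10 and 50 (from 10.50.1.100), because \b treats spaces and dots as boundaries. To select only the port line, anchor on the label:
grep -oE '^Port [0-9]{2}$' /tmp/quantifiers.txt
# Output: Port 22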
Challenge 3: Match IP Addresses
Goal: Extract all IP addresses (192.168.1.1, 10.50.1.100, 8.8.8.8)
Answer
grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /tmp/quantifiers.txt
{1,3} means 1 to 3 digits per octet. Don’t forget to escape the dots!
Challenge 4: Match One or More Digits
Goal: Extract ALL numbers from the file using +
Answer
grep -oE '[0-9]+' /tmp/quantifiers.txt
+ means one or more. This extracts every run of digits, including the digit runs inside the MD5 hash.
Challenge 5: Extract Quoted Strings (Greedy Trap)
Goal: Extract BOTH quoted error messages separately (not as one match)
Answer
# Wrong - greedy grabs too much
grep -oP '".*"' /tmp/quantifiers.txt
# Output: "error: connection refused" and "error: timeout"
# Correct - lazy matches minimum
grep -oP '".*?"' /tmp/quantifiers.txt
# Output: "error: connection refused"
# "error: timeout"
*? is lazy (non-greedy). POSIX BRE/ERE have no lazy quantifiers, so grep needs PCRE mode (-P); in ERE, "[^"]*" gives the same result here.
Challenge 6: Match MAC Address
Goal: Extract the MAC address (AA:BB:CC:DD:EE:FF)
Answer
grep -oE '([A-F0-9]{2}:){5}[A-F0-9]{2}' /tmp/quantifiers.txt
Pattern: 5 pairs of hex followed by colon, then final pair.
Challenge 7: Match MD5 Hash
Goal: Extract the MD5 hash (32 hex characters)
Answer
grep -oE '[a-f0-9]{32}' /tmp/quantifiers.txt
An MD5 digest is exactly 32 hex characters, usually printed in lowercase.
Challenge 8: Extract Each HTML Tag
Goal: Extract <div> and </div> tags separately (not the whole line)
Answer
# Greedy (wrong) - captures too much
grep -oE '<.*>' /tmp/quantifiers.txt
# Output: <div>content</div><div>more</div>
# Negated class (works in ERE)
grep -oE '<[^>]+>' /tmp/quantifiers.txt
# Output: <div>
# </div>
# <div>
# </div>
# Lazy (PCRE)
grep -oP '<.*?>' /tmp/quantifiers.txt
[^>]+ means one or more characters that are NOT >.
Challenge 9: Optional Quantifier
Goal: Match port lines whether or not a colon follows the label, e.g. both "Port 443" and "Port: 443" (the test data only contains the form without a colon)
Answer
grep -E 'Port:? [0-9]+' /tmp/quantifiers.txt
? makes the colon optional (zero or one).
Challenge 10: Port Range 1-4 Digits
Goal: Match port numbers that are 1-4 digits (22, 443, 8080) but NOT 5 digits
Answer
grep -oE '\b[0-9]{1,4}\b' /tmp/quantifiers.txt
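{1,4} means 1 to 4 digits. The \b ensures we don’t match part of 65535. The pattern is not port-specific, though: it also returns the years, the IP octets, and the phone number groups. Add the label (^Port [0-9]{1,4}$) to restrict it to port lines.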
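The \b-based patterns in these challenges select digit runs by length alone, wherever they appear. The Python sketch below contrasts that with a label-anchored pattern. It uses a subset of the test data, and the variable names are assumptions.

```python
import re

data = """Port 22
Port 443
Port 8080
Port 65535
Year: 2026
Year: 99
IP: 10.50.1.100
Phone: 555-1234
"""

# \b alone: any digit run of 1-4 digits counts, whatever line it is on.
by_length = re.findall(r"\b[0-9]{1,4}\b", data)

# Anchoring on the label and the line ends keeps only 1-4 digit ports.
ports = re.findall(r"^Port ([0-9]{1,4})$", data, flags=re.MULTILINE)

print(by_length)  # ports, years, IP octets, and phone groups mixed together
print(ports)      # ['22', '443', '8080']
```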
Common Mistakes
Mistake 1: Using * When You Mean +
# Wrong - matches empty string at every position
[0-9]*
# Correct - requires at least one digit
[0-9]+
Mistake 2: Forgetting Greedy Behavior
# Wrong - captures too much
".*"
# Correct - minimal match
".*?"
Mistake 3: Not Escaping in BRE
# Wrong in basic grep
grep '[0-9]{3}' file.txt # Literal {3}
# Correct in basic grep
grep '[0-9]\{3\}' file.txt
Key Takeaways
- * = zero or more - can match nothing
- + = one or more - requires at least one
- ? = optional - zero or one
- {n,m} = specific range - precise control
- Greedy is default - matches maximum possible
- Lazy ? suffix - matches minimum possible
- BRE requires escaping - \+, \{n\}
Next Module
Anchors & Boundaries - Position matching without consuming characters.