Drill 03: Quantifiers
Quantifiers control repetition. Master them to match variable-length patterns like phone numbers, IP addresses, and log entries.
Core Concepts
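| Quantifier | Meaning | Example |
|---|---|---|
| `*` | Zero or more | `ab*c` matches `ac`, `abc`, `abbc` |
| `+` | One or more | `ab+c` matches `abc`, `abbc` but not `ac` |
| `?` | Zero or one (optional) | `colou?r` matches `color` and `colour` |
| `{n}` | Exactly n times | `[0-9]{3}` matches exactly three digits |
| `{n,}` | n or more times | `.{8,}` matches eight or more characters |
| `{n,m}` | Between n and m times | `a{2,4}` matches `aa`, `aaa`, `aaaa` |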
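The quantifier meanings can be cross-checked with a minimal Python sketch. Python's `re` module uses the same syntax as `grep -E` for these quantifiers. The `ab…c` patterns and the helper name `accepted_counts` are illustrative assumptions, not part of the drill script.

```python
import re

# Each pattern is tested against "a", then 0 to 4 copies of "b", then "c".
# Patterns and helper name are illustrative only.
patterns = {
    "*":     r"ab*c",
    "+":     r"ab+c",
    "?":     r"ab?c",
    "{2}":   r"ab{2}c",
    "{2,}":  r"ab{2,}c",
    "{1,3}": r"ab{1,3}c",
}

def accepted_counts(pattern, max_b=4):
    """Return the b-counts (0..max_b) that the pattern accepts in full."""
    return [n for n in range(max_b + 1)
            if re.fullmatch(pattern, "a" + "b" * n + "c")]

results = {q: accepted_counts(p) for q, p in patterns.items()}
for quantifier, counts in results.items():
    print(f"{quantifier:6} accepts b-counts {counts}")
```

`re.fullmatch` is used so that each pattern must consume the whole string, which makes the repetition limits visible.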
Greedy vs Lazy (CRITICAL)
| Type | Behavior | Pattern |
|---|---|---|
| Greedy (default) | Match as MUCH as possible | `.*` |
| Lazy (add `?`) | Match as LITTLE as possible | `.*?` |
# Greedy: matches entire span
echo '<div>first</div><div>second</div>' | grep -oP '<div>.*</div>'
# Output: <div>first</div><div>second</div>
# Lazy: each match stops at the first closing tag
echo '<div>first</div><div>second</div>' | grep -oP '<div>.*?</div>'
# Output (grep -o prints every match, one per line):
# <div>first</div>
# <div>second</div>
Interactive CLI Drill
bash ~/atelier/_bibliotheca/domus-captures/docs/modules/ROOT/examples/regex-drills/03-quantifiers.sh
Exercise Set 1: Basic Quantifiers
cat << 'EOF' > /tmp/ex-quant.txt
192.168.1.100
10.0.0.1
255.255.255.0
port 80
port 443
port 8080
port 65535
MAC: AA:BB:CC:DD:EE:FF
MAC: aa:bb:cc:dd:ee:ff
file.txt
file.backup.txt
file.tar.gz
2026-03-15
2026-3-5
EOF
Ex 1.1: Match port numbers (1-5 digits)
Solution
grep -Eo 'port [0-9]{1,5}' /tmp/ex-quant.txt
Output: port 80, port 443, port 8080, port 65535
Ex 1.2: Match full IP addresses
Solution
grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /tmp/ex-quant.txt
{1,3} matches 1-3 digits per octet.
Ex 1.3: Match MAC addresses (both cases)
Solution
grep -Eio '[A-F0-9]{2}(:[A-F0-9]{2}){5}' /tmp/ex-quant.txt
{5} repeats the colon-pair pattern exactly 5 times.
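A rough Python equivalent of the MAC pattern, as a sketch: `re.IGNORECASE` plays the role of grep's `-i`, and the third sample (only five pairs) is an assumed negative case that is not in the exercise file.

```python
import re

# Same pattern as the grep solution; (?:...) is a non-capturing group.
MAC_RE = re.compile(r"[A-F0-9]{2}(?::[A-F0-9]{2}){5}", re.IGNORECASE)

samples = [
    "MAC: AA:BB:CC:DD:EE:FF",
    "MAC: aa:bb:cc:dd:ee:ff",
    "MAC: AA:BB:CC:DD:EE",  # assumed negative case: only five pairs
]

found = []
for sample in samples:
    match = MAC_RE.search(sample)
    found.append(match.group(0) if match else None)

print(found)  # ['AA:BB:CC:DD:EE:FF', 'aa:bb:cc:dd:ee:ff', None]
```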
Ex 1.4: Match files with extensions
Solution
grep -Eo '[a-z]+(\.[a-z]+)+' /tmp/ex-quant.txt
(\.[a-z]+)+ matches one or more extensions.
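When the same pattern is carried over to Python, a quantified capturing group behaves in a way that is easy to miss: the group keeps only its last repetition, and `re.findall` returns the group instead of the whole match. A minimal sketch using the file names from the exercise:

```python
import re

names = ["file.txt", "file.backup.txt", "file.tar.gz"]

# Capturing group under "+": findall returns only the LAST repetition.
last_ext = [re.findall(r"[a-z]+(\.[a-z]+)+", n) for n in names]

# Non-capturing group (?:...): findall returns the whole match.
whole = [re.findall(r"[a-z]+(?:\.[a-z]+)+", n) for n in names]

print(last_ext)  # [['.txt'], ['.txt'], ['.gz']]
print(whole)     # [['file.txt'], ['file.backup.txt'], ['file.tar.gz']]
```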
Exercise Set 2: Optional Patterns
Ex 2.1: Match dates (leading zeros optional)
Solution
grep -Eo '[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}' /tmp/ex-quant.txt
Output: 2026-03-15, 2026-3-5
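Once `{1,2}` has accepted both padded and unpadded fields, a common follow-up is to normalize them. The sketch below assumes a hypothetical helper named `normalize`. It checks only the shape of the date, not whether the month or day is valid.

```python
import re

# Same shape as the grep pattern, with groups added to pull out the parts.
DATE_RE = re.compile(r"([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})")

def normalize(text):
    """Return the date with zero-padded month and day, or None if no match."""
    match = DATE_RE.fullmatch(text)
    if match is None:
        return None
    year, month, day = match.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"

normalized = [normalize(t) for t in ["2026-03-15", "2026-3-5", "26-3-5"]]
print(normalized)  # ['2026-03-15', '2026-03-05', None]
```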
Ex 2.2: Match "http" or "https"
Solution
echo -e "http://example.com\nhttps://secure.com" | grep -Eo 'https?://'
s? makes the 's' optional.
Ex 2.3: Match with optional prefix
Solution
echo -e "VLAN100\nVLAN 100\n100" | grep -Eo '(VLAN ?)?[0-9]+'
(VLAN ?)? makes the entire prefix optional; the inner " ?" makes the space optional.
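In Python, an optional group that matches zero times reports `None`, which makes it easy to see when the prefix was skipped. A small sketch with the three inputs from the exercise (the second group around the digits is added here for illustration):

```python
import re

VLAN_RE = re.compile(r"(VLAN ?)?([0-9]+)")

rows = []
for text in ["VLAN100", "VLAN 100", "100"]:
    match = VLAN_RE.fullmatch(text)
    # group(1) is None when the optional prefix matched zero times.
    rows.append((text, match.group(1), match.group(2)))

for row in rows:
    print(row)
```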
Exercise Set 3: Greedy vs Lazy
cat << 'EOF' > /tmp/ex-greedy.txt
<tag>content</tag>
<tag>first</tag><tag>second</tag>
"value1" and "value2"
key="setting1" key="setting2"
EOF
Ex 3.1: Extract each tag separately (lazy)
Solution
grep -oP '<tag>.*?</tag>' /tmp/ex-greedy.txt
Output (grep -o prints every match, so line 2 yields two):
- <tag>content</tag>
- <tag>first</tag>
- <tag>second</tag>
Ex 3.2: Extract all content between tags (greedy)
Solution
grep -oP '<tag>.*</tag>' /tmp/ex-greedy.txt
Line 2 outputs: <tag>first</tag><tag>second</tag>
Ex 3.3: Better alternative - negated class
Solution
# No backtracking, clearer intent
grep -oP '<tag>[^<]*</tag>' /tmp/ex-greedy.txt
[^<]* = any character except <, zero or more times. Faster and safer.
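The "safer" part can be shown with a sketch in Python. On well-formed input the lazy and negated-class patterns agree, but they diverge when an opening tag has no closing tag of its own. The `broken` string is an assumed example, not taken from the exercise file.

```python
import re

line = "<tag>first</tag><tag>second</tag>"
lazy = re.findall(r"<tag>.*?</tag>", line)
negated = re.findall(r"<tag>[^<]*</tag>", line)
print(lazy == negated)  # True: both find the two tags separately

# Assumed malformed input: the first <tag> is never closed.
broken = "<tag>orphan <tag>ok</tag>"
lazy_broken = re.findall(r"<tag>.*?</tag>", broken)
negated_broken = re.findall(r"<tag>[^<]*</tag>", broken)

print(lazy_broken)     # ['<tag>orphan <tag>ok</tag>']
print(negated_broken)  # ['<tag>ok</tag>']
```

The lazy pattern keeps expanding past the second `<tag>` until it finds a closing tag, while the negated class cannot cross a `<` and so only matches the well-formed element.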
Ex 3.4: Extract quoted values (lazy)
Solution
grep -oP '".*?"' /tmp/ex-greedy.txt
Or better: "[^"]*"
Exercise Set 4: Bounded Quantifiers
Ex 4.1: Valid IP octet (0-255 pattern)
Solution
# Matches one octet in the 0-255 range (complex but accurate); unanchored, it also matches pieces of longer digit runs such as 8080
grep -Eo '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)' /tmp/ex-quant.txt
Breaking it down:
- 25[0-5] = 250-255
- 2[0-4][0-9] = 200-249
- [01]?[0-9][0-9]? = 0-199
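The octet pattern becomes a real validator once it is repeated four times and anchored. A minimal Python sketch, assuming `fullmatch` as the anchoring mechanism; the last two candidates are invented negative cases.

```python
import re

# One octet, 0-255, as a non-capturing group.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# One octet followed by exactly three ".octet" repetitions.
IPV4_RE = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")

candidates = ["192.168.1.100", "10.0.0.1", "255.255.255.0",
              "256.1.1.1", "1.2.3"]  # last two are assumed invalid inputs
valid = [c for c in candidates if IPV4_RE.fullmatch(c)]
print(valid)  # ['192.168.1.100', '10.0.0.1', '255.255.255.0']
```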
Ex 4.2: Phone number parts
Solution
echo "555-123-4567" | grep -Eo '[0-9]{3}-[0-9]{3}-[0-9]{4}'
{3}, {3}, {4} = exact digit counts per segment.
Ex 4.3: Variable length with minimum
Solution
# Match passwords: 8+ characters
echo -e "pass\npassword123\nsecretkey" | grep -E '^.{8,}$'
Output: password123, secretkey (8+ chars only)
Real-World Applications
Professional: Log Timestamp Extraction
# ISO timestamp: YYYY-MM-DDTHH:MM:SS
grep -Eo '[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}' /var/log/app.log
# Syslog timestamp: Month DD HH:MM:SS
grep -Eo '[A-Z][a-z]{2} +[0-9]{1,2} [0-9]{2}:[0-9]{2}:[0-9]{2}' /var/log/syslog
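The ISO pattern checks only the shape of a timestamp, so a follow-up parse is a reasonable second step. The sketch below uses invented log lines, since the real format of /var/log/app.log is not shown in this drill.

```python
import re
from datetime import datetime

# Hypothetical log lines for illustration only.
log_lines = [
    "2026-03-15T08:30:00 INFO service started",
    "no timestamp on this line",
    "2026-03-15T08:31:45 WARN retrying",
]

ISO_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}")

stamps = []
for line in log_lines:
    match = ISO_RE.search(line)
    if match:
        stamps.append(match.group(0))

# strptime rejects impossible values (month 13, hour 25) that the regex accepts.
parsed = [datetime.strptime(s, "%Y-%m-%dT%H:%M:%S") for s in stamps]
print(stamps)
print(parsed[1] - parsed[0])  # 0:01:45
```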
Professional: ISE Session Counts
# Extract session IDs (32 hex characters) and count the unique ones
grep -Eo '[0-9a-fA-F]{32}' /var/log/ise-psc.log | sort -u | wc -l
Professional: VLAN Ranges
# Match VLAN IDs 1-4094
grep -Eo 'VLAN ?([1-9]|[1-9][0-9]{1,2}|[1-3][0-9]{3}|40[0-8][0-9]|409[0-4])' config.txt
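A range pattern like this is easy to get subtly wrong, so an exhaustive check is worthwhile. The Python sketch below verifies the ID part against every number from 0 to 5000. It also shows a porting caveat: Python takes the first alternative that works, whereas `grep -E` prefers the longest match.

```python
import re

VLAN_ID = r"(?:[1-9]|[1-9][0-9]{1,2}|[1-3][0-9]{3}|40[0-8][0-9]|409[0-4])"

# fullmatch must consume the whole string, so the range check is exact.
accepted = [n for n in range(0, 5001) if re.fullmatch(VLAN_ID, str(n))]
print(accepted[0], accepted[-1], len(accepted))  # 1 4094 4094

# Ordered alternation: [1-9] succeeds first, so the match stops early.
short = re.search(rf"VLAN ?{VLAN_ID}", "VLAN 4094").group(0)
# A trailing \b forces backtracking until the whole number is consumed.
fixed = re.search(rf"VLAN ?{VLAN_ID}\b", "VLAN 4094").group(0)
print(short)  # VLAN 4
print(fixed)  # VLAN 4094
```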
Personal: Find Dollar Amounts
# Match prices like $9.99, $199.00, $1,234.56
grep -Eo '\$[0-9,]+(\.[0-9]{2})?' ~/receipts/*.txt
Personal: Extract Time Entries
# Match 12-hour times: 9:30am, 12:45pm
grep -Eio '[0-9]{1,2}:[0-9]{2} ?(am|pm)' ~/calendar.txt
Personal: Phone Number Formats
# Match various formats
grep -Eo '(\([0-9]{3}\)|[0-9]{3})[ -]?[0-9]{3}[ -]?[0-9]{4}' ~/contacts.txt
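To see which formats the optional separators admit, the same pattern can be tried against a handful of samples in Python. The sample numbers are invented for illustration.

```python
import re

PHONE_RE = re.compile(r"(?:\([0-9]{3}\)|[0-9]{3})[ -]?[0-9]{3}[ -]?[0-9]{4}")

samples = [
    "555-123-4567",
    "(555) 123-4567",
    "5551234567",
    "555 123 4567",
    "55-123-4567",   # area code too short
]
matched = [s for s in samples if PHONE_RE.fullmatch(s)]
print(matched)
```

Because each separator is `[ -]?`, mixed styles such as `555 123-4567` are also accepted; tighten the pattern if that is not wanted.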
Tool Variants
sed: Quantifiers in Substitution
# Collapse runs of spaces (one or more → single)
echo "too    many    spaces" | sed -E 's/ +/ /g'
# Remove the digits that precede the dot
echo "file123.txt" | sed 's/[0-9]*\././'
awk: Pattern with Quantifiers
# Print lines containing four consecutive digits, such as a year
# (some older awk builds do not support {n} intervals)
awk '/[0-9]{4}/' file.txt
# Extract port numbers
echo "port 8080" | awk 'match($0, /[0-9]{1,5}/) {print substr($0, RSTART, RLENGTH)}'
vim: Quantifier Patterns
" Find 3+ consecutive digits
/[0-9]\{3,\}
" Replace multiple blank lines with one
:%s/\n\{3,\}/\r\r/g
" Match optional 'u' (colour/color)
/colou\?r
In vim, quantifiers use \{n,m\} with escaped braces.
Python: Quantifier Patterns
import re
text = "Port 80, Port 443, Port 8080"
# Find all port numbers
ports = re.findall(r'\d{1,5}', text)
print(ports) # ['80', '443', '8080']
# Greedy vs Lazy
html = "<div>first</div><div>second</div>"
greedy = re.findall(r'<div>.*</div>', html)
lazy = re.findall(r'<div>.*?</div>', html)
print(f"Greedy: {greedy}") # Full string
print(f"Lazy: {lazy}") # Two matches
Gotchas
Zero Matches is Valid
# * matches ZERO or more - this always matches
echo "ac" | grep -E 'ab*c'
# Output: ac (zero b's is valid)
# + requires at least one
echo "ac" | grep -E 'ab+c'
# No output (needs at least one b)
Greedy Can Be Surprising
# This grabs everything between FIRST and LAST quotes
echo '"a" "b" "c"' | grep -oP '".*"'
# Output: "a" "b" "c" (not what you wanted)
# Use negated class or lazy
echo '"a" "b" "c"' | grep -oP '"[^"]*"'
# Output: "a" "b" "c" (three separate matches)
BRE vs ERE Syntax
# BRE (grep without -E): braces must be escaped
grep 'a\{3\}' file.txt
# ERE (grep -E): no escape needed
grep -E 'a{3}' file.txt
Empty Matches with *
# Be careful - * can match nothing
echo "abc" | grep -o 'x*'
# Output: nothing. x* matches the empty string at every position,
# but grep -o prints only non-empty matches
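Python makes the empty matches visible, which helps explain what the regex engine is doing. A minimal sketch:

```python
import re

# x* succeeds at every position by matching zero characters:
# before "a", before "b", before "c", and at the end of the string.
empty = re.findall(r"x*", "abc")
print(empty)     # ['', '', '', '']

# x+ needs at least one "x", so there are no matches at all.
nonempty = re.findall(r"x+", "abc")
print(nonempty)  # []
```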
Key Takeaways
| Quantifier | Remember |
|---|---|
| `*` | Zero or more. Can match nothing! |
| `+` | One or more. At least one required. |
| `?` | Optional (zero or one). |
| `{n}` | Exactly n times. |
| `{n,m}` | Range: n to m times. |
| `.*` | Greedy: matches everything possible. |
| `.*?` | Lazy: matches minimum needed. |
| `[^"]*` | Better than lazy for known delimiter. |
Self-Test
1. What’s the difference between + and *?
2. What does a{2,4} match?
3. How do you make a greedy quantifier lazy?
4. Why is [^"]* often better than .*? for quoted strings?
5. What does colou?r match?
Answers
1. + requires at least 1, * allows 0
2. "aa", "aaa", or "aaaa" (2-4 a’s)
3. Add ? after it: *?, +?, {n,m}?
4. No backtracking, faster, clearer intent (stops at first quote)
5. "color" and "colour" (u is optional)
Next Drill
Drill 04: Anchors & Boundaries - Master ^, $, \b, \<, \> for precise positioning.