Drill 03: Quantifiers

Quantifiers control repetition. Master them to match variable-length patterns like phone numbers, IP addresses, and log entries.

Core Concepts

Quantifier  Meaning                 Example
*           Zero or more            ab*c matches "ac", "abc", "abbc"
+           One or more             ab+c matches "abc", "abbc", NOT "ac"
?           Zero or one (optional)  colou?r matches "color", "colour"
{n}         Exactly n times         a{3} matches "aaa"
{n,}        n or more times         a{2,} matches "aa", "aaa", "aaaa", ...
{n,m}       Between n and m times   a{2,4} matches "aa", "aaa", "aaaa"
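The table can be checked mechanically. Below is a minimal Python sketch, assuming Python's `re` module (which uses the same quantifier syntax for these cases); the `cases` list and the non-matching strings are illustrative additions, not part of the drill files.

```python
import re

# (pattern, strings that should match in full, strings that should not)
cases = [
    (r"ab*c",    ["ac", "abc", "abbc"], ["abd"]),
    (r"ab+c",    ["abc", "abbc"],       ["ac"]),
    (r"colou?r", ["color", "colour"],   ["colouur"]),
    (r"a{3}",    ["aaa"],               ["aa", "aaaa"]),
    (r"a{2,}",   ["aa", "aaa", "aaaa"], ["a"]),
    (r"a{2,4}",  ["aa", "aaa", "aaaa"], ["a", "aaaaa"]),
]

results = {}
for pattern, good, bad in cases:
    # fullmatch requires the whole string to match, so the quantifier
    # bounds are tested exactly rather than as a substring search.
    ok = all(re.fullmatch(pattern, s) for s in good) and not any(
        re.fullmatch(pattern, s) for s in bad
    )
    results[pattern] = ok
    print(f"{pattern:10} {'ok' if ok else 'MISMATCH'}")
```

Using `fullmatch` rather than `search` matters here: `a{3}` would otherwise "match" inside "aaaa".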

Greedy vs Lazy (CRITICAL)

Type              Behavior                     Pattern
Greedy (default)  Match as MUCH as possible    .*, +, {n,m}
Lazy (add ?)      Match as LITTLE as possible  .*?, +?, {n,m}?

# Greedy: matches entire span
echo '<div>first</div><div>second</div>' | grep -oP '<div>.*</div>'
# Output: <div>first</div><div>second</div>

# Lazy: matches first occurrence
echo '<div>first</div><div>second</div>' | grep -oP '<div>.*?</div>'
# Output: <div>first</div>

Interactive CLI Drill

bash ~/atelier/_bibliotheca/domus-captures/docs/modules/ROOT/examples/regex-drills/03-quantifiers.sh

Exercise Set 1: Basic Quantifiers

cat << 'EOF' > /tmp/ex-quant.txt
192.168.1.100
10.0.0.1
255.255.255.0
port 80
port 443
port 8080
port 65535
MAC: AA:BB:CC:DD:EE:FF
MAC: aa:bb:cc:dd:ee:ff
file.txt
file.backup.txt
file.tar.gz
2026-03-15
2026-3-5
EOF

Ex 1.1: Match port numbers (1-5 digits)

Solution
grep -Eo 'port [0-9]{1,5}' /tmp/ex-quant.txt

Output: port 80, port 443, port 8080, port 65535

Ex 1.2: Match full IP addresses

Solution
grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /tmp/ex-quant.txt

{1,3} matches 1-3 digits per octet.

Ex 1.3: Match MAC addresses (both cases)

Solution
grep -Eio '[A-F0-9]{2}(:[A-F0-9]{2}){5}' /tmp/ex-quant.txt

{5} repeats the colon-pair pattern exactly 5 times.
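The same grouped quantifier can be sketched in Python. One difference worth knowing: `re.findall` returns the group contents when the pattern has a capturing group, so this sketch uses a non-capturing group `(?:...)`. The sample lines are copied from the exercise file.

```python
import re

lines = ["MAC: AA:BB:CC:DD:EE:FF", "MAC: aa:bb:cc:dd:ee:ff", "port 8080"]

# (?:...) groups without capturing, so findall returns whole matches
# instead of only the last repetition of the group.
mac_re = re.compile(r"[0-9A-F]{2}(?::[0-9A-F]{2}){5}", re.IGNORECASE)

macs = [m for line in lines for m in mac_re.findall(line)]
print(macs)  # ['AA:BB:CC:DD:EE:FF', 'aa:bb:cc:dd:ee:ff']
```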

Ex 1.4: Match files with extensions

Solution
grep -Eo '[a-z]+(\.[a-z]+)+' /tmp/ex-quant.txt

(\.[a-z]+)+ matches one or more extensions.

Exercise Set 2: Optional Patterns

Ex 2.1: Match dates (leading zeros optional)

Solution
grep -Eo '[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}' /tmp/ex-quant.txt

Output: 2026-03-15, 2026-3-5

Ex 2.2: Match "http" or "https"

Solution
echo -e "http://example.com\nhttps://secure.com" | grep -Eo 'https?://'

s? makes the 's' optional.

Ex 2.3: Match with optional prefix

Solution
echo -e "VLAN100\nVLAN 100\n100" | grep -Eo '(VLAN ?)?[0-9]+'

(VLAN ?)? makes entire prefix optional, space optional within.
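A short Python sketch of the same optional-prefix idea, assuming you want the numeric ID regardless of how the prefix is written. The capture group around the digits is an addition for illustration.

```python
import re

samples = ["VLAN100", "VLAN 100", "100"]

# The whole prefix is optional; inside it, the space is optional too.
# Only the digits are captured.
vlan_re = re.compile(r"(?:VLAN ?)?([0-9]+)")

ids = []
for s in samples:
    m = vlan_re.fullmatch(s)
    ids.append(m.group(1))
    print(f"{s!r:12} -> id {m.group(1)}")
```

All three spellings yield the same ID, which is the point of nesting one optional element inside another.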

Exercise Set 3: Greedy vs Lazy

cat << 'EOF' > /tmp/ex-greedy.txt
<tag>content</tag>
<tag>first</tag><tag>second</tag>
"value1" and "value2"
key="setting1" key="setting2"
EOF

Ex 3.1: Extract each tag separately (lazy)

Solution
grep -oP '<tag>.*?</tag>' /tmp/ex-greedy.txt

Output, one match per line (grep -o prints every match, so line 2 yields two):
- <tag>content</tag>
- <tag>first</tag>
- <tag>second</tag>

Ex 3.2: Extract all content between tags (greedy)

Solution
grep -oP '<tag>.*</tag>' /tmp/ex-greedy.txt

Line 2 outputs: <tag>first</tag><tag>second</tag>

Ex 3.3: Better alternative - negated class

Solution
# No backtracking, clearer intent
grep -oP '<tag>[^<]*</tag>' /tmp/ex-greedy.txt

[^<]* = any character except <, zero or more times. Faster and safer.
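Why "safer": a lazy `.*?` will still run past a stray opening tag to reach the first closing tag, while the negated class cannot cross a `<`. A minimal Python sketch, using a made-up malformed input:

```python
import re

# Hypothetical malformed input: the first <tag> is never closed.
text = "<tag>broken <tag>ok</tag>"

lazy = re.findall(r"<tag>.*?</tag>", text)
negated = re.findall(r"<tag>[^<]*</tag>", text)

print(lazy)     # ['<tag>broken <tag>ok</tag>']
print(negated)  # ['<tag>ok</tag>']
```

The trade-off is that the negated class also rejects tags with legitimate nested markup inside them.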

Ex 3.4: Extract quoted values (lazy)

Solution
grep -oP '".*?"' /tmp/ex-greedy.txt

Or better: "[^"]*"

Exercise Set 4: Bounded Quantifiers

Ex 4.1: Valid IP octet (0-255 pattern)

Solution
# Matches a single octet in the 0-255 range (complex but accurate).
# Unanchored, -o also matches pieces of longer numbers such as 65535.
grep -Eo '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)' /tmp/ex-quant.txt

Breaking it down:
- 25[0-5] = 250-255
- 2[0-4][0-9] = 200-249
- [01]?[0-9][0-9]? = 0-199
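To validate a whole address rather than one octet, the octet pattern can be repeated with `{3}` and anchored. This is a sketch in Python; the names `OCTET`, `IPV4`, and `is_ipv4` are hypothetical, and the pattern still accepts leading zeros such as "001".

```python
import re

OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# One octet, then exactly three more ".octet" groups.
# {{3}} is an escaped {3} inside the f-string.
IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")

def is_ipv4(s: str) -> bool:
    # fullmatch anchors both ends, so "1.2.3.4.5" and "256.1.1.1" fail.
    return IPV4.fullmatch(s) is not None

for candidate in ["192.168.1.100", "255.255.255.0", "256.1.1.1", "10.0.0"]:
    print(candidate, is_ipv4(candidate))
```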

Ex 4.2: Phone number parts

Solution
echo "555-123-4567" | grep -Eo '[0-9]{3}-[0-9]{3}-[0-9]{4}'

{3}, {3}, {4} = exact digit counts per segment.

Ex 4.3: Variable length with minimum

Solution
# Match passwords: 8+ characters
echo -e "pass\npassword123\nsecretkey" | grep -E '^.{8,}$'

Output: password123, secretkey (8+ chars only)

Real-World Applications

Professional: Log Timestamp Extraction

# ISO timestamp: YYYY-MM-DDTHH:MM:SS
grep -Eo '[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}' /var/log/app.log

# Syslog timestamp: Month DD HH:MM:SS
grep -Eo '[A-Z][a-z]{2} +[0-9]{1,2} [0-9]{2}:[0-9]{2}:[0-9]{2}' /var/log/syslog
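The ISO pattern only checks the shape of a timestamp, not whether the date is real. A minimal Python sketch that extracts with the same pattern and then parses with `datetime`; the log lines are invented for illustration.

```python
import re
from datetime import datetime

# Hypothetical log content, not taken from a real system
log = """\
2026-03-15T08:30:00 INFO service started
bad line without timestamp
2026-03-15T08:31:12 WARN disk at 91%
"""

iso_re = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}")

# strptime raises ValueError for shapes that match but are not valid
# dates (for example month 13), which the regex alone would accept.
stamps = [datetime.strptime(s, "%Y-%m-%dT%H:%M:%S") for s in iso_re.findall(log)]
elapsed = (stamps[-1] - stamps[0]).total_seconds()
print(elapsed)  # 72.0
```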

Professional: ISE Session Counts

# Extract session IDs (32-hex characters)
grep -Eo '[0-9a-fA-F]{32}' /var/log/ise-psc.log | sort -u | wc -l

Professional: VLAN Ranges

# Match VLAN IDs 1-4094
grep -Eo 'VLAN ?([1-9]|[1-9][0-9]{1,2}|[1-3][0-9]{3}|40[0-8][0-9]|409[0-4])' config.txt
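Numeric-range patterns like this are easy to get subtly wrong, so it is worth testing them exhaustively. The sketch below checks the ID part of the pattern against every integer from 0 to 9999 in Python. Note one assumption: Python's `re` takes the first alternative that works rather than the longest, so the check uses `fullmatch`; with `search` you would want the longer alternatives listed first.

```python
import re

vlan_id = re.compile(r"[1-9]|[1-9][0-9]{1,2}|[1-3][0-9]{3}|40[0-8][0-9]|409[0-4]")

# Collect every number whose decimal form the pattern accepts in full.
accepted = [n for n in range(0, 10000) if vlan_id.fullmatch(str(n))]

print(accepted[0], accepted[-1], len(accepted))  # 1 4094 4094
```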

Personal: Find Dollar Amounts

# Match prices like $9.99, $199.00, $1,234.56
grep -Eo '\$[0-9,]+(\.[0-9]{2})?' ~/receipts/*.txt
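One side effect of `[0-9,]+`: it also swallows a comma that merely follows the amount in a sentence. The Python sketch below compares it with a stricter variant that only allows commas between groups of three digits. The `strict` pattern and the sample sentence are illustrative assumptions, not part of the original drill.

```python
import re

text = "paid $5, then $1,234.56"

loose = re.compile(r"\$[0-9,]+(?:\.[0-9]{2})?")
# Stricter sketch: 1-3 digits, then zero or more ",ddd" groups.
strict = re.compile(r"\$[0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]{2})?")

loose_hits = loose.findall(text)
strict_hits = strict.findall(text)

print(loose_hits)   # ['$5,', '$1,234.56']
print(strict_hits)  # ['$5', '$1,234.56']
```

The stricter form has its own limit: an amount written without separators, such as $1234, would only match up to its first three digits.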

Personal: Extract Time Entries

# Match 12-hour times: 9:30am, 12:45pm
grep -Eio '[0-9]{1,2}:[0-9]{2} ?(am|pm)' ~/calendar.txt

Personal: Phone Number Formats

# Match various formats
grep -Eo '(\([0-9]{3}\)|[0-9]{3})[ -]?[0-9]{3}[ -]?[0-9]{4}' ~/contacts.txt
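Once the formats are matched, capture groups make it easy to normalize them. A sketch in Python using the same structure with groups added; `normalize` is a hypothetical helper and the sample numbers are made up.

```python
import re

# Same shape as the grep pattern, with capture groups for each part.
phone_re = re.compile(
    r"(?:\(([0-9]{3})\)|([0-9]{3}))[ -]?([0-9]{3})[ -]?([0-9]{4})"
)

def normalize(s):
    """Return NNN-NNN-NNNN, or None if s is not a recognized format."""
    m = phone_re.fullmatch(s)
    if m is None:
        return None
    # Only one of the two area-code groups participates in a match.
    area = m.group(1) or m.group(2)
    return f"{area}-{m.group(3)}-{m.group(4)}"

for raw in ["(555) 123-4567", "555 123 4567", "5551234567", "55-123-4567"]:
    print(f"{raw!r:18} -> {normalize(raw)}")
```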

Tool Variants

sed: Quantifiers in Substitution

# Collapse runs of spaces (one or more → single)
echo "too    many   spaces" | sed 's/  */ /g'

# Remove the digits just before the dot
echo "file123.txt" | sed 's/[0-9]*\././'

awk: Pattern with Quantifiers

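# Print lines containing a run of 4 digits, such as a year
# (interval expressions work by default in gawk 4.0+; older gawk needs --re-interval)
awk '/[0-9]{4}/' file.txt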

# Extract port numbers
echo "port 8080" | awk 'match($0, /[0-9]{1,5}/) {print substr($0, RSTART, RLENGTH)}'

vim: Quantifier Patterns

" Find 3+ consecutive digits
/[0-9]\{3,\}

" Replace multiple blank lines with one
:%s/\n\{3,\}/\r\r/g

" Match optional 's' (colour/color)
/colou\?r

In vim, quantifiers use \{n,m\} with escaped braces.

Python: Quantifier Patterns

import re

text = "Port 80, Port 443, Port 8080"

# Find all port numbers
ports = re.findall(r'\d{1,5}', text)
print(ports)  # ['80', '443', '8080']

# Greedy vs Lazy
html = "<div>first</div><div>second</div>"
greedy = re.findall(r'<div>.*</div>', html)
lazy = re.findall(r'<div>.*?</div>', html)
print(f"Greedy: {greedy}")  # Full string
print(f"Lazy: {lazy}")      # Two matches

Gotchas

Zero Matches is Valid

# * matches ZERO or more, so "ac" still matches
echo "ac" | grep -E 'ab*c'
# Output: ac (zero b's is valid)

# + requires at least one
echo "ac" | grep -E 'ab+c'
# No output (needs at least one b)

Greedy Can Be Surprising

# This grabs everything between FIRST and LAST quotes
echo '"a" "b" "c"' | grep -oP '".*"'
# Output: "a" "b" "c" (not what you wanted)

# Use negated class or lazy
echo '"a" "b" "c"' | grep -oP '"[^"]*"'
# Output: "a", "b", "c" (three separate matches, one per line)

BRE vs ERE Syntax

# BRE (grep without -E): braces must be escaped; * works as-is,
# while \+ and \? are GNU extensions
grep 'a\{3\}' file.txt

# ERE (grep -E): no escape needed
grep -E 'a{3}' file.txt

Empty Matches with *

# Be careful - * can match nothing
echo "abc" | grep -o 'x*'
# Output: (nothing - every match is empty, and GNU grep -o prints only non-empty matches;
# the exit status is still 0 because the line matched)
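Python's `re` module makes these empty matches visible, which helps explain why `*` patterns behave oddly in substitutions. A minimal sketch:

```python
import re

# x* succeeds with an empty match at every position, including the end.
empties = re.findall(r"x*", "abc")
print(empties)  # ['', '', '', '']

# x+ needs at least one x, so nothing matches.
none_found = re.findall(r"x+", "abc")
print(none_found)  # []

# Substituting on a pattern that can match nothing inserts everywhere.
dashed = re.sub(r"x*", "-", "abc")
print(dashed)  # -a-b-c-
```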

Key Takeaways

Quantifier  Remember
*           Zero or more. Can match nothing!
+           One or more. At least one required.
?           Optional (zero or one).
{n}         Exactly n times.
{n,m}       Range: n to m times.
.*          Greedy: matches everything possible.
.*?         Lazy: matches minimum needed.
[^x]*       Better than lazy for known delimiter.

Self-Test

  1. What’s the difference between + and *?

  2. What does a{2,4} match?

  3. How do you make a greedy quantifier lazy?

  4. Why is [^"]* often better than .*? for quoted strings?

  5. What does colou?r match?

Answers
  1. + requires at least 1, * allows 0

  2. "aa", "aaa", or "aaaa" (2-4 a’s)

  3. Add ? after it: *?, +?, {n,m}?

  4. No backtracking, faster, clearer intent (stops at first quote)

  5. "color" and "colour" (u is optional)

Next Drill

Drill 04: Anchors & Boundaries - Master ^, $, \b, \<, \> for precise positioning.