Regex Gotchas: The Traps Everyone Falls Into
These are the mistakes EVERYONE makes. Learning to spot them instantly separates beginners from intermediate users.
You just hit one of these (escaping brackets instead of dots). This page drills these traps until they’re automatic.
Setup
cat << 'EOF' > /tmp/gotchas.txt
192.168.1.100
192X168Y1Z100
10.50.1.20
10X50X1X20
.hidden-file
file.txt
C:\Users\Admin
/home/evan
$99.99
$$VARIABLE$$
Price: $100
[ERROR] failed
[WARN] warning
{braces}
|pipe|
^caret
end$
question?
plus+
star*
(parens)
the the duplicate
word word here
color colour
gray grey
EOF
TRAP 1: Unescaped Dot
The . matches ANY character, not just a literal period.
The Bug
# WRONG - matches 192X168Y1Z100 too!
grep '192.168.1.100' /tmp/gotchas.txt
Try it:
Output
192.168.1.100
192X168Y1Z100
Both match because . matches X, Y, Z too!
The Fix
# CORRECT - escape the dots
grep '192\.168\.1\.100' /tmp/gotchas.txt
Output
192.168.1.100
Only the actual IP matches now.
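Two other ways to get a literal dot, sketched here on inline sample data rather than the setup file: grep -F treats the whole pattern as a fixed string, and a bracket expression [.] matches only a period. The variable names are illustrative.

```shell
# Inline sample data (illustrative; mirrors two lines of the setup file)
sample='192.168.1.100
192X168Y1Z100'

# -F: treat the pattern as a fixed string, so . is a literal period
fixed_hits=$(printf '%s\n' "$sample" | grep -F '192.168.1.100')

# [.]: inside a bracket expression the dot loses its special meaning
bracket_hits=$(printf '%s\n' "$sample" | grep '192[.]168[.]1[.]100')

echo "$fixed_hits"     # 192.168.1.100
echo "$bracket_hits"   # 192.168.1.100
```

-F is the simpler choice when the pattern contains no regex operators at all; [.] is handy when only part of the pattern is literal.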
TRAP 2: Wrong Escape Target
You escaped [ when you meant to escape .
The Bug
# WRONG - escaping the bracket, not the dot
grep '[0-9]*.\[0-9]*' /tmp/gotchas.txt
The \[ turns the bracket into a literal [ character, so the 0-9] that follows is no longer a character class. The . is STILL matching any character.
The Fix
# CORRECT - escape the DOT
grep '[0-9]*\.[0-9]*' /tmp/gotchas.txt
Rule
When matching IP addresses:
- Escape: \. (the period)
- Don’t escape: [0-9] (the character class brackets)
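To see what the buggy pattern really asks for, here is a small sketch on two illustrative lines (not from the setup file). Because \[ is a literal bracket, the buggy pattern only matches text that contains the characters [0-9 verbatim.

```shell
# Illustrative input: a dotted number and a line containing the text "[0-9]"
sample='10.50
x[0-9]'

# Buggy: digits, ANY character, then the literal text "[0-9", then optional "]"
buggy_hits=$(printf '%s\n' "$sample" | grep '[0-9]*.\[0-9]*')

# Fixed: digits, a literal dot, digits
fixed_hits=$(printf '%s\n' "$sample" | grep '[0-9]*\.[0-9]*')

echo "buggy: $buggy_hits"   # buggy: x[0-9]
echo "fixed: $fixed_hits"   # fixed: 10.50
```

The buggy version skips the dotted number entirely, which is why the original command appeared to find nothing useful.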
TRAP 3: * Matches Zero (Empty String)
* means "zero or more" - it happily matches nothing.
The Bug
# WRONG - [0-9]* can match empty string
grep -E '^[0-9]*$' /tmp/gotchas.txt
This matches empty lines too because * allows zero digits.
The Fix
# CORRECT - use + for "one or more"
grep -E '^[0-9]+$' /tmp/gotchas.txt
Comparison
| Quantifier | Meaning | Matches ""? |
|---|---|---|
| * | Zero or more | YES |
| + | One or more | NO |
| ? | Zero or one | YES |
| {1,3} | One to three | NO |
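A quick way to check the difference, assuming three illustrative input lines (an empty line, a digits-only line, and a letters-only line): count the matching lines with grep -c.

```shell
# Input lines: "", "123", "abc"
star_count=$(printf '\n123\nabc\n' | grep -cE '^[0-9]*$')
plus_count=$(printf '\n123\nabc\n' | grep -cE '^[0-9]+$')

echo "star: $star_count"   # star: 2  (the empty line and 123)
echo "plus: $plus_count"   # plus: 1  (only 123)
```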
TRAP 4: Greedy Matching
Default quantifiers grab as MUCH as possible.
The Bug
# WRONG - greedy .* grabs everything
echo '<div>one</div><div>two</div>' | grep -oP '<div>.*</div>'
Output
<div>one</div><div>two</div>
One match containing BOTH divs!
The Fix
# CORRECT - lazy .*? matches minimum
echo '<div>one</div><div>two</div>' | grep -oP '<div>.*?</div>'
Output
<div>one</div>
<div>two</div>
Two separate matches.
Alternative Fix (No PCRE)
# Use negated character class instead
echo '<div>one</div><div>two</div>' | grep -oE '<div>[^<]*</div>'
[^<]* = any character EXCEPT <, so it stops at </div>
TRAP 5: $ Means End of Line
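A minimal sketch of the distinction, assuming grep in its default BRE mode: an unescaped $ at the end of a pattern is an anchor for the end of the line, while \$ matches a literal dollar sign. The sample lines are illustrative and echo entries from the setup file.

```shell
sample='end$
the end
$99.99'

# 'end$' means "end" at the end of the line; the $ is an anchor, not a character
anchor_hits=$(printf '%s\n' "$sample" | grep 'end$')

# 'end\$' means "end" followed by a literal dollar sign
literal_hits=$(printf '%s\n' "$sample" | grep 'end\$')

# '^\$' means a line that starts with a literal dollar sign
dollar_start=$(printf '%s\n' "$sample" | grep '^\$')

echo "$anchor_hits"    # the end
echo "$literal_hits"   # end$
echo "$dollar_start"   # $99.99
```

Note that the unescaped pattern does not match the line end$ at all, because that line ends with a dollar sign rather than with the letters "end".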
TRAP 6: ^ Inside vs Outside []
^ has two meanings depending on context.
Outside []: Start of Line
grep '^Price' /tmp/gotchas.txt # Lines starting with "Price"
Inside [] at Start: Negation
grep '[^0-9]' /tmp/gotchas.txt # Any character EXCEPT digits
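Both meanings can appear in one pattern, and a ^ that is not the first character inside [] is just a literal caret. A short sketch on illustrative lines borrowed from the setup file:

```shell
sample='10.50.1.20
Price: $100
/home/evan
^caret'

# First ^ anchors to line start; the ^ inside [] negates the class:
# "lines whose first character is not a digit"
starts_nondigit=$(printf '%s\n' "$sample" | grep '^[^0-9]')

# ^ placed after another character inside [] is a literal caret:
# "lines containing an x or a ^"
literal_caret=$(printf '%s\n' "$sample" | grep '[x^]')

echo "$starts_nondigit"
echo "$literal_caret"     # ^caret
```

The first command prints every line except 10.50.1.20.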
TRAP 7: BRE vs ERE Escaping
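Some operators must be backslash-escaped in BRE to act as operators; in ERE they are special without the backslash.
| Character | BRE (grep) | ERE (grep -E) |
|---|---|---|
| + (one or more) | \+ | + |
| ? (zero or one) | \? | ? |
| pipe (alternation) | backslash then pipe | pipe alone |
| () (grouping) | \( \) | ( ) |
| {} (repetition count) | \{ \} | { } |
In BRE, \+, \? and the escaped pipe are GNU extensions; \( \) and \{ \} are standard POSIX.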
The Bug
# WRONG - + needs escaping in BRE
grep '[0-9]+' /tmp/gotchas.txt
This looks for a digit followed by a literal + character, not "one or more digits".
The Fixes
# Option 1: BRE with escaped +
grep '[0-9]\+' /tmp/gotchas.txt
# Option 2: ERE (preferred)
grep -E '[0-9]+' /tmp/gotchas.txt
Always use -E for modern regex work. BRE escaping is confusing.
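The difference is easy to see on two illustrative lines, one of which contains a real plus sign. This is a sketch; the sample data is not from the setup file.

```shell
sample='a+b
aab'

# BRE: + is an ordinary character, so this matches only the text "a+"
bre_hits=$(printf '%s\n' "$sample" | grep 'a+')

# ERE: + is a quantifier, so this matches any line with one or more a
ere_hits=$(printf '%s\n' "$sample" | grep -E 'a+')

echo "BRE: $bre_hits"   # BRE: a+b
echo "ERE:"
echo "$ere_hits"        # a+b and aab, one per line
```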
TRAP 9: Shell Expansion
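A minimal sketch of the problem, assuming a POSIX shell with no positional parameters set: inside double quotes the shell rewrites $9 before grep ever sees the pattern, while single quotes pass the text through unchanged. The variable names are illustrative.

```shell
# Clear positional parameters so $9 is empty, as in a typical interactive shell
set --

double_quoted="Price: $99"   # the shell expands $9 (empty) and keeps the trailing 9
single_quoted='Price: $99'   # single quotes leave every character alone

echo "$double_quoted"   # Price: 9
echo "$single_quoted"   # Price: $99

# With single quotes and an escaped $, grep receives the pattern intact
match_count=$(printf '%s\n' 'Price: $99' | grep -c 'Price: \$99')
echo "$match_count"     # 1
```

Printing a pattern with echo before handing it to grep is a cheap way to check what the shell actually passes along.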
Quick Diagnostic
When your regex doesn’t work, check these IN ORDER:
1. Unescaped metacharacters? (.*+?$^[](){}|\)
2. Wrong flavor? (BRE vs ERE vs PCRE)
3. Greedy trap? (need .*? instead of .*)
4. Shell expansion? (use single quotes)
5. Need -o? (for extraction)
6. Need -P? (for \d, lazy quantifiers, lookaround)
Drill: Find the Bug
For each broken command, identify the problem:
Bug 1
grep '192.168' file # Why does this match "192X168"?
Answer
Unescaped dot. Fix: grep '192\.168' file
Bug 2
grep '[0-9]+' file # Why doesn't this match digits?
Answer
BRE mode - + is literal. Fix: grep -E '[0-9]+' file or grep '[0-9]\+' file
Bug 3
grep '\berror\b' file # Why doesn't word boundary work?
Answer
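\b is not part of POSIX BRE or ERE. GNU grep accepts it as an extension, but other grep implementations may not. Fix: grep -w 'error' file (portable) or grep -P '\berror\b' file (where PCRE is available)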
Bug 4
grep "Price: $99" file # Why doesn't this match?
Answer
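Double quotes allow shell expansion. The shell replaces $9 (the ninth positional parameter, usually empty) and leaves the trailing 9, so grep receives "Price: 9". Fix: grep 'Price: \$99' file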
Bug 5
grep -o '<.*>' file # Why does this return too much?
Answer
Greedy matching. .* grabs everything up to the last > on the line. Fix: grep -oP '<.*?>' file or grep -o '<[^>]*>' file
Your IP Mistake Dissected
What you wrote:
grep '[0-9]*.\[0-9]*.\[0-9]*.\[0-9]*' /tmp/fundamentals.txt
Problems:
| Issue | What You Wrote | What It Does |
|---|---|---|
| Unescaped dot | . | Matches ANY character |
| Wrong escape | \[ | Makes [ a literal character, so the digit class after it is broken |
| * instead of + | [0-9]* | Matches zero or more (empty OK) |
Fix progression:
# Level 1: Escape the dots
grep '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*' /tmp/fundamentals.txt
# Level 2: Use + instead of *
grep '[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+' /tmp/fundamentals.txt
# Level 3: ERE syntax
grep -E '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' /tmp/fundamentals.txt
# Level 4: Limit octet length
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /tmp/fundamentals.txt
# Level 5: PCRE with \d shortcut
grep -oP '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}' /tmp/fundamentals.txt
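Even Level 4 and Level 5 accept impossible addresses such as 999.999.999.999, because {1,3} only limits the digit count. A possible next step, sketched in ERE, restricts each octet to 0-255. The octet variable, the inline sample data, and the whole-line anchoring are assumptions for illustration.

```shell
# One octet: 250-255, 200-249, 100-199, or 0-99
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'

sample='192.168.1.100
999.999.999.999
10.50.1.20'

# Double quotes are deliberate here so $octet expands; \$ keeps the end anchor
valid_ips=$(printf '%s\n' "$sample" | grep -E "^$octet\.$octet\.$octet\.$octet\$")

echo "$valid_ips"   # 192.168.1.100 and 10.50.1.20, one per line
```

Anchoring with ^ and $ matters: without it, the pattern could still match a valid-looking substring inside 999.999.999.999.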
You’re on day 2. This mistake is NORMAL. The fact that you caught it means you’re learning.