Drill 01: Fundamentals

Before anything else, master these basics: what matches literally, what’s special, and how to escape.

Core Concepts

Concept           What You Need to Know
Literal matching  Most characters match themselves: foo matches "foo"
Metacharacters    Special characters with regex meaning: . * + ? ^ $ | ( ) [ ] { } \
Escaping          Use backslash to match metacharacters literally: \. matches a period
Case sensitivity  Regex is case-sensitive by default; use flags to change
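
The four concepts above can be tried side by side in Python's re module. This is a minimal sketch; the sample text is made up for illustration and is not part of the drill files.

```python
import re

# Hypothetical sample text, chosen so each concept has something to hit.
text = "foo Foo a.c abc"

# Literal matching: ordinary characters match themselves (case-sensitive).
literal = re.findall(r"foo", text)

# Case sensitivity: a flag switches to case-insensitive matching.
any_case = re.findall(r"foo", text, re.IGNORECASE)

# Metacharacter: an unescaped dot matches any character except newline.
dot_any = re.findall(r"a.c", text)

# Escaping: a backslash turns the dot into a literal period.
dot_literal = re.findall(r"a\.c", text)

print(literal, any_case, dot_any, dot_literal)
```

The unescaped a.c picks up both "a.c" and "abc", while the escaped form picks up only "a.c".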

Metacharacter Reference

Char  Meaning                                To Match Literally
.     Any single character (except newline)  \.
*     Zero or more of previous               \*
+     One or more of previous                \+
?     Zero or one of previous                \?
^     Start of line/string                   \^
$     End of line/string                     \$
|     Alternation (OR)                       \|
(     Group start                            \(
)     Group end                              \)
[     Character class start                  \[
]     Character class end                    \]
{     Quantifier start                       \{
}     Quantifier end                         \}
\     Escape character                       \\
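
As a quick check of the table, the sketch below escapes each metacharacter by hand and confirms that the escaped pattern matches only that character. It then uses re.escape, Python's standard helper, to do the same for a whole string. The helper name escape_by_hand and the sample string are illustrative assumptions.

```python
import re

# The metacharacters from the table, as one string ("\\" is a single backslash).
METACHARACTERS = ".*+?^$|()[]{}\\"

def escape_by_hand(ch):
    """Prefix one character with a backslash, like the table's last column."""
    return "\\" + ch

# Each hand-escaped pattern should match exactly its own character.
hand_escaped_ok = all(
    re.fullmatch(escape_by_hand(ch), ch) is not None for ch in METACHARACTERS
)

# re.escape applies the same idea to arbitrary text in one call.
sample = "Price: $99.99 (approx.)"
auto_pattern = re.escape(sample)
auto_escaped_ok = re.fullmatch(auto_pattern, sample) is not None

print(hand_escaped_ok, auto_escaped_ok)
```

re.escape is handy when the text to match comes from user input or a file, where escaping by hand is easy to get wrong.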

Interactive CLI Drill

Run the practice script:

bash ~/atelier/_bibliotheca/domus-captures/docs/modules/ROOT/examples/regex-drills/01-fundamentals.sh

Exercise Set 1: Literal Matching

cat << 'EOF' > /tmp/ex01.txt
The server responded with 200 OK
Error: Connection failed
Warning: Low disk space
server.example.com
10.50.1.20
EOF

Ex 1.1: Match "server"

Solution
grep 'server' /tmp/ex01.txt

Output:

The server responded with 200 OK
server.example.com

Ex 1.2: Case-insensitive "error"

Solution
grep -i 'error' /tmp/ex01.txt

Output:

Error: Connection failed

Ex 1.3: Match exact IP

Solution
grep '10\.50\.1\.20' /tmp/ex01.txt

Note: Escape periods to match literally, not "any character"

Exercise Set 2: The Dot Trap

cat << 'EOF' > /tmp/ex02.txt
192.168.1.100
192X168Y1Z100
10.50.1.20
version1.2.3
versionABC
EOF

Ex 2.1: Why does this match both?

grep -E '192.168' /tmp/ex02.txt

Explanation

The unescaped . matches ANY character:

- 192.168 matches "192.168" (the dot matches any character, which here happens to be a period)
- 192.168 also matches "192X168" (the dot matches X)

This is one of the most common regex bugs. Always escape dots in IPs, domains, versions.

Ex 2.2: Match only the real IP 192.168.1.100 (dots as literal periods)

Solution
grep -E '192\.168\.1\.100' /tmp/ex02.txt
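
The same dot trap can be reproduced in Python. The sketch below runs both patterns over the lines of /tmp/ex02.txt, copied into a list so the example is self-contained. The helper matching_lines is a hypothetical name that loosely mimics what grep does per line.

```python
import re

# The lines written to /tmp/ex02.txt above.
lines = [
    "192.168.1.100",
    "192X168Y1Z100",
    "10.50.1.20",
    "version1.2.3",
    "versionABC",
]

def matching_lines(pattern, candidates):
    """Return the lines containing a match, like grep without options."""
    return [line for line in candidates if re.search(pattern, line)]

# Unescaped dot: matches the real IP and the X/Y/Z look-alike.
unescaped = matching_lines(r"192.168", lines)

# Escaped dots: only the real dotted IP matches.
escaped = matching_lines(r"192\.168\.1\.100", lines)

print(unescaped)
print(escaped)
```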

Exercise Set 3: Escaping Special Characters

cat << 'EOF' > /tmp/ex03.txt
[ERROR] Database offline
[WARN] Low memory
Price: $99.99
Path: C:\Users\Admin
What? Really? Yes!
2+2=4
5*5=25
EOF

Ex 3.1: Match [ERROR] including brackets

Solution
grep -E '\[ERROR\]' /tmp/ex03.txt

Ex 3.2: Match the $99.99 price

Solution
grep -E '\$99\.99' /tmp/ex03.txt

Ex 3.3: Match Windows path

Solution
grep -E 'C:\\Users' /tmp/ex03.txt

Note: Inside single quotes the shell passes both backslashes to grep unchanged; the regex engine then reads \\ as one literal backslash

Ex 3.4: Match "2+2=4"

Solution
grep -E '2\+2=4' /tmp/ex03.txt
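
For comparison, here is a Python sketch of the four solutions in this set, run against the same lines as /tmp/ex03.txt. The helper first_match is a hypothetical name. The final pattern shows what happens when the + is left unescaped.

```python
import re

# Same lines as /tmp/ex03.txt; the raw string keeps the path's backslashes as-is.
lines = [
    "[ERROR] Database offline",
    "[WARN] Low memory",
    "Price: $99.99",
    r"Path: C:\Users\Admin",
    "What? Really? Yes!",
    "2+2=4",
    "5*5=25",
]

def first_match(pattern):
    """Return the first line the pattern matches, or None."""
    for line in lines:
        if re.search(pattern, line):
            return line
    return None

brackets = first_match(r"\[ERROR\]")   # Ex 3.1
price = first_match(r"\$99\.99")       # Ex 3.2
path = first_match(r"C:\\Users")       # Ex 3.3
total = first_match(r"2\+2=4")         # Ex 3.4

# Without the backslash, + is a quantifier: "one or more 2s, then 2=4".
unescaped_plus = first_match(r"2+2=4")

print(brackets, price, path, total, unescaped_plus, sep="\n")
```

The unescaped pattern finds nothing in this file, because no line contains a run of 2s directly followed by "2=4".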

Real-World Applications

Professional: Finding Log Levels

# Match [INFO], [WARN], [ERROR] in logs
grep -E '\[(INFO|WARN|ERROR)\]' /var/log/app.log
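
A Python counterpart might look like the sketch below. The log lines are invented for illustration, since the contents of /var/log/app.log are not shown here. The escaped brackets match literally, while the unescaped parentheses and pipes keep their grouping and alternation meaning.

```python
import re

# Literal brackets around a group of three alternatives.
LEVEL = re.compile(r"\[(INFO|WARN|ERROR)\]")

# Hypothetical log lines, not taken from a real log file.
sample_log = [
    "[INFO] Service started",
    "[DEBUG] Cache warm",
    "[ERROR] Database offline",
    "INFO without brackets",
]

levels = []
for line in sample_log:
    match = LEVEL.search(line)
    if match:
        # group(1) is the text inside the brackets.
        levels.append(match.group(1))

print(levels)
```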

Professional: ISE Authentication Logs

# Match Passed-Authentication or Failed-Attempt
grep -E 'Passed-Authentication|Failed-Attempt' /var/log/ise-psc.log

Personal: Find Prices in Documents

# Find any dollar amounts
grep -Eo '\$[0-9,]+(\.[0-9]{2})?' ~/Documents/receipts/*.txt
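
The same pattern can be exercised in Python on a made-up sentence. This is a sketch: the sample text and the helper name find_prices are assumptions. Because the pattern contains a capturing group, the sketch uses finditer and group(0) to get the whole match, which is roughly what grep -o prints.

```python
import re

# Escaped $ for the currency sign, escaped . for the decimal point.
PRICE = re.compile(r"\$[0-9,]+(\.[0-9]{2})?")

def find_prices(text):
    """Return every full match of the price pattern in text."""
    # re.findall would return only the capturing group's contents here,
    # so take group(0) from each match object instead.
    return [m.group(0) for m in PRICE.finditer(text)]

# Hypothetical receipt text.
sample = "Coffee $4.50, laptop $1,299.00, tip $5 and 99.99 without a sign"
prices = find_prices(sample)

print(prices)
```

One limitation to be aware of: because the character class includes a comma, a comma that directly follows a whole-dollar amount (as in "$5, please") is swallowed into the match.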

Personal: Search for URLs

# Find URLs (inside [...] the dot is already literal; put - last so it is not read as a range)
grep -Eo 'https?://[a-zA-Z0-9./-]+' ~/notes/*.md
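
A rough Python equivalent is sketched below on an invented sentence. Inside a character class the dot is already literal, so the sketch leaves it unescaped and places the hyphen last. The URLs are placeholders.

```python
import re

# ? makes the "s" optional; the class lists the characters allowed after "://".
URL = re.compile(r"https?://[a-zA-Z0-9./-]+")

# Hypothetical note text with two URLs and one non-HTTP link.
sample = (
    "See https://example.com/docs/regex-drills and "
    "http://10.50.1.20/status, not ftp://example.com/file"
)

urls = URL.findall(sample)

print(urls)
```

This simple class ignores query strings and ports, and it would absorb a sentence-ending period that directly follows a URL, so treat it as a starting point.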

Tool Variants

sed: Escape for Substitution

# Replace [ERROR] with [CRITICAL]
echo '[ERROR] System failure' | sed 's/\[ERROR\]/[CRITICAL]/'

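awk: Regex Literals vs. Strings

# In a /regex/ literal one backslash is enough; double it only when the
# pattern is a string (dynamic regex), e.g. $0 ~ "192\\.168\\.1\\.100"
echo '192.168.1.100' | awk '/192\.168\.1\.100/ {print "Found IP"}'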

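vim: Similar Escaping, Different Defaults

# Search for [ERROR]
/\[ERROR\]

# Substitute $99 with $49 (the replacement side needs no backslash before $)
:%s/\$99/$49/g

# In Vim's default "magic" mode, + ? | ( ) { are literal unless escaped
# (\+ \? \| \( \) \{ are the operators); start a pattern with \v for ERE-like rules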

Python: Raw Strings Help

import re

# Use r'' raw strings to avoid double-escaping
pattern = r'\[ERROR\]'
text = '[ERROR] Connection failed'
if re.search(pattern, text):
    print('Found error')
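
To see what raw strings save you from, the sketch below writes the same two patterns both ways and checks that they are identical. The variable names are illustrative.

```python
import re

# Raw string vs. ordinary string: both spell the regex \[ERROR\].
raw = r"\[ERROR\]"
plain = "\\[ERROR\\]"
same_pattern = raw == plain

# A literal backslash needs \\ in the regex, which is "\\\\" without r''.
raw_backslash = r"C:\\Users"
plain_backslash = "C:\\\\Users"
same_backslash = raw_backslash == plain_backslash

# The text being searched: C:\Users\Admin
path = "C:\\Users\\Admin"
found = re.search(raw_backslash, path)

print(same_pattern, same_backslash, found.group(0))
```

Without the r prefix, Python consumes one layer of backslashes before the regex engine sees the pattern, which is why every backslash has to be doubled.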

Key Takeaways

Concept      Remember
Dot .        Matches ANY character. Always escape in IPs, domains, filenames.
Brackets []  Define character classes. Escape with \[ and \] for literal.
Dollar $     End anchor. Escape with \$ for currency.
Question ?   Optional quantifier. Escape with \? for literal question marks.
Backslash \  Escape character. Use \\ to match a literal backslash.

Common Mistakes

Mistake                   Wrong          Correct
Unescaped dot in IP       192.168.1.1    192\.168\.1\.1
Unescaped brackets        [ERROR]        \[ERROR\]
Forgetting shell quoting  grep $99 file  grep '\$99' file
Windows paths             C:\Users       'C:\\Users' (single-quoted, so grep receives both backslashes)

Self-Test

Answer these without looking:

  1. What does . match in regex?

  2. List 5 metacharacters that need escaping

  3. How do you match a literal backslash?

  4. What flag makes matching case-insensitive?

  5. Why is 192.168.1.1 a BAD pattern for matching IPs?

Answers
  1. Any single character except newline

  2. . * + ? ^ $ | ( ) [ ] { } \ (any 5)

  3. \\

  4. -i flag in grep, re.IGNORECASE in Python

  5. The dots match ANY character, so it would also match "192X168Y1Z1"

Next Drill

Drill 02: Character Classes - Learn [abc], [0-9], and PCRE shorthand.