Drill 04: Anchors & Boundaries
Anchors match positions, not characters. Use them to find patterns at line starts, line ends, or word boundaries without consuming any text.
Core Concepts
| Anchor | Meaning | Tool Support |
|---|---|---|
| ^ | Start of line/string | All tools |
| $ | End of line/string | All tools |
| \b | Word boundary | PCRE (grep -P), Python |
| \B | NOT a word boundary | PCRE, Python |
| \< | Start of word | BRE/ERE (grep, sed) |
| \> | End of word | BRE/ERE (grep, sed) |
| \A | Start of string (ignores multiline) | PCRE, Python |
| \Z | End of string (ignores multiline) | PCRE, Python |
Zero-Width Assertion
Key insight: Anchors match POSITIONS, not characters. They have zero width.
echo "hello" | grep -o '^'
# Output: (empty - matched position before 'h')
echo "hello" | grep -o '^.'
# Output: h (position before h, then one character)
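The same zero-width behaviour can be inspected in Python, where a match object reports the span it covers. This is a minimal sketch; the variable names are illustrative only.

```python
import re

text = "hello"

# '^' alone matches the position before 'h' and consumes nothing.
start_only = re.search(r'^', text)
print(start_only.span())          # (0, 0)
print(repr(start_only.group()))   # ''

# '^.' matches the same position, then consumes one character.
start_plus_char = re.search(r'^.', text)
print(start_plus_char.span())     # (0, 1)
print(start_plus_char.group())    # h
```

A span whose start equals its end is what "zero width" means in practice: the match exists, but it contains no text.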
Interactive CLI Drill
bash ~/atelier/_bibliotheca/domus-captures/docs/modules/ROOT/examples/regex-drills/04-anchors.sh
Exercise Set 1: Line Anchors
cat << 'EOF' > /tmp/ex-anchor.txt
ERROR: Connection failed
Warning: Low disk
INFO: Process started
  ERROR: indented error
error: lowercase error
Process completed with ERROR
EOF
Ex 1.1: Lines starting with ERROR
Solution
grep '^ERROR' /tmp/ex-anchor.txt
Output: ERROR: Connection failed
(Not the indented or lowercase ones)
Ex 1.2: Lines ending with ERROR
Solution
grep 'ERROR$' /tmp/ex-anchor.txt
Output: Process completed with ERROR
Ex 1.3: Lines with ERROR anywhere (no anchor)
Solution
grep 'ERROR' /tmp/ex-anchor.txt
Output: All lines containing ERROR (3 lines)
Ex 1.4: Empty lines only
Solution
grep '^$' file.txt
^$ = start immediately followed by end = empty line.
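The four line-anchor exercises can be replayed in Python as a cross-check. This is a sketch under assumptions: the list mirrors /tmp/ex-anchor.txt with the "indented error" line indented by two spaces, an empty line is appended so that Ex 1.4 has something to find, and `grep_lines` is a hypothetical helper, not a library function.

```python
import re

# Assumed contents of /tmp/ex-anchor.txt, plus one empty line for Ex 1.4.
lines = [
    "ERROR: Connection failed",
    "Warning: Low disk",
    "INFO: Process started",
    "  ERROR: indented error",
    "error: lowercase error",
    "Process completed with ERROR",
    "",
]

def grep_lines(pattern, lines):
    """Return the lines in which the pattern matches somewhere, like grep."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]

starts = grep_lines(r'^ERROR', lines)    # Ex 1.1
ends = grep_lines(r'ERROR$', lines)      # Ex 1.2
anywhere = grep_lines(r'ERROR', lines)   # Ex 1.3
empty = grep_lines(r'^$', lines)         # Ex 1.4

print(starts)         # ['ERROR: Connection failed']
print(ends)           # ['Process completed with ERROR']
print(len(anywhere))  # 3
print(empty)          # ['']
```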
Exercise Set 2: Word Boundaries
cat << 'EOF' > /tmp/ex-words.txt
port 80
port 443
port 8080
transport layer
export PATH
import os
reported error
portability issues
sportsman
EOF
Ex 2.1: Match "port" as whole word only
Solution
# PCRE with \b
grep -P '\bport\b' /tmp/ex-words.txt
# BRE/ERE with \< \>
grep '\<port\>' /tmp/ex-words.txt
Output: port 80, port 443, port 8080
(NOT transport, export, import, reported, portability, sportsman)
Ex 2.2: Words starting with "port"
Solution
grep -P '\bport' /tmp/ex-words.txt
# Or: grep '\<port' /tmp/ex-words.txt
Output: port 80, port 443, port 8080, portability
Ex 2.3: Words ending with "port"
Solution
grep -P 'port\b' /tmp/ex-words.txt
# Or: grep 'port\>' /tmp/ex-words.txt
Output: port 80, port 443, port 8080, transport layer, export PATH, import os
Ex 2.4: "port" NOT as whole word
Solution
# Using \B (NOT word boundary)
grep -P '\Bport\B' /tmp/ex-words.txt
Output: Lines where "port" is inside a word (reported, sportsman)
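To check the expected output of all four word-boundary exercises at once, the patterns can be run over the same lines in Python, which uses the same \b and \B syntax as grep -P. This is a sketch only; the dictionary labels are made up for illustration.

```python
import re

# Same lines as /tmp/ex-words.txt.
lines = [
    "port 80", "port 443", "port 8080", "transport layer", "export PATH",
    "import os", "reported error", "portability issues", "sportsman",
]

patterns = {
    "whole word":  r'\bport\b',   # Ex 2.1
    "word start":  r'\bport',     # Ex 2.2
    "word end":    r'port\b',     # Ex 2.3
    "inside word": r'\Bport\B',   # Ex 2.4
}

# For each pattern, keep the lines in which it matches somewhere.
results = {
    label: [line for line in lines if re.search(pattern, line)]
    for label, pattern in patterns.items()
}

for label, matched in results.items():
    print(f"{label:12} {matched}")
```

Note that "import os" shows up under "word end": "import" ends in "port" just as "export" and "transport" do.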
Exercise Set 3: Combined Patterns
cat << 'EOF' > /tmp/ex-combined.txt
192.168.1.100
host: server-01
HOST: server-02
Server: 10.50.1.50
# This is a comment
  # Indented comment
key=value
key = value
EOF
Ex 3.1: Lines starting with IP address
Solution
# Require all four dotted octets, not just the first one
grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}' /tmp/ex-combined.txt
Output: 192.168.1.100
Ex 3.2: Comment lines (# at start, with optional spaces)
Solution
grep -E '^ *#' /tmp/ex-combined.txt
^ *# = start, zero or more spaces, then #
Ex 3.3: Key-value pairs (key at line start)
Solution
grep -E '^[a-z]+\s*=' /tmp/ex-combined.txt
Output: key=value, key = value
Ex 3.4: Lines NOT starting with #
Solution
grep -v '^#' /tmp/ex-combined.txt
# Or with pattern: grep -E '^[^#]' /tmp/ex-combined.txt
# (not identical: ^[^#] needs a first character, so it also drops empty lines)
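The two approaches to "lines NOT starting with #" differ on empty lines and on indented comments. The Python sketch below makes the difference visible on a small, made-up list of lines (not the exercise file).

```python
import re

# Hypothetical sample: a comment, an indented comment, a setting, an empty line.
lines = ["# comment", "  # indented comment", "key=value", ""]

# Like grep -v '^#': keep every line where '^#' does NOT match.
inverted = [line for line in lines if not re.search(r'^#', line)]

# Like grep '^[^#]': the line must HAVE a first character, and it must not be '#'.
negated_class = [line for line in lines if re.search(r'^[^#]', line)]

# Treating indented comments as comments too: allow whitespace before '#'.
no_comments = [line for line in lines if not re.search(r'^\s*#', line)]

print(inverted)       # ['  # indented comment', 'key=value', '']
print(negated_class)  # ['  # indented comment', 'key=value']
print(no_comments)    # ['key=value', '']
```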
Exercise Set 4: Multiline Context
Ex 4.1: Whole line match
Solution
# Match exact line "key=value"
grep -x 'key=value' /tmp/ex-combined.txt
# -x is equivalent to: grep '^key=value$' /tmp/ex-combined.txt
Ex 4.2: Python multiline mode
Solution
import re
text = """Line one
Line two
Line three"""
# Without MULTILINE: ^ and $ match string start/end
pattern = re.compile(r'^Line')
print(pattern.findall(text)) # ['Line'] - only first
# With MULTILINE: ^ and $ match line start/end
pattern = re.compile(r'^Line', re.MULTILINE)
print(pattern.findall(text)) # ['Line', 'Line', 'Line']
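The string anchors \A and \Z from the Core Concepts table are the counterpart to this: they stay pinned to the ends of the whole string even when MULTILINE is on. A short sketch using the same three-line text:

```python
import re

text = "Line one\nLine two\nLine three"

# With MULTILINE, ^ matches at every line start...
caret_multi = re.findall(r'^Line', text, re.MULTILINE)
# ...but \A still matches only at the start of the string.
string_start = re.findall(r'\ALine', text, re.MULTILINE)

# With MULTILINE, $ matches at every line end...
dollar_multi = re.findall(r'\w+$', text, re.MULTILINE)
# ...but \Z still matches only at the end of the string.
string_end = re.findall(r'\w+\Z', text, re.MULTILINE)

print(caret_multi)   # ['Line', 'Line', 'Line']
print(string_start)  # ['Line']
print(dollar_multi)  # ['one', 'two', 'three']
print(string_end)    # ['three']
```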
Real-World Applications
Professional: Find Config Directives
# Apache/Nginx directives at line start
grep -E '^(Listen|ServerName|root|server_name)' /etc/nginx/nginx.conf
# SSH config options
grep -E '^\s*(PermitRootLogin|PasswordAuthentication)' /etc/ssh/sshd_config
Professional: Log Analysis
# Lines starting with timestamp
grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}' /var/log/app.log
# Error lines at start of entry
grep -E '^\[ERROR\]' /var/log/app.log
# Lines ending with status codes
grep -E '(200|404|500)$' /var/log/nginx/access.log
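A line-start timestamp anchor is also a convenient way to decide where one log entry ends and the next begins, for example when stack traces span several lines. The sketch below assumes a log format in which every entry begins with a YYYY-MM-DD date; the sample lines and the `group_entries` helper are hypothetical.

```python
import re

# Same pattern as the grep example: a date at the very start of the line.
TIMESTAMP = re.compile(r'^[0-9]{4}-[0-9]{2}-[0-9]{2}')

def group_entries(lines):
    """Group continuation lines under the preceding timestamped line."""
    entries = []
    for line in lines:
        if TIMESTAMP.search(line) or not entries:
            entries.append([line])    # a timestamp starts a new entry
        else:
            entries[-1].append(line)  # anything else continues the last one
    return entries

# Made-up sample data for illustration.
sample = [
    "2024-01-15 10:00:01 [ERROR] Connection failed",
    "  at connect (net.py:42)",
    "  at main (app.py:7)",
    "2024-01-15 10:00:02 [INFO] Retrying",
]

entries = group_entries(sample)
print(len(entries))     # 2
print(len(entries[0]))  # 3
```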
Professional: ISE Patterns
# Lines starting with MAC address
grep -Ei '^[0-9a-f]{2}:' /var/log/ise-psc.log
# Match "Passed" as word (not PassedAuthentication)
grep -P '\bPassed\b' /var/log/ise-psc.log
Personal: Note Searching
# Find TODO items at line start
grep -ri '^TODO:' ~/notes/
# Find headings (markdown)
grep -E '^#{1,3} ' ~/notes/*.md
# Find AsciiDoc section titles
grep -E '^=+ ' ~/docs/*.adoc
Personal: List Items
# Bullet points
grep -E '^[*-] ' ~/notes/*.md
# Numbered items
grep -E '^[0-9]+\. ' ~/notes/*.md
# Checkbox items
grep -E '^\s*\[ \]' ~/notes/*.md
Personal: Journal Entries
# Date headers
grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}' ~/journal/*.md
# Time entries
grep -E '^[0-9]{2}:[0-9]{2}' ~/journal/*.md
Tool Variants
grep: Anchor Usage
# Case-insensitive whole word
grep -wi 'error' file.txt # -w is like \b...\b
# Invert match: lines NOT starting with #
grep -v '^#' config.txt
# Count lines starting with pattern
grep -c '^ERROR' file.txt
sed: Anchored Substitution
# Remove leading whitespace
sed 's/^[[:space:]]*//' file.txt
# Remove trailing whitespace
sed 's/[[:space:]]*$//' file.txt
# Add prefix to each line
sed 's/^/PREFIX: /' file.txt
# Add suffix to each line
sed 's/$/ # comment/' file.txt
# Comment out lines starting with keyword
sed 's/^DEBUG/# DEBUG/' file.txt
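For comparison, the same anchored substitutions can be written with Python's re.sub. sed works line by line, so the Python version needs re.MULTILINE to make ^ and $ apply to each line of a multi-line string. This is a sketch on made-up text, and it uses [ \t] rather than \s so that a substitution never swallows a newline.

```python
import re

# Hypothetical three-line input with leading and trailing whitespace.
text = "  DEBUG start  \nvalue = 1\t\nDEBUG end"

# Like sed 's/^[[:space:]]*//': strip leading whitespace on each line.
no_leading = re.sub(r'^[ \t]+', '', text, flags=re.MULTILINE)

# Like sed 's/[[:space:]]*$//': strip trailing whitespace on each line.
no_trailing = re.sub(r'[ \t]+$', '', text, flags=re.MULTILINE)

# Like sed 's/^/PREFIX: /': insert text at the zero-width line start.
prefixed = re.sub(r'^', 'PREFIX: ', text, flags=re.MULTILINE)

# Like sed 's/^DEBUG/# DEBUG/': only lines that START with DEBUG change,
# so the indented first line is left alone.
commented = re.sub(r'^DEBUG', '# DEBUG', text, flags=re.MULTILINE)

print(no_trailing)
print(commented)
```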
awk: Line Position
# Lines starting with pattern
awk '/^ERROR/' file.txt
# Lines ending with pattern
awk '/failed$/' file.txt
# Word boundaries (GNU awk)
awk '/\<port\>/' file.txt
# Anchor with field check
awk '$1 ~ /^[0-9]/' file.txt # Field 1 starts with digit
vim: Anchor Patterns
" Find lines starting with ERROR
/^ERROR
" Find lines ending with semicolon
/;$
" Find word "port" (not transport)
/\<port\>
" Delete empty lines
:g/^$/d
" Delete lines starting with #
:g/^#/d
" Add text at line end
:%s/$/ # end/
Python: Anchors and Flags
import re
text = """ERROR: First line
INFO: Second line
ERROR: Third line"""
# Default: ^ matches string start only
pattern = re.compile(r'^ERROR')
matches = pattern.findall(text)
print(matches) # ['ERROR'] - only first
# MULTILINE: ^ matches each line start
pattern = re.compile(r'^ERROR', re.MULTILINE)
matches = pattern.findall(text)
print(matches) # ['ERROR', 'ERROR']
# Word boundaries
text = "port export transport"
pattern = re.compile(r'\bport\b')
matches = pattern.findall(text)
print(matches) # ['port'] - only whole word
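Python also has a direct counterpart to grep -x: re.fullmatch requires the pattern to cover the whole string. The sketch below contrasts it with ^...$ and shows one difference worth knowing: in Python, $ also matches just before a single trailing newline, while fullmatch does not tolerate one.

```python
import re

line = "key=value"

# fullmatch succeeds only if the pattern covers the entire string.
exact = re.fullmatch(r'key=value', line) is not None
partial = re.fullmatch(r'key', line) is not None

# '$' accepts one trailing newline; fullmatch does not.
dollar_with_newline = re.search(r'^key=value$', "key=value\n") is not None
full_with_newline = re.fullmatch(r'key=value', "key=value\n") is not None

print(exact)                # True
print(partial)              # False
print(dollar_with_newline)  # True
print(full_with_newline)    # False
```

When reading lines from a file, stripping the newline first (for example with rstrip) keeps both approaches in agreement.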
Gotchas
^ Inside Character Class
# ^ at START of class = negation
# (-E is needed: in BRE a bare + is a literal plus sign)
echo "abc123" | grep -oE '[^0-9]+'
# Output: abc (NOT digits)
# ^ elsewhere = literal caret
echo "a^b" | grep -o '[a^b]'
# Output: a, ^, b (matches caret literally)
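The three roles of the caret can be put side by side in Python, where the character-class rules are the same. A small sketch with made-up inputs:

```python
import re

# ^ first inside [...] negates the class: one or more non-digits.
negated = re.findall(r'[^0-9]+', 'abc123')

# ^ anywhere else inside [...] is a literal caret.
literal = re.findall(r'[a^b]', 'a^b')

# Both roles in one pattern: the outer ^ anchors, the inner ^ negates.
anchored_hit = re.findall(r'^[^0-9]+', 'abc123')
anchored_miss = re.findall(r'^[^0-9]+', '123abc')

print(negated)        # ['abc']
print(literal)        # ['a', '^', 'b']
print(anchored_hit)   # ['abc']
print(anchored_miss)  # []
```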
$ in Shell Strings
# RISKY: inside double quotes the shell expands $VAR and $(...)
grep "pattern$" file.txt # a lone trailing $ survives, but "^$USER$" would be expanded
# CORRECT: Use single quotes
grep 'pattern$' file.txt
# Or escape it
grep "pattern\$" file.txt
Word Boundary Definition
# Word = [A-Za-z0-9_] sequence
# Boundary = transition between word and non-word
echo "user_name" | grep -Po '\buser\b'
# No match! Underscore is a word character
echo "user-name" | grep -Po '\buser\b'
# Match! Hyphen is NOT a word character
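One way to see why the underscore case fails is to list every position at which \b matches. The sketch below does this in Python, whose \b follows the same word-character definition for ASCII text; `boundary_positions` is a hypothetical helper written for this illustration.

```python
import re

def boundary_positions(text):
    """Return every index at which a word boundary matches in text."""
    return [m.start() for m in re.finditer(r'\b', text)]

underscore = boundary_positions("user_name")
hyphen = boundary_positions("user-name")

print(underscore)  # [0, 9]
print(hyphen)      # [0, 4, 5, 9]

# No boundary after "user" in user_name, so the whole-word pattern fails there.
in_underscore = re.search(r'\buser\b', "user_name") is not None
in_hyphen = re.search(r'\buser\b', "user-name") is not None
print(in_underscore, in_hyphen)  # False True
```

With an underscore the string is one nine-character word, so the only boundaries are at its two ends. With a hyphen there are two words and four boundaries.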
BRE vs PCRE Word Boundaries
# BRE/ERE: Use \< and \>
grep '\<word\>' file.txt
# PCRE: Use \b
grep -P '\bword\b' file.txt
# Both work, but \b is more portable to Python/JavaScript
Key Takeaways
| Anchor | Use Case |
|---|---|
| ^ | Match at line/string start |
| $ | Match at line/string end |
| ^…$ | Match entire line exactly |
| \b | Word boundary (PCRE) |
| \< \> | Word boundaries (BRE/ERE) |
| \B | NOT a word boundary |
| grep -w | Shortcut for \b…\b |
| re.MULTILINE | Make ^ and $ match line boundaries |
Self-Test
- What does ^ERROR$ match?
- What’s the difference between ^[^#] and ^#?
- How do you match "port" but not "export" or "transport"?
- What grep flag is equivalent to \b…\b?
- In Python, what flag makes ^ match line starts?
Answers
- A line containing ONLY "ERROR" (nothing else)
- ^[^#] = non-empty lines NOT starting with #; ^# = lines starting with #
- \bport\b (PCRE) or \<port\> (BRE/ERE) or grep -w port
- -w (word match)
- re.MULTILINE or re.M
Next Drill
Drill 05: Groups & Backreferences - Master (), \1, (?:), and named captures.