Drill 04: Anchors & Boundaries

Anchors match positions, not characters. Use them to find patterns at line starts, line ends, or word boundaries without consuming any text.

Core Concepts

Anchor   Meaning                               Tool Support
^        Start of line/string                  All tools
$        End of line/string                    All tools
\b       Word boundary                         PCRE (grep -P), Python
\B       NOT a word boundary                   PCRE, Python
\<       Start of word                         BRE/ERE (grep, sed)
\>       End of word                           BRE/ERE (grep, sed)
\A       Start of string (ignores multiline)   PCRE, Python
\Z       End of string (ignores multiline)     PCRE, Python

Zero-Width Assertion

Key insight: Anchors match POSITIONS, not characters. They have zero width.

echo "hello" | grep -o '^'
# Output: (nothing printed - '^' matched the position before 'h', and grep -o skips empty matches)

echo "hello" | grep -o '^.'
# Output: h (position before h, then one character)
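
The same zero-width behaviour can be checked by inspecting match spans. This is a minimal Python sketch, not part of the original drill; the variable names are illustrative.

```python
import re

text = "hello"

# '^' alone matches the position before 'h': start == end, nothing is consumed.
caret = re.search(r"^", text)
caret_span = caret.span()
caret_text = caret.group()

# '^.' matches the same position, then consumes one character.
caret_dot = re.search(r"^.", text)
caret_dot_span = caret_dot.span()
caret_dot_text = caret_dot.group()

print(caret_span, repr(caret_text))          # (0, 0) ''
print(caret_dot_span, repr(caret_dot_text))  # (0, 1) 'h'
```

A span whose start equals its end is what "zero width" means in practice.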

Interactive CLI Drill

bash ~/atelier/_bibliotheca/domus-captures/docs/modules/ROOT/examples/regex-drills/04-anchors.sh

Exercise Set 1: Line Anchors

cat << 'EOF' > /tmp/ex-anchor.txt
ERROR: Connection failed
Warning: Low disk
INFO: Process started
  ERROR: indented error
error: lowercase error
Process completed with ERROR
EOF

Ex 1.1: Lines starting with ERROR

Solution
grep '^ERROR' /tmp/ex-anchor.txt

Output: ERROR: Connection failed (Not the indented or lowercase ones)

Ex 1.2: Lines ending with ERROR

Solution
grep 'ERROR$' /tmp/ex-anchor.txt

Output: Process completed with ERROR

Ex 1.3: Lines with ERROR anywhere (no anchor)

Solution
grep 'ERROR' /tmp/ex-anchor.txt

Output: All lines containing ERROR (3 lines)

Ex 1.4: Empty lines only

Solution
grep '^$' file.txt

^$ = start immediately followed by end = empty line.
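
A short Python sketch of the same idea, assuming a made-up sample string (the drill file above has no empty lines). It also shows that a line holding only spaces is not matched by ^$ and needs an explicit whitespace class.

```python
import re

# Hypothetical sample: line 2 is empty, line 4 holds two spaces.
text = "first\n\nsecond\n  \nthird"

# ^$ with MULTILINE: line start immediately followed by line end.
strictly_empty = re.findall(r"^$", text, re.MULTILINE)

# Allow spaces and tabs between the anchors to catch whitespace-only lines.
# [ \t] is used instead of \s so the match cannot run across a newline.
blank = re.findall(r"^[ \t]*$", text, re.MULTILINE)

print(len(strictly_empty))  # 1
print(len(blank))           # 2
```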

Exercise Set 2: Word Boundaries

cat << 'EOF' > /tmp/ex-words.txt
port 80
port 443
port 8080
transport layer
export PATH
import os
reported error
portability issues
sportsman
EOF

Ex 2.1: Match "port" as whole word only

Solution
# PCRE with \b
grep -P '\bport\b' /tmp/ex-words.txt

# BRE/ERE with \< \>
grep '\<port\>' /tmp/ex-words.txt

Output: port 80, port 443, port 8080 (NOT transport, export, import, reported, portability, sportsman)

Ex 2.2: Words starting with "port"

Solution
grep -P '\bport' /tmp/ex-words.txt
# Or: grep '\<port' /tmp/ex-words.txt

Output: port 80, port 443, port 8080, portability

Ex 2.3: Words ending with "port"

Solution
grep -P 'port\b' /tmp/ex-words.txt
# Or: grep 'port\>' /tmp/ex-words.txt

Output: port 80, port 443, port 8080, transport, export, import

Ex 2.4: "port" NOT as whole word

Solution
# Using \B (NOT word boundary)
grep -P '\Bport\B' /tmp/ex-words.txt

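Output: Lines where "port" sits strictly inside a word (reported, sportsman). \Bport\B needs a word character on both sides, so transport, export, import, and portability do not match.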
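
The four exercises in this set can be cross-checked with Python's re module, which supports \b and \B. This is a sketch for verification only; the helper name matching is hypothetical.

```python
import re

lines = [
    "port 80", "port 443", "port 8080", "transport layer", "export PATH",
    "import os", "reported error", "portability issues", "sportsman",
]

def matching(pattern):
    """Return the lines in which the pattern matches somewhere (like grep)."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]

whole = matching(r"\bport\b")    # Ex 2.1: whole word
starts = matching(r"\bport")     # Ex 2.2: word starts with "port"
ends = matching(r"port\b")       # Ex 2.3: word ends with "port"
inside = matching(r"\Bport\B")   # Ex 2.4: "port" strictly inside a word

for name, result in [("whole", whole), ("starts", starts),
                     ("ends", ends), ("inside", inside)]:
    print(name, result)
```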

Exercise Set 3: Combined Patterns

cat << 'EOF' > /tmp/ex-combined.txt
192.168.1.100
host: server-01
HOST: server-02
Server: 10.50.1.50
# This is a comment
  # Indented comment
key=value
key = value
EOF

Ex 3.1: Lines starting with IP address

Solution
grep -E '^[0-9]{1,3}\.' /tmp/ex-combined.txt

Output: 192.168.1.100

Ex 3.2: Comment lines (# at start, with optional spaces)

Solution
grep -E '^ *#' /tmp/ex-combined.txt

^ *# = start, zero or more spaces, then #

Ex 3.3: Key-value pairs (key at line start)

Solution
grep -E '^[a-z]+\s*=' /tmp/ex-combined.txt

Output: key=value, key = value

Ex 3.4: Lines NOT starting with #

Solution
grep -v '^#' /tmp/ex-combined.txt
# Similar, but also drops empty lines: grep -E '^[^#]' /tmp/ex-combined.txt
# Note: the indented comment still passes; grep -vE '^ *#' excludes it too
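
The patterns from this set behave the same way in Python, which makes it easy to check which lines each one selects. A minimal sketch using the sample lines above; the variable names are illustrative.

```python
import re

lines = [
    "192.168.1.100", "host: server-01", "HOST: server-02", "Server: 10.50.1.50",
    "# This is a comment", "  # Indented comment", "key=value", "key = value",
]

ip_start = [l for l in lines if re.search(r"^[0-9]{1,3}\.", l)]
comments = [l for l in lines if re.search(r"^ *#", l)]
key_values = [l for l in lines if re.search(r"^[a-z]+\s*=", l)]

# Equivalent of grep -v '^#': the indented comment is NOT filtered out,
# because its first character is a space, not '#'.
not_hash = [l for l in lines if not re.search(r"^#", l)]

print(ip_start)
print(comments)
print(key_values)
print(len(not_hash))
```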

Exercise Set 4: Multiline Context

Ex 4.1: Whole line match

Solution
# Match exact line "key=value"
grep -x 'key=value' /tmp/ex-combined.txt
# -x is equivalent to: grep '^key=value$' /tmp/ex-combined.txt

Ex 4.2: Python multiline mode

Solution
import re

text = """Line one
Line two
Line three"""

# Without MULTILINE: ^ and $ match string start/end
pattern = re.compile(r'^Line')
print(pattern.findall(text))  # ['Line'] - only first

# With MULTILINE: ^ and $ match line start/end
pattern = re.compile(r'^Line', re.MULTILINE)
print(pattern.findall(text))  # ['Line', 'Line', 'Line']
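
The Core Concepts table lists \A and \Z as ignoring multiline mode. A small sketch, reusing the same three-line text, shows the contrast with ^ and $ when re.MULTILINE is set.

```python
import re

text = "Line one\nLine two\nLine three"

# ^ follows MULTILINE and matches at every line start.
caret_hits = re.findall(r"^Line", text, re.MULTILINE)

# \A ignores MULTILINE and matches only at the start of the string.
string_start_hits = re.findall(r"\ALine", text, re.MULTILINE)

# $ follows MULTILINE; \Z matches only at the very end of the string.
dollar_hits = re.findall(r"\w+$", text, re.MULTILINE)
string_end_hits = re.findall(r"\w+\Z", text, re.MULTILINE)

print(caret_hits)         # ['Line', 'Line', 'Line']
print(string_start_hits)  # ['Line']
print(dollar_hits)        # ['one', 'two', 'three']
print(string_end_hits)    # ['three']
```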

Real-World Applications

Professional: Find Config Directives

# Apache/Nginx directives at line start
grep -E '^(Listen|ServerName|root|server_name)' /etc/nginx/nginx.conf

# SSH config options
grep -E '^\s*(PermitRootLogin|PasswordAuthentication)' /etc/ssh/sshd_config

Professional: Log Analysis

# Lines starting with timestamp
grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}' /var/log/app.log

# Error lines at start of entry
grep -E '^\[ERROR\]' /var/log/app.log

# Lines ending with status codes
grep -E '(200|404|500)$' /var/log/nginx/access.log

Professional: ISE Patterns

# Lines starting with MAC address
grep -Ei '^[0-9a-f]{2}:' /var/log/ise-psc.log

# Match "Passed" as word (not PassedAuthentication)
grep -P '\bPassed\b' /var/log/ise-psc.log

Personal: Note Searching

# Find TODO items at line start
grep -ri '^TODO:' ~/notes/

# Find headings (markdown)
grep -E '^#{1,3} ' ~/notes/*.md

# Find AsciiDoc section titles
grep -E '^=+ ' ~/docs/*.adoc

Personal: List Items

# Bullet points
grep -E '^[*-] ' ~/notes/*.md

# Numbered items
grep -E '^[0-9]+\. ' ~/notes/*.md

# Checkbox items
grep -E '^\s*\[ \]' ~/notes/*.md

Personal: Journal Entries

# Date headers
grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}' ~/journal/*.md

# Time entries
grep -E '^[0-9]{2}:[0-9]{2}' ~/journal/*.md

Tool Variants

grep: Anchor Usage

# Case-insensitive whole word
grep -wi 'error' file.txt  # -w is like \b...\b

# Invert match: lines NOT starting with #
grep -v '^#' config.txt

# Count lines starting with pattern
grep -c '^ERROR' file.txt

sed: Anchored Substitution

# Remove leading whitespace
sed 's/^[[:space:]]*//' file.txt

# Remove trailing whitespace
sed 's/[[:space:]]*$//' file.txt

# Add prefix to each line
sed 's/^/PREFIX: /' file.txt

# Add suffix to each line
sed 's/$/ # comment/' file.txt

# Comment out lines starting with keyword
sed 's/^DEBUG/# DEBUG/' file.txt
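
For comparison, the same anchored substitutions can be sketched in Python with re.sub and re.MULTILINE. The sample text is made up for illustration, and [ \t] stands in for [[:space:]] because \s would also match the newline in a multi-line string.

```python
import re

# Hypothetical three-line input, with no trailing newline.
text = "  a\nb  \nDEBUG c"

no_leading = re.sub(r"^[ \t]+", "", text, flags=re.MULTILINE)
no_trailing = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)

# Replacing the zero-width anchors themselves inserts text at those positions.
prefixed = re.sub(r"^", "PREFIX: ", text, flags=re.MULTILINE)
suffixed = re.sub(r"$", " # comment", text, flags=re.MULTILINE)

commented = re.sub(r"^DEBUG", "# DEBUG", text, flags=re.MULTILINE)

for result in (no_leading, no_trailing, prefixed, suffixed, commented):
    print(result.splitlines())
```

sed processes one line at a time, so its ^ and $ are always line anchors; in Python the MULTILINE flag is what gives the same per-line behaviour on a whole string.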

awk: Line Position

# Lines starting with pattern
awk '/^ERROR/' file.txt

# Lines ending with pattern
awk '/failed$/' file.txt

# Word boundaries (GNU awk)
awk '/\<port\>/' file.txt

# Anchor with field check
awk '$1 ~ /^[0-9]/' file.txt  # Field 1 starts with digit
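
Anchoring on a field is not the same as anchoring on the line. A rough Python sketch of the difference, assuming whitespace-separated fields as in awk's default splitting; the sample lines are invented.

```python
import re

lines = ["42 apples", "seven pears", "  9 plums", "x1 figs"]
starts_with_digit = re.compile(r"^[0-9]")

# Anchored on the whole line: leading spaces prevent a match.
line_anchored = [l for l in lines if starts_with_digit.search(l)]

# Anchored on the first field: split() drops leading whitespace first.
field_anchored = [
    l for l in lines
    if l.split() and starts_with_digit.search(l.split()[0])
]

print(line_anchored)   # ['42 apples']
print(field_anchored)  # ['42 apples', '  9 plums']
```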

vim: Anchor Patterns

" Find lines starting with ERROR
/^ERROR

" Find lines ending with semicolon
/;$

" Find word "port" (not transport)
/\<port\>

" Delete empty lines
:g/^$/d

" Delete lines starting with #
:g/^#/d

" Add text at line end
:%s/$/ # end/

Python: Anchors and Flags

import re

text = """ERROR: First line
INFO: Second line
ERROR: Third line"""

# Default: ^ matches string start only
pattern = re.compile(r'^ERROR')
matches = pattern.findall(text)
print(matches)  # ['ERROR'] - only first

# MULTILINE: ^ matches each line start
pattern = re.compile(r'^ERROR', re.MULTILINE)
matches = pattern.findall(text)
print(matches)  # ['ERROR', 'ERROR']

# Word boundaries
text = "port export transport"
pattern = re.compile(r'\bport\b')
matches = pattern.findall(text)
print(matches)  # ['port'] - only whole word

Gotchas

^ Inside Character Class

# ^ at START of class = negation
# (-E is needed: in BRE, + is a literal plus sign)
echo "abc123" | grep -oE '[^0-9]+'
# Output: abc (NOT digits)

# ^ elsewhere = literal caret
echo "a^b" | grep -o '[a^b]'
# Output: a, ^, b (matches caret literally)
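
The three roles of ^ can be put side by side in Python, where the rules inside a character class are the same. A minimal sketch with illustrative variable names.

```python
import re

# ^ first inside the class: negation (one or more non-digits).
negated = re.findall(r"[^0-9]+", "abc123")

# ^ elsewhere inside the class: a literal caret.
literal = re.findall(r"[a^b]", "a^b")

# ^ outside the class: anchor. Here: first character is anything but '#'.
anchored_hit = re.search(r"^[^#]", "key=value")
anchored_miss = re.search(r"^[^#]", "# comment")

print(negated)                 # ['abc']
print(literal)                 # ['a', '^', 'b']
print(anchored_hit.group())    # k
print(anchored_miss)           # None
```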

$ in Shell Strings

# RISKY: Double quotes let the shell expand $
grep "pattern$" file.txt  # A trailing $ survives, but "$HOME" or "$1" would be expanded

# CORRECT: Use single quotes
grep 'pattern$' file.txt

# Or escape it
grep "pattern\$" file.txt

Word Boundary Definition

# Word = [A-Za-z0-9_] sequence
# Boundary = transition between word and non-word

echo "user_name" | grep -Po '\buser\b'
# No match! Underscore is a word character

echo "user-name" | grep -Po '\buser\b'
# Match! Hyphen is NOT a word character
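
The underscore rule is the same in Python. When an underscore should count as a separator, one option (an assumption on my part, not something the drill prescribes) is to replace \b with lookarounds that spell out which characters count as part of a word.

```python
import re

# \b treats '_' as a word character, so there is no boundary after "user".
underscore = re.search(r"\buser\b", "user_name")
hyphen = re.search(r"\buser\b", "user-name")

# Custom boundary: only letters and digits count as word characters.
custom = re.compile(r"(?<![A-Za-z0-9])user(?![A-Za-z0-9])")
custom_underscore = custom.search("user_name")
custom_joined = custom.search("username")

print(underscore)                 # None
print(hyphen.group())             # user
print(custom_underscore.group())  # user
print(custom_joined)              # None
```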

BRE vs PCRE Word Boundaries

# BRE/ERE: Use \< and \>
grep '\<word\>' file.txt

# PCRE: Use \b
grep -P '\bword\b' file.txt

# Both work, but \b is more portable to Python/JavaScript
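
One portability trap when moving the other way: Python's re module has no \< or \>, and treats them as literal angle brackets rather than raising an error. A sketch of the pitfall and of lookahead/lookbehind equivalents; the sample text is invented.

```python
import re

text = "port export transport portability"

# \< and \> are NOT anchors in Python: they match literal '<' and '>'.
silent_miss = re.search(r"\<port\>", text)
literal_hit = re.search(r"\<port\>", "<port>")

# \<  is roughly  \b(?=\w)   (boundary followed by a word character)
# \>  is roughly  \b(?<=\w)  (boundary preceded by a word character)
word_starts = re.findall(r"\b(?=\w)port\w*", text)
word_ends = re.findall(r"\w*port\b(?<=\w)", text)

print(silent_miss)          # None
print(literal_hit.group())  # <port>
print(word_starts)          # ['port', 'portability']
print(word_ends)            # ['port', 'export', 'transport']
```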

Key Takeaways

Anchor         Use Case
^              Match at line/string start
$              Match at line/string end
^pattern$      Match entire line exactly
\b             Word boundary (PCRE)
\< \>          Word boundaries (BRE/ERE)
\B             NOT a word boundary
-w (grep)      Shortcut for \bword\b
re.MULTILINE   Make ^ and $ match line boundaries

Self-Test

  1. What does ^ERROR$ match?

  2. What’s the difference between ^[^#] and ^#?

  3. How do you match "port" but not "export" or "transport"?

  4. What grep flag is equivalent to \bword\b?

  5. In Python, what flag makes ^ match line starts?

Answers
  1. A line containing ONLY "ERROR" (nothing else)

  2. ^[^#] = lines whose first character is NOT # (empty lines do not match); ^# = lines starting with #

  3. \bport\b (PCRE) or \<port\> (BRE/ERE) or grep -w port

  4. -w (word match)

  5. re.MULTILINE or re.M

Next Drill

Drill 05: Groups & Backreferences - Master (), \1, (?:), and named captures.