Anchors & Boundaries
Anchors match positions in the text, not characters. They assert that the current position meets a condition without consuming any characters, which makes them essential for precise pattern matching.
Line Anchors
Start of Line ^
Matches the position at the start of a line.
Pattern: ^Error
Text: Error: Connection failed ← Matches
The Error was critical ← Does NOT match
Infrastructure Example:
# Find lines starting with ERROR
grep '^ERROR' /var/log/syslog
# Find configuration comments
grep '^#' /etc/ssh/sshd_config
# Find lines starting with IP
grep -E '^[0-9]{1,3}\.' access.log
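The same start-of-line check can be sketched in Python's re module. This is only an illustration: the sample lines are taken from the Pattern/Text example above, and the helper name `lines_starting_with` is hypothetical. Each line is tested on its own, so ^ behaves here as it does in grep.

```python
import re

# Sample lines from the Pattern/Text example above
lines = ["Error: Connection failed", "The Error was critical"]

def lines_starting_with(pattern, lines):
    """Return the lines where `pattern` matches at the very start."""
    anchored = re.compile("^" + pattern)
    return [line for line in lines if anchored.search(line)]

matches = lines_starting_with("Error", lines)
print(matches)
```

Only the first line is returned, because "Error" in the second line is not at position 0.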
End of Line $
Matches the position at the end of a line.
Pattern: failed$
Text: Authentication failed ← Matches
failed authentication ← Does NOT match
Infrastructure Example:
# Find lines ending with .log
grep '\.log$' filelist.txt
# Find lines ending with port number
grep -E ':[0-9]+$' connections.txt
# Find lines ending with OK
grep 'OK$' status.log
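For comparison, a minimal Python sketch of the end-of-line anchor. The sample lines (including the `tcp 10.0.0.5:443` entry) are made up for illustration.

```python
import re

# Hypothetical sample lines
lines = [
    "Authentication failed",
    "failed authentication",
    "tcp 10.0.0.5:443",
]

# $ pins the pattern to the end of each line
ends_with_failed = [line for line in lines if re.search(r"failed$", line)]
ends_with_port = [line for line in lines if re.search(r":[0-9]+$", line)]

print(ends_with_failed)
print(ends_with_port)
```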
Combining ^ and $
Pattern: ^ERROR$
Matches: Lines containing ONLY "ERROR"
Pattern: ^$
Matches: Empty lines
Pattern: ^.+$
Matches: Non-empty lines
Practical Examples:
# Remove empty lines
grep -v '^$' file.txt
# Find blank lines (empty or whitespace-only)
grep -E '^\s*$' file.txt
# Find exact match on line
grep -x 'PATTERN' file.txt # Equivalent to ^PATTERN$
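The four combinations above can be compared side by side in Python. This is a sketch over made-up input; `re.fullmatch` is used as a rough counterpart of `grep -x`.

```python
import re

# Hypothetical input: an exact line, an empty line, a whitespace-only line, two others
text = "ERROR\n\n   \nERROR: disk full\nok\n"
lines = text.splitlines()

exact = [l for l in lines if re.fullmatch(r"ERROR", l)]   # like ^ERROR$ / grep -x
empty = [l for l in lines if re.search(r"^$", l)]         # nothing on the line at all
blank = [l for l in lines if re.search(r"^\s*$", l)]      # empty or whitespace-only
non_empty = [l for l in lines if re.search(r"^.+$", l)]   # at least one character

print(exact, empty, blank, non_empty)
```

Note that the whitespace-only line counts as non-empty for `^.+$` but as blank for `^\s*$`.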
Word Boundaries
The \b Anchor
Matches the boundary between a word character (\w) and a non-word character (\W), or between a word character and the start or end of the string.
Pattern: \bcat\b
Text:    The cat sat on the caterpillar.
Matches:     ^^^
Does NOT match: caterpillar (no boundary after 'cat')
What constitutes a word boundary:
- Start of string (if first char is \w)
- End of string (if last char is \w)
- Between \w and \W
- Between \W and \w
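To make the boundary rules concrete, the following Python sketch lists every position where \b matches. The helper name `boundary_positions` is hypothetical.

```python
import re

def boundary_positions(text):
    """Return the string offsets at which \\b matches (zero-width)."""
    return [m.start() for m in re.finditer(r"\b", text)]

# "log.txt": boundaries at the start, on both sides of the dot, and at the end
print(boundary_positions("log.txt"))

# Whole-word search: "cat" matches, the "cat" in "caterpillar" does not
print(re.findall(r"\bcat\b", "The cat sat on the caterpillar."))
```

For "log.txt" the offsets are 0, 3, 4 and 7, which is why \blog\b matches the "log" in "log.txt".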
Infrastructure Examples
# Match "log" as whole word
grep -P '\blog\b' /etc/rsyslog.conf
# Matches: log, log.txt
# NOT: logging, catalog, syslog
# Match VLAN as whole word
grep -P '\bVLAN\b' config.txt
# Match specific port number
grep -P '\b443\b' firewall.log
# Matches: 443, :443
# NOT: 4430, 14435
Word Start \< and Word End \>
In GNU grep's BRE/ERE syntax, \< and \> are the directional counterparts of \b: each matches at only one side of a word.
# Word starting with "log"
grep '\<log' /etc/rsyslog.conf
# Matches: log, logging, logout
# NOT: catalog, syslog
# Word ending with "log"
grep 'log\>' /etc/rsyslog.conf
# Matches: log, syslog, catalog
# NOT: logging, logout
# Exact word (combine both)
grep '\<log\>' /etc/rsyslog.conf
# Matches only: log
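Python's re module has no \< or \> (there, \< is just a literal <). One way to approximate them, shown here as an assumption rather than an official equivalent, is to combine \b with a lookaround. The names `WORD_START` and `WORD_END` are hypothetical.

```python
import re

WORD_START = r"\b(?=\w)"   # boundary with a word character to the right
WORD_END = r"\b(?<=\w)"    # boundary with a word character to the left

words = ["log", "logging", "logout", "catalog", "syslog"]

starts_with_log = [w for w in words if re.search(WORD_START + "log", w)]
ends_with_log = [w for w in words if re.search("log" + WORD_END, w)]
exact_log = [w for w in words if re.search(WORD_START + "log" + WORD_END, w)]

print(starts_with_log)
print(ends_with_log)
print(exact_log)
```

For a pattern that itself begins and ends with word characters, plain \b gives the same result; the lookarounds only matter when the pattern's edge could be a non-word character.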
String Anchors (PCRE)
These match the absolute start/end of the entire string, not lines:
| Anchor | Meaning | Notes |
|---|---|---|
| \A | Start of string | Ignores multiline mode |
| \Z | End of string (before final newline) | Ignores multiline mode |
| \z | Absolute end of string | Including any trailing newline |
Difference from ^ and $:
- In multiline mode, ^ and $ match at every line
- \A and \z always match only string start/end
import re
text = """Line 1
Line 2
Line 3"""
# ^ matches each line start
re.findall(r'^Line', text, re.MULTILINE)
# ['Line', 'Line', 'Line']
# \A matches only string start
re.findall(r'\ALine', text, re.MULTILINE)
# ['Line'] (only first)
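The end-of-string side can be sketched in the same way. One caveat: Python's re uses \Z for the absolute end of the string, the role the table above gives to \z in PCRE. The sample text is the same three-line string used above.

```python
import re

text = "Line 1\nLine 2\nLine 3"

# $ in MULTILINE mode matches at the end of every line
every_line_end = re.findall(r"\d$", text, re.MULTILINE)

# \Z (Python spelling) matches only at the end of the whole string
string_end_only = re.findall(r"\d\Z", text, re.MULTILINE)

print(every_line_end)
print(string_end_only)
```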
Non-Word Boundary \B
Matches at every position where \b would NOT match (for example, inside a word).
Pattern: \Bcat\B
Text:    The cat and the caterpillar scatter
Matches:                              ^^^
         (only 'cat' inside 'scatter')
Use case: Finding substrings within words.
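A quick Python check of the \B example above, using the same text; it reports where the match starts.

```python
import re

text = "The cat and the caterpillar scatter"

# \B on both sides: "cat" must have word characters before and after it
inner = [(m.start(), m.group()) for m in re.finditer(r"\Bcat\B", text)]
print(inner)

# The standalone word and the start of "caterpillar" both sit on a \b, so they are skipped
print(text[inner[0][0] - 1 : inner[0][0] + 4])
```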
Practical Patterns
Validate Full Line Content
# Line is a valid IPv4 (basic validation)
grep -E '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$' ips.txt
# Line contains only hex characters
grep -E '^[A-Fa-f0-9]+$' hashes.txt
# Line is a valid MAC address
grep -E '^([A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}$' macs.txt
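The IPv4 pattern above checks only the shape of the line, so it also accepts values such as 999.999.999.999. A possible stricter check, sketched here with Python's standard `ipaddress` module, validates the octet ranges as well. The names `SHAPE` and `is_ipv4` are hypothetical.

```python
import ipaddress
import re

# Same shape check as the grep pattern above
SHAPE = re.compile(r"^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$")

def is_ipv4(line):
    """True if the whole line is a dotted quad with each octet in 0-255."""
    if not SHAPE.search(line):
        return False
    try:
        ipaddress.IPv4Address(line)
    except ValueError:
        return False
    return True

print(is_ipv4("192.168.1.1"))
print(is_ipv4("999.999.999.999"))
```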
Extract from Fixed Positions
# Get first word of each line
grep -oE '^\S+' file.txt
# Get last word of each line
grep -oE '\S+$' file.txt
# Get timestamp at line start
grep -oE '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:]+' log.txt
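The same extractions in Python, roughly mirroring `grep -o`. The log line is invented for illustration.

```python
import re

# Hypothetical log line
line = "2024-01-15T10:30:00 sshd[123]: Connection refused"

first_word = re.search(r"^\S+", line).group()
last_word = re.search(r"\S+$", line).group()
timestamp = re.search(r"^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:]+", line).group()

print(first_word)
print(last_word)
print(timestamp)
```

In real code, check the result of `re.search` for `None` before calling `.group()`, since a line may not match.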
Log Parsing
# Lines starting with timestamp
grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}' application.log
# Lines starting with log level
grep -E '^(DEBUG|INFO|WARN|ERROR|FATAL)' application.log
# Lines ending with error indicators
grep -E '(failed|error|refused|denied)$' application.log
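As a sketch of how these anchored patterns might be used in a script, the following Python snippet counts log levels at line start and collects lines that end with an error indicator. The log lines are made up.

```python
import re
from collections import Counter

LEVEL = re.compile(r"^(DEBUG|INFO|WARN|ERROR|FATAL)")
ERROR_TAIL = re.compile(r"(failed|error|refused|denied)$")

# Hypothetical log lines
log_lines = [
    "ERROR db connection refused",
    "INFO service started",
    "retrying after ERROR",
    "WARN login denied",
]

levels = Counter()
for line in log_lines:
    match = LEVEL.search(line)
    if match:  # only lines that START with a level are counted
        levels[match.group(1)] += 1

error_tails = [line for line in log_lines if ERROR_TAIL.search(line)]

print(levels)
print(error_tails)
```

The line "retrying after ERROR" is not counted, because ERROR is not at the start of the line.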
Configuration Validation
# Find uncommented settings (non-comment lines with '=')
grep -E '^[^#].*=' config.ini
# Find active port definitions
grep -E '^port\s*=' config.ini
# Find blank or comment-only lines
grep -E '^(\s*#.*|\s*)$' config.ini
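The following Python sketch extracts active key/value pairs from a config file's text. It uses a slightly stricter pattern than `^[^#].*=`: the first character must be neither `#` nor whitespace, so indented comments containing `=` are excluded. The sample config and the name `ACTIVE_SETTING` are assumptions.

```python
import re

# Hypothetical config content
config = (
    "# comment\n"
    "port = 8080\n"
    "\n"
    "  # indented comment with key=value\n"
    "host=example.local\n"
    "#disabled=1\n"
)

# ^ and $ work per line because of re.MULTILINE
ACTIVE_SETTING = re.compile(
    r"^([^#\s][^=\n]*?)[ \t]*=[ \t]*(.*)$",
    re.MULTILINE,
)

active = ACTIVE_SETTING.findall(config)
print(active)
```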
Multiline Mode
In PCRE-style engines (such as Python's re module, used below), multiline mode changes ^ and $ behavior:
import re
text = """Line 1
Line 2
Line 3"""
# Without MULTILINE: ^ matches only start of string
re.findall(r'^Line \d', text)
# ['Line 1']
# With MULTILINE: ^ matches start of each line
re.findall(r'^Line \d', text, re.MULTILINE)
# ['Line 1', 'Line 2', 'Line 3']
In grep: Always line-based, so ^ and $ work per-line by default.
Self-Test Exercises
Try each challenge FIRST. Only read the answer after you've attempted it.
Setup Test Data
cat << 'EOF' > /tmp/anchors.txt
ERROR Connection failed
Warning: disk space low
error: permission denied
VLAN 100
VLAN100
catalog
log
syslog
logging
192.168.1.1
10.50.1.100
#comment
  # indented comment
setting=value
# disabled=old_value
active_setting = true
Port: 443
Port: 8080
auth.log
syslog.log
authentication failed
connection failed
failed connection
EOF
Challenge 1: Lines Starting with ERROR
Goal: Find lines that START with "ERROR" (uppercase)
Answer
grep '^ERROR' /tmp/anchors.txt
^ matches start of line. Only uppercase ERROR at line start matches.
Challenge 2: Lines Ending with "failed"
Goal: Find lines that END with the word "failed"
Answer
grep 'failed$' /tmp/anchors.txt
$ matches end of line. Output: "ERROR Connection failed", "authentication failed", "connection failed"
Challenge 3: Whole Word "log"
Goal: Match "log" as a complete word (not "catalog", "syslog", "logging")
Answer
# BRE/ERE word boundaries
grep '\<log\>' /tmp/anchors.txt
# PCRE word boundaries
grep -P '\blog\b' /tmp/anchors.txt
# Or use -w flag
grep -w 'log' /tmp/anchors.txt
\b or \<\> ensure word boundaries. Output: "log", "auth.log", "syslog.log" (the dot is a non-word character, so "log" after it counts as a whole word)
Challenge 4: Non-Comment Config Lines
Goal: Find configuration lines (contain =) that are NOT comments
Answer
grep -E '^[^#].*=' /tmp/anchors.txt
[^#] means the line starts with any character EXCEPT #. Then .*= finds the equals sign.
Challenge 5: Empty Lines
Goal: Find empty lines (lines with nothing on them)
Answer
grep '^$' /tmp/anchors.txt
^$ means start immediately followed by end = empty line.
Challenge 6: Lines That Are ONLY an IP Address
Goal: Find lines where the ENTIRE line is just an IP address
Answer
grep -E '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$' /tmp/anchors.txt
^…$ anchors the pattern to the entire line. No extra text allowed.
Challenge 7: Files Ending in .log
Goal: Find lines that end with ".log"
Answer
grep '\.log$' /tmp/anchors.txt
\. escapes the dot (literal). $ anchors to line end.
Challenge 8: Lines Starting with Whitespace
Goal: Find lines that begin with whitespace (space or tab)
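Answer
grep -E '^\s' /tmp/anchors.txt
^\s means: start of line immediately followed by a whitespace character. \s is a GNU grep extension; the POSIX form is '^[[:space:]]'.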
Challenge 9: Word Starting with "log"
Goal: Match words STARTING with "log" (log, logging) but not ending with it (catalog, syslog)
Answer
grep '\<log' /tmp/anchors.txt
\< is word START boundary only. Matches: log, logging. It also prints auth.log and syslog.log, where "log" starts a new word after the dot.
Challenge 10: Word Ending with "log"
Goal: Match words ENDING with "log" (log, syslog, catalog) but not starting with it (logging)
Answer
grep 'log\>' /tmp/anchors.txt
\> is word END boundary only. Matches: log, syslog, catalog. It also prints auth.log and syslog.log, which end with "log".
Challenge 11: Comments (# at Line Start)
Goal: Find comment lines (# at start of line, ignoring whitespace)
Answer
# Strict - # must be first character
grep '^#' /tmp/anchors.txt
# With leading whitespace allowed
grep -E '^\s*#' /tmp/anchors.txt
^\s*# means: start, zero or more whitespace, then #.
Challenge 12: VLAN with Word Boundary
Goal: Match "VLAN 100" but not "VLAN100" (space required)
Answer
grep -E '\bVLAN [0-9]+\b' /tmp/anchors.txt
# Or
grep -E 'VLAN [0-9]+' /tmp/anchors.txt
The space in the pattern ensures separation.
Common Mistakes
Mistake 1: Confusing ^ Inside vs Outside []
^[abc] → Line starting with a, b, or c
[^abc] → Character that is NOT a, b, or c
These are completely different!
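A two-line Python check makes the difference visible. The word "banana" is just a convenient test string.

```python
import re

# ^ outside the brackets: anchor, then one of a/b/c at the start
at_start = re.findall(r"^[abc]", "banana")

# ^ inside the brackets: negation, any character that is not a, b, or c
not_abc = re.findall(r"[^abc]", "banana")

print(at_start)
print(not_abc)
```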
Mistake 2: Forgetting $ Includes Newline
Pattern: test$
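Matches "test" at the end of a line. In engines that see the newline (PCRE, Python), $ also matches just before a trailing newline character; the newline itself is not part of the match.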
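In Python this behaviour can be observed directly. The sketch below compares $ with \Z (Python's absolute end-of-string anchor) and `re.fullmatch` on a string that ends with a newline.

```python
import re

value = "test\n"

dollar = re.search(r"test$", value)      # matches just before the trailing newline
strict = re.search(r"test\Z", value)     # no match: the newline is still ahead
full = re.fullmatch(r"test", value)      # no match: the whole string must be consumed

print(dollar.group(), dollar.end())
print(strict, full)
```

When validating input, `\Z` or `re.fullmatch` avoids silently accepting a trailing newline.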
Mistake 3: Word Boundary Misunderstanding
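# \b does not treat a hyphenated term as a single word
\bself-hosted\b
# Hyphen is NOT a word character, so there are boundaries on both sides of it:
self|-|hosted
    ^ ^
# The pattern above still matches "self-hosted", but \bself\b and \bhosted\b also match inside it.
# Use: self-hosted (literal hyphen) for the whole term, or self.hosted (with . metachar) if any separator is acceptable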
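The hyphen behaviour can be checked in Python. The last pattern, `STANDALONE_SELF`, is one possible workaround (an assumption, not the only option): it uses lookarounds so that a hyphen counts as part of the word.

```python
import re

term = "self-hosted"

# Offsets where \b matches: start, both sides of the hyphen, end
positions = [m.start() for m in re.finditer(r"\b", term)]

# \bself\b matches inside the hyphenated term
partial = re.findall(r"\bself\b", "a self-hosted runner")

# Treat "-" like a word character: no word char or hyphen on either side
STANDALONE_SELF = re.compile(r"(?<![\w-])self(?![\w-])")

print(positions)
print(partial)
print(STANDALONE_SELF.search("a self-hosted runner"))
print(STANDALONE_SELF.search("by its self"))
```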
Key Takeaways
- ^ = start of line - use for log parsing, validation
- $ = end of line - use for file extensions, final values
- \b = word boundary - prevents partial matches
- \< and \> = word start/end - BRE/ERE equivalent
- Anchors match positions, not characters - zero-width
- Combine anchors - ^…$ for full-line validation
Next Module
Groups & Capturing - Extracting parts of matches with parentheses.