Anchors & Boundaries

Anchors match positions in the text, not characters. They assert that the current position meets a condition without consuming any characters, which is why they are called zero-width. They are essential for precise pattern matching.

Line Anchors

Start of Line ^

Matches the position at the start of a line.

Pattern: ^Error
Text:    Error: Connection failed     ← Matches
         The Error was critical       ← Does NOT match

Infrastructure Example:

# Find lines starting with ERROR
grep '^ERROR' /var/log/syslog

# Find configuration comments
grep '^#' /etc/ssh/sshd_config

# Find lines starting with something that looks like an IPv4 octet
grep -E '^[0-9]{1,3}\.' access.log
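If you want to test the same pattern outside grep, Python's re module behaves the same way for this case. This is a minimal sketch in which each list element stands in for one line of a log file (the two sample lines are the ones from the pattern box). Since each string is a single line, ^ without re.MULTILINE means "start of this line".

```python
import re

lines = ["Error: Connection failed", "The Error was critical"]

# Each string is one line, so ^ (without re.MULTILINE) means "start of this line"
matches = [line for line in lines if re.search(r"^Error", line)]
print(matches)  # ['Error: Connection failed']
```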

End of Line $

Matches the position at the end of a line.

Pattern: failed$
Text:    Authentication failed        ← Matches
         failed authentication        ← Does NOT match

Infrastructure Example:

# Find lines ending with .log
grep '\.log$' filelist.txt

# Find lines ending with port number
grep -E ':[0-9]+$' connections.txt

# Find lines ending with OK
grep 'OK$' status.log
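The same idea in Python, as a small sketch using the two sample lines from the $ example. It also shows that the anchor is zero-width: the matched text is only "failed", and the span ends exactly at the end of the string.

```python
import re

lines = ["Authentication failed", "failed authentication"]

# $ asserts the end of the string; it consumes no characters
ends_failed = [line for line in lines if re.search(r"failed$", line)]
print(ends_failed)  # ['Authentication failed']

# The anchor adds nothing to the matched text
m = re.search(r"failed$", "Authentication failed")
print(m.group(), m.span())  # failed (15, 21)
```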

Combining ^ and $

Pattern: ^ERROR$
Matches: Lines containing ONLY "ERROR"

Pattern: ^$
Matches: Empty lines

Pattern: ^.+$
Matches: Non-empty lines

Practical Examples:

# Remove empty lines
grep -v '^$' file.txt

# Find lines that are empty or contain only whitespace
grep -E '^\s*$' file.txt

# Find exact match on line
grep -x 'PATTERN' file.txt  # Equivalent to ^PATTERN$
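To see how the combined patterns behave on a multi-line string, here is a sketch using Python's re.MULTILINE flag, which makes ^ and $ apply at every line the way grep does. The sample text is made up for illustration. [ \t]* is used instead of \s* so the whitespace test cannot consume a newline and run into the next line.

```python
import re

# Hypothetical five-line sample: "ERROR", "", "ERROR: disk full", "   ", "OK"
text = "ERROR\n\nERROR: disk full\n   \nOK"

# re.MULTILINE: ^ and $ match at the start and end of every line
only_error = re.findall(r"^ERROR$", text, re.MULTILINE)
empty = re.findall(r"^$", text, re.MULTILINE)
non_empty = re.findall(r"^.+$", text, re.MULTILINE)
blank_or_ws = re.findall(r"^[ \t]*$", text, re.MULTILINE)

print(only_error)   # ['ERROR']
print(empty)        # ['']
print(non_empty)    # ['ERROR', 'ERROR: disk full', '   ', 'OK']
print(blank_or_ws)  # ['', '   ']
```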

Word Boundaries

The \b Anchor

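Matches the position between a word character (\w: a letter, digit, or underscore) and a non-word character (\W), or between a word character and the start or end of the string.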

Pattern: \bcat\b
Text:    The cat sat on the caterpillar.
Matches:     ^^^
Does NOT match: caterpillar (no boundary after 'cat')

What constitutes a word boundary:
- Start of string (if the first character is \w)
- End of string (if the last character is \w)
- Between \w and \W
- Between \W and \w
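A quick Python check of the cat example, as a sketch: re.finditer reports where each match starts, which makes it easy to see that \b rejects the "cat" inside "caterpillar" and that the edges of the string count as boundaries.

```python
import re

text = "The cat sat on the caterpillar."

# \bcat\b: "cat" with a boundary on both sides
hits = [(m.group(), m.start()) for m in re.finditer(r"\bcat\b", text)]
# Plain "cat": also matches at the start of "caterpillar"
loose = [m.start() for m in re.finditer(r"cat", text)]
# Start and end of the string count as boundaries when the edge character is \w
edge = re.findall(r"\bcat\b", "cat")

print(hits)   # [('cat', 4)]
print(loose)  # [4, 19]
print(edge)   # ['cat']
```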

Infrastructure Examples

# Match "log" as whole word
grep -P '\blog\b' /etc/rsyslog.conf
# Matches: log, log.txt
# NOT: logging, catalog, syslog

# Match VLAN as whole word
grep -P '\bVLAN\b' config.txt

# Match specific port number
grep -P '\b443\b' firewall.log
# Matches: 443, :443
# NOT: 4430, 14435
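The port example can be tried in Python with a few hypothetical log fragments. This sketch assumes the port appears as plain digits. Keep in mind that \b treats "." as a boundary too, so a string like "10.443.1.1" would also match.

```python
import re

# Hypothetical log fragments
samples = ["443", "dst=10.0.0.5:443", "4430", "14435", "port 443/tcp"]

# \b443\b requires a non-word character (or the string edge) on both sides of "443"
port_443 = [s for s in samples if re.search(r"\b443\b", s)]
print(port_443)  # ['443', 'dst=10.0.0.5:443', 'port 443/tcp']
```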

Word Start \< and Word End \>

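GNU grep also offers directional word anchors for BRE and ERE: \< marks the start of a word and \> marks the end. They are GNU extensions rather than POSIX, and they do not act as word anchors under grep -P: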

# Word starting with "log"
grep '\<log' /etc/rsyslog.conf
# Matches: log, logging, logout
# NOT: catalog, syslog

# Word ending with "log"
grep 'log\>' /etc/rsyslog.conf
# Matches: log, syslog, catalog
# NOT: logging, logout

# Exact word (combine both)
grep '\<log\>' /etc/rsyslog.conf
# Matches only: log
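Python's re module has no \< or \>. A common approximation, used here as an assumption rather than an exact port of GNU grep, is \b(?=\w) for word start and \b(?<=\w) for word end. When the literal next to the anchor begins or ends with a word character (as "log" does), both approximations behave exactly like \b. The word list is the one from the grep comments.

```python
import re

# Assumed approximations of GNU grep's directional anchors:
#   \<  ->  \b(?=\w)   a boundary followed by a word character
#   \>  ->  \b(?<=\w)  a boundary preceded by a word character
WORD_START = r"\b(?=\w)"
WORD_END = r"\b(?<=\w)"

words = ["log", "logging", "logout", "catalog", "syslog"]

starts_with_log = [w for w in words if re.search(WORD_START + "log", w)]
ends_with_log = [w for w in words if re.search("log" + WORD_END, w)]
exact_log = [w for w in words if re.search(WORD_START + "log" + WORD_END, w)]

print(starts_with_log)  # ['log', 'logging', 'logout']
print(ends_with_log)    # ['log', 'catalog', 'syslog']
print(exact_log)        # ['log']
```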

String Anchors (PCRE)

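These match the absolute start or end of the entire string, not of individual lines:

Anchor  Meaning                                         Notes
\A      Start of string                                 Unaffected by multiline mode
\Z      End of string, or just before a final newline   Unaffected by multiline mode
\z      Absolute end of string                          Never matches before a trailing newline

Difference from ^ and $:
- In multiline mode, ^ and $ match at the start and end of every line
- \A, \Z, and \z always refer to the start or end of the whole string

Note: Python's re module differs from PCRE here. Its \Z matches only at the absolute end of the string, like PCRE's \z.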

import re

text = """Line 1
Line 2
Line 3"""

# ^ matches each line start
re.findall(r'^Line', text, re.MULTILINE)
# ['Line', 'Line', 'Line']

# \A matches only string start
re.findall(r'\ALine', text, re.MULTILINE)
# ['Line'] (only first)
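The \Z row of the table is where Python and PCRE disagree. This sketch uses the same three lines plus a trailing newline and shows $ matching before every newline in multiline mode, while Python's \Z only matches at the absolute end of the string.

```python
import re

text = "Line 1\nLine 2\nLine 3\n"  # same lines, plus a trailing newline

# MULTILINE: $ matches before every newline (and at the very end)
per_line = re.findall(r"Line \d$", text, re.MULTILINE)

# \Z ignores MULTILINE; in Python it matches only at the absolute end of the string,
# so the trailing newline prevents any match here
at_end = re.findall(r"Line \d\Z", text, re.MULTILINE)

# Strip the trailing newline and \Z finds the last line
at_end_stripped = re.findall(r"Line \d\Z", text.rstrip("\n"), re.MULTILINE)

print(per_line)         # ['Line 1', 'Line 2', 'Line 3']
print(at_end)           # []
print(at_end_stripped)  # ['Line 3']
```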

Non-Word Boundary \B

Matches where \b would NOT match (inside a word).

Pattern: \Bcat\B
Text:    The cat and the caterpillar scatter
Matches:                              ^^^
         (only 'cat' inside 'scatter')

Use case: Finding substrings within words.
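A short Python sketch of the \B example, printing the start offset of each match so you can confirm that only the "cat" inside "scatter" qualifies:

```python
import re

text = "The cat and the caterpillar scatter"

# \B: "not a boundary", so "cat" must have a word character on both sides here
inner = [(m.group(), m.start()) for m in re.finditer(r"\Bcat\B", text)]
print(inner)  # [('cat', 29)]
```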

Practical Patterns

Validate Full Line Content

# Line has the shape of an IPv4 address (basic check: 999.999.999.999 also passes)
grep -E '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$' ips.txt

# Line contains only hex characters
grep -E '^[A-Fa-f0-9]+$' hashes.txt

# Line is a valid MAC address
grep -E '^([A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}$' macs.txt
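The same full-line checks can be written in Python. In this sketch the candidate strings are invented, and the IPv4 pattern is the shape-only check from the grep command, so an out-of-range address like 999.999.999.999 still passes. The point is that ^...$ rejects any extra text on the line.

```python
import re

IPV4_SHAPE = r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
MAC = r"([A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}"

# Hypothetical input lines
candidates = ["192.168.1.1", "10.50.1.100 gateway", "999.999.999.999", "00:1A:2b:3C:4d:5E"]

# ^...$ anchors the pattern to the whole line, so " gateway" causes a rejection
ips = [c for c in candidates if re.search("^" + IPV4_SHAPE + "$", c)]
macs = [c for c in candidates if re.search("^" + MAC + "$", c)]

print(ips)   # ['192.168.1.1', '999.999.999.999']
print(macs)  # ['00:1A:2b:3C:4d:5E']
```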

Extract from Fixed Positions

# Get first word of each line
grep -oE '^\S+' file.txt

# Get last word of each line
grep -oE '\S+$' file.txt

# Get timestamp at line start
grep -oE '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:]+' log.txt
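In Python, re.MULTILINE gives the same per-line behaviour as grep -o with these anchors. The two log lines below are hypothetical sample data in the timestamp layout the grep pattern expects.

```python
import re

# Hypothetical log excerpt
log = (
    "2024-05-01T10:15:00 INFO service started\n"
    "2024-05-01T10:15:07 ERROR connection refused\n"
)

# re.MULTILINE makes ^ and $ work per line, like grep
first_words = re.findall(r"^\S+", log, re.MULTILINE)
last_words = re.findall(r"\S+$", log, re.MULTILINE)
timestamps = re.findall(r"^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:]+", log, re.MULTILINE)

print(first_words)  # ['2024-05-01T10:15:00', '2024-05-01T10:15:07']
print(last_words)   # ['started', 'refused']
print(timestamps)   # same list as first_words for this sample
```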

Log Parsing

# Lines starting with timestamp
grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2}' application.log

# Lines starting with log level
grep -E '^(DEBUG|INFO|WARN|ERROR|FATAL)' application.log

# Lines ending with error indicators
grep -E '(failed|error|refused|denied)$' application.log
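A Python sketch of the two log filters, run on invented sample lines. It also shows a subtlety of the start-of-line alternation: ^(...|WARN|...) accepts "WARNING" because nothing stops the match after "WARN". Adding \b after the group restricts it to whole level names.

```python
import re

# Hypothetical log lines
lines = [
    "ERROR database connection refused",
    "INFO request served",
    "WARNING: retrying, last attempt failed",
    "user denied access to ERROR page",
]

# Log level at the start of the line (as in the grep command)
level_at_start = [line for line in lines if re.search(r"^(DEBUG|INFO|WARN|ERROR|FATAL)", line)]
# Same, but \b stops "WARN" from matching the start of "WARNING"
strict_level = [line for line in lines if re.search(r"^(DEBUG|INFO|WARN|ERROR|FATAL)\b", line)]
# Error indicator at the end of the line
error_at_end = [line for line in lines if re.search(r"(failed|error|refused|denied)$", line)]
```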

Configuration Validation

# Find uncommented settings (lines with '=' that do not start with #; indented comments still slip through)
grep -E '^[^#].*=' config.ini

# Find active port definitions
grep -E '^port\s*=' config.ini

# Find blank or comment-only lines
grep -E '^(\s*#.*|\s*)$' config.ini
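The configuration checks translated to Python, on a hypothetical config file. The sketch also shows a limitation of ^[^#].*=: an indented comment such as "  # indented=comment" starts with a space, so it still passes. The strict_uncommented pattern is an added variation, not one of the grep commands, that skips leading whitespace before testing for #.

```python
import re

# Hypothetical config lines
config = [
    "setting=value",
    "# disabled=old_value",
    "  # indented=comment",
    "port = 8080",
    "",
    "   ",
]

uncommented = [line for line in config if re.search(r"^[^#].*=", line)]
# Added variation: skip leading whitespace, then require a non-#, non-space character
strict_uncommented = [line for line in config if re.search(r"^\s*[^#\s].*=", line)]
port_lines = [line for line in config if re.search(r"^port\s*=", line)]
blank_or_comment = [line for line in config if re.search(r"^(\s*#.*|\s*)$", line)]

print(uncommented)         # ['setting=value', '  # indented=comment', 'port = 8080']
print(strict_uncommented)  # ['setting=value', 'port = 8080']
```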

Multiline Mode

In PCRE and Python's re module, multiline mode changes how ^ and $ behave:

import re

text = """Line 1
Line 2
Line 3"""

# Without MULTILINE: ^ matches only start of string
re.findall(r'^Line \d', text)
# ['Line 1']

# With MULTILINE: ^ matches start of each line
re.findall(r'^Line \d', text, re.MULTILINE)
# ['Line 1', 'Line 2', 'Line 3']

In grep: matching is always line by line, so ^ and $ refer to each line's start and end by default.

Self-Test Exercises

Try each challenge FIRST. Only read the answer after you’ve attempted it.

Setup Test Data

cat << 'EOF' > /tmp/anchors.txt
ERROR Connection failed
Warning: disk space low
error: permission denied
VLAN 100
VLAN100
catalog
log
syslog
logging
192.168.1.1
10.50.1.100
#comment
  # indented comment
setting=value
# disabled=old_value
active_setting = true
Port: 443
Port: 8080
auth.log
syslog.log
authentication failed
connection failed
failed connection
EOF

Challenge 1: Lines Starting with ERROR

Goal: Find lines that START with "ERROR" (uppercase)

Answer
grep '^ERROR' /tmp/anchors.txt

^ matches start of line. Only uppercase ERROR at line start matches.


Challenge 2: Lines Ending with "failed"

Goal: Find lines that END with the word "failed"

Answer
grep 'failed$' /tmp/anchors.txt

$ matches end of line. Output: "ERROR Connection failed", "authentication failed", "connection failed"


Challenge 3: Whole Word "log"

Goal: Match "log" as a complete word (not "catalog", "syslog", "logging")

Answer
# BRE/ERE word boundaries
grep '\<log\>' /tmp/anchors.txt

# PCRE word boundaries
grep -P '\blog\b' /tmp/anchors.txt

# Or use -w flag
grep -w 'log' /tmp/anchors.txt

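\b or \<\> ensure word boundaries. Output: "log", "auth.log", "syslog.log" (the dot is a non-word character, so the "log" after it is a whole word). "catalog", "syslog", and "logging" are excluded.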


Challenge 4: Non-Comment Config Lines

Goal: Find configuration lines (contain =) that are NOT comments

Answer
grep -E '^[^#].*=' /tmp/anchors.txt

[^#] means the line starts with any character EXCEPT #. Then .*= finds the equals sign. Output: "setting=value", "active_setting = true"


Challenge 5: Empty Lines

Goal: Find empty lines (lines with nothing on them)

Answer
grep '^$' /tmp/anchors.txt

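^$ means start immediately followed by end = empty line. The test file contains no empty lines, so this prints nothing (grep -c '^$' /tmp/anchors.txt reports 0).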


Challenge 6: Lines That Are ONLY an IP Address

Goal: Find lines where the ENTIRE line is just an IP address

Answer
grep -E '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$' /tmp/anchors.txt

^…$ anchors the pattern to the entire line. No extra text allowed.


Challenge 7: Files Ending in .log

Goal: Find lines that end with ".log"

Answer
grep '\.log$' /tmp/anchors.txt

\. escapes the dot (literal). $ anchors to line end.


Challenge 8: Lines Starting with Whitespace

Goal: Find lines that begin with whitespace (space or tab)

Answer
grep -E '^[[:space:]]' /tmp/anchors.txt
# Or
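grep '^\s' /tmp/anchors.txt  # GNU grep extension

[[:space:]] is the POSIX class for any whitespace character (space, tab, and so on); \s is a GNU shorthand for it. Output: "  # indented comment"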


Challenge 9: Word Starting with "log"

Goal: Match words STARTING with "log" (log, logging) but not words where "log" appears only at the end (catalog, syslog)

Answer
grep '\<log' /tmp/anchors.txt

\< is a word START boundary only. Matches: log, logging, auth.log, syslog.log (the "log" after each dot starts a new word).


Challenge 10: Word Ending with "log"

Goal: Match words ENDING with "log" (log, syslog, catalog) but not words where "log" appears only at the start (logging)

Answer
grep 'log\>' /tmp/anchors.txt

\> is a word END boundary only. Matches: catalog, log, syslog, auth.log, syslog.log.


Challenge 11: Comments (# at Line Start)

Goal: Find comment lines (# at start of line, optionally after leading whitespace)

Answer
# Strict - # must be first character
grep '^#' /tmp/anchors.txt

# With leading whitespace allowed
grep -E '^\s*#' /tmp/anchors.txt

^\s*# means: start, zero or more whitespace, then #.


Challenge 12: VLAN with Word Boundary

Goal: Match "VLAN 100" but not "VLAN100" (space required)

Answer
grep -E '\bVLAN [0-9]+\b' /tmp/anchors.txt
# Or
grep -E 'VLAN [0-9]+' /tmp/anchors.txt

The space in the pattern ensures separation.

Common Mistakes

Mistake 1: Confusing ^ Inside vs Outside []

^[abc]  → Line starting with a, b, or c
[^abc]  → Character that is NOT a, b, or c

These are completely different!
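A small Python sketch with made-up words makes the difference concrete: the first two patterns are anchored at the line start, while the bare bracket expression is tested at every character position.

```python
import re

words = ["apple", "cab", "date", "abba"]  # hypothetical lines

# ^ OUTSIDE brackets: anchor, so the line must start with a, b, or c
starts_abc = [w for w in words if re.search(r"^[abc]", w)]
# ^ as the FIRST character INSIDE brackets: negation, so the line starts with anything else
starts_not_abc = [w for w in words if re.search(r"^[^abc]", w)]
# [^abc] alone: some character anywhere in the line is not a, b, or c
has_other = [w for w in words if re.search(r"[^abc]", w)]

print(starts_abc)      # ['apple', 'cab', 'abba']
print(starts_not_abc)  # ['date']
print(has_other)       # ['apple', 'date']
```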

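Mistake 2: Assuming $ Means the Absolute End of the String

Pattern: test$
$ is zero-width, so the newline is never part of the match. Outside multiline mode, however, PCRE and Python let $ match at the end of the string OR just before a final newline, so "test\n" still passes a ^test$ check. Use \z (PCRE) or \Z (Python) when a trailing newline must be rejected. grep strips the newline from each line, so this only matters in scripts and programs.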
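A quick Python check shows what $ does with a trailing newline when multiline mode is off. The input string is hypothetical, standing in for a value read from a file or stdin with its newline still attached.

```python
import re

user_input = "42\n"  # hypothetical value with its newline still attached

# Without MULTILINE, $ matches at the end of the string OR just before a final "\n"
loose = re.search(r"^[0-9]+$", user_input)
# Python's \Z matches only at the absolute end, so the newline makes this fail
strict = re.search(r"^[0-9]+\Z", user_input)

print(loose is not None, strict is None)  # True True
print(repr(loose.group()))                # '42' (the newline is not part of the match)
```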

Mistake 3: Word Boundary Misunderstanding

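# This looks right but behaves unexpectedly around hyphens
\bself-hosted\b

# Hyphen is NOT a word character, so \b finds boundaries on both sides of it (| marks each one):
|self|-|hosted|
# Result: \bself\b matches the "self" in "self-hosted", and \bself-hosted\b also matches inside "non-self-hosted"
# When hyphens should count as part of the word, anchor on explicit delimiters instead (ERE):
(^|[[:space:]])self-hosted($|[[:space:]])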
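The effect of the hyphen can be checked in Python with a few made-up names. This sketch compares \b-based patterns with one that requires whitespace or a string edge around the phrase (\s stands in for [[:space:]] here).

```python
import re

samples = ["self-hosted", "non-self-hosted", "self-hostedness"]  # hypothetical names

# "-" is a \W character, so \b sees "self" as a complete word in every sample
self_word = [s for s in samples if re.search(r"\bself\b", s)]
# The boundary before "self" also exists after "non-", so this matches two samples
phrase = [s for s in samples if re.search(r"\bself-hosted\b", s)]
# Requiring whitespace or a string edge on both sides treats the hyphen as part of the word
delimited = [s for s in samples if re.search(r"(^|\s)self-hosted($|\s)", s)]

print(self_word)  # ['self-hosted', 'non-self-hosted', 'self-hostedness']
print(phrase)     # ['self-hosted', 'non-self-hosted']
print(delimited)  # ['self-hosted']
```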

Key Takeaways

  1. ^ = start of line - use for log parsing, validation

  2. $ = end of line - use for file extensions, final values

  3. \b = word boundary - prevents partial matches

  4. \< and \> = word start/end - GNU BRE/ERE extensions

  5. Anchors match positions, not characters - zero-width

  6. Combine anchors - ^…$ for full-line validation

Next Module

Groups & Capturing - Extracting parts of matches with parentheses.