Lookahead & Lookbehind
Lookaround assertions match a position based on what comes before or after, without including that text in the match. They’re essential for extracting text that’s surrounded by specific patterns.
Understanding Zero-Width
Lookarounds are "zero-width" - they match a position, not characters.
Pattern: foo(?=bar)
Text:    foobar
Matches: ^^^
(matches "foo", not "foobar")
The (?=bar) asserts "bar" follows, but doesn't consume it.
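The zero-width behaviour can be checked directly. A minimal sketch using Python's standard re module on the sample string from the example above (the variable names are illustrative):

```python
import re

# The lookahead asserts that "bar" follows but leaves it out of the match.
m = re.search(r"foo(?=bar)", "foobar")
print(m.group())  # foo
print(m.span())   # (0, 3)

# On its own, the lookahead matches an empty string at a position.
pos = re.search(r"(?=bar)", "foobar")
print(repr(pos.group()), pos.span())  # '' (3, 3)
```

The second search returns an empty match at index 3: the assertion succeeded at that position without consuming any characters.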
Types of Lookaround
| Type | Syntax | Meaning | Example |
|---|---|---|---|
| Positive Lookahead | (?=…) | What follows must match | foo(?=bar) |
| Negative Lookahead | (?!…) | What follows must NOT match | foo(?!bar) |
| Positive Lookbehind | (?<=…) | What precedes must match | (?<=foo)bar |
| Negative Lookbehind | (?<!…) | What precedes must NOT match | (?<!foo)bar |
Positive Lookahead (?=…)
Match only if followed by the pattern.
Pattern: \d+(?= dollars)
Text:    I have 100 dollars and 50 euros
Matches:        ^^^
(only the "100" before "dollars")
Infrastructure Examples
# Match username before @domain
grep -oP '\w+(?=@example\.com)' emails.txt
# Match port number before /tcp
grep -oP '\d+(?=/tcp)' netstat.txt
# Match filename before .log extension
grep -oP '[^/]+(?=\.log)' paths.txt
Extract Value Before Unit
# Memory in MB
echo "Memory: 1024 MB" | grep -oP '\d+(?= MB)'
# Output: 1024
# Disk space in GB
echo "Disk: 500 GB available" | grep -oP '\d+(?= GB)'
# Output: 500
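The same lookaheads behave identically in Python's re module. A small sketch, assuming both readings sit on one line (the combined sample line and the variable names are made up for illustration):

```python
import re

line = "Memory: 1024 MB, Disk: 500 GB available"

# Digits only count when the named unit follows; the unit is not captured.
memory_mb = re.findall(r"\d+(?= MB)", line)
disk_gb = re.findall(r"\d+(?= GB)", line)

print(memory_mb)  # ['1024']
print(disk_gb)    # ['500']
```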
Negative Lookahead (?!…)
Match only if NOT followed by the pattern.
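Pattern: \d+\b(?! dollars)
Text:    I have 100 dollars and 50 euros
Matches:                        ^^
(matches "50", which is not followed by " dollars"; the \b stops the engine from backtracking to "10", which would otherwise pass the check)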
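A detail worth verifying with negative lookahead after a quantifier: the engine may back off to a shorter match that satisfies the assertion. A short Python sketch on the same sentence (variable names are illustrative):

```python
import re

text = "I have 100 dollars and 50 euros"

# Without a boundary the engine backtracks from "100" to "10",
# which is followed by "0" rather than " dollars", so "10" slips through.
naive = re.findall(r"\d+(?! dollars)", text)
print(naive)    # ['10', '50']

# \b forces the number to end at a word boundary before the check runs.
bounded = re.findall(r"\d+\b(?! dollars)", text)
print(bounded)  # ['50']
```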
Infrastructure Examples
# IPs NOT in the 192.168.x.x range (the lookahead is checked at the start of the address)
grep -oP '\b(?!192\.168\.)\d+\.\d+\.\d+\.\d+\b' access.log
# Files NOT ending in .bak
ls | grep -P '^(?!.*\.bak$).+'
# Ports NOT followed by "CLOSED" (\b prevents backtracking to a shorter number)
grep -P '\d+\b(?!/tcp.*CLOSED)' netstat.txt
Match Words NOT Followed By
# "test" not followed by "ing"
grep -P 'test(?!ing)' file.txt
# Matches: test, tested, tester
# Skips: testing
# "log" not followed by "in" or "out"
grep -P 'log(?!(in|out))' file.txt
Positive Lookbehind (?<=…)
Match only if preceded by the pattern.
Pattern: (?<=\$)\d+
Text:    Price: $100 and €50
Matches:         ^^^
(matches "100" after $)
Infrastructure Examples
# Extract value after "port="
grep -oP '(?<=port=)\d+' config.txt
# Extract IP after "from "
grep -oP '(?<=from )\d+\.\d+\.\d+\.\d+' sshd.log
# Extract hostname after "Host: "
grep -oP '(?<=Host: )[^\s]+' http_headers.txt
The \K Shortcut (PCRE)
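\K resets the match start: everything matched before it is dropped from the reported match. It works like a positive lookbehind, but the prefix is not limited to a fixed length.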
# Equivalent patterns:
grep -oP '(?<=port=)\d+' config.txt
grep -oP 'port=\K\d+' config.txt
# \K advantage: can use variable-length patterns
grep -oP 'port\s*=\s*\K\d+' config.txt
# (lookbehind requires fixed length in most engines)
IMPORTANT: \K is available in PCRE (grep -P), not in ERE or BRE.
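Python's standard re module does not support \K either. A capturing group gives the same result there. A minimal sketch, assuming a config with both spacing styles (the sample text is made up for illustration):

```python
import re

config = "port = 8080\nport=22\n"

# The prefix is matched normally and the capturing group
# selects the part to keep, which is what \K achieves in PCRE.
ports = re.findall(r"port\s*=\s*(\d+)", config)
print(ports)  # ['8080', '22']
```

When a pattern contains exactly one group, re.findall returns the group contents rather than the whole match, so the prefix never appears in the output.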
Negative Lookbehind (?<!…)
Match only if NOT preceded by the pattern.
Pattern: (?<!\$)(?<!\d)\d+
Text:    Price: $100 and 50 units
Matches:                 ^^
(matches "50", not after $; the extra (?<!\d) keeps the match from starting at the "00" inside "100")
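Negative lookbehind in front of a quantifier has a similar trap: the match can simply start one character later. A short Python sketch on the same text (variable names are illustrative):

```python
import re

text = "Price: $100 and 50 units"

# At the first "0" the preceding character is "1", not "$", so "00" matches.
naive = re.findall(r"(?<!\$)\d+", text)
print(naive)    # ['00', '50']

# A second lookbehind stops the match from starting mid-number.
guarded = re.findall(r"(?<!\$)(?<!\d)\d+", text)
print(guarded)  # ['50']
```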
Infrastructure Examples
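# Numbers NOT after "port" ((?<!\d) keeps the match from starting mid-number)
grep -P '(?<!port)(?<!\d)\d+' config.txt
# "error" NOT after "no "
grep -P '(?<!no )error' log.txt
# MAC addresses NOT in comments
# (a variable-length lookbehind such as (?<!#.*) is rejected by grep -P; anchor on a #-free prefix instead)
grep -P '^[^#]*([A-F0-9]{2}:){5}[A-F0-9]{2}' config.txt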
Combining Lookarounds
Multiple lookarounds can be combined for precise matching.
Extract Between Delimiters
# Extract value between quotes
echo 'name="value"' | grep -oP '(?<=")[^"]+(?=")'
# Output: value
# More robust with \K: anchor on the =" that opens the value, then drop it from the match
echo 'name="value"' | grep -oP '="\K[^"]+'
# Output: value
Password Validation
# Require: 8+ chars, uppercase, lowercase, digit
Pattern: ^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$
Breakdown:
(?=.*[A-Z]) - must contain uppercase
(?=.*[a-z]) - must contain lowercase
(?=.*\d) - must contain digit
.{8,} - at least 8 characters
import re
password_pattern = r'^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$'
for pw in ['weak', 'StrongPass1', 'ALLCAPS123', 'Short1']:
    if re.match(password_pattern, pw):
        print(f"'{pw}' - VALID")
    else:
        print(f"'{pw}' - INVALID")
# Output:
# 'weak' - INVALID
# 'StrongPass1' - VALID
# 'ALLCAPS123' - INVALID
# 'Short1' - INVALID
Practical Patterns
Extract Log Fields
# Timestamp at start
grep -oP '^\d{4}-\d{2}-\d{2}T[\d:]+' app.log
# IP after "from"
grep -oP '(?<=from )\d+\.\d+\.\d+\.\d+' sshd.log
# Error message after level
grep -oP '(?<=ERROR: ).+' app.log
Parse Configuration
# Value after key (handles spaces around =)
grep -oP 'server\s*=\s*\K\S+' config.ini
# Port numbers in listen directives
grep -oP '(?<=listen )\d+' nginx.conf
# Enabled settings only
grep -oP '^\s*(?!#)\w+\s*=\s*\K.+' config.ini
URL Parsing
# Domain from URL
echo "https://api.example.com:8443/v1" | grep -oP '(?<=://)[^:/]+'
# Output: api.example.com
# Path from URL (the lookbehind excludes ":" and "/" so the "//" after the scheme is skipped)
echo "https://api.example.com/v1/users" | grep -oP '(?<=[^:/])/[^?]+'
# Output: /v1/users
# Query parameters
echo "https://example.com?id=123&name=test" | grep -oP '(?<=\?).+'
# Output: id=123&name=test
JSON Value Extraction (Simple)
# Value for "name" key
echo '{"name": "value", "id": 123}' | grep -oP '(?<="name": ")[^"]+'
# Output: value
# Numeric value for "id"
echo '{"name": "value", "id": 123}' | grep -oP '(?<="id": )\d+'
# Output: 123
For complex JSON, use jq instead of regex.
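Where jq is not at hand, a JSON parser in a scripting language does the same job. A minimal sketch with Python's standard json module, using the sample object from the examples above:

```python
import json

doc = '{"name": "value", "id": 123}'

# Parse once, then read fields by key instead of pattern matching.
data = json.loads(doc)
print(data["name"])  # value
print(data["id"])    # 123
```

Unlike the regex versions, this does not depend on key order or on the exact spacing after the colon.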
Fixed-Length Requirement
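Many regex engines, including PCRE (grep -P) and Python's re, require lookbehind patterns to be fixed-length. Some others, such as JavaScript and .NET, accept variable-length lookbehind.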
# Valid - fixed length
(?<=abc) # 3 characters
(?<=\d{4}) # 4 digits
# Invalid in many engines
(?<=\d+) # Variable length
(?<=\w*) # Variable length (might be 0)
Workarounds
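# Use \K instead (PCRE)
grep -oP '\d+\K\w+' file.txt
# Use multiple fixed-length lookbehinds (grouped, so the alternation covers only the lookbehinds)
grep -oP '(?:(?<=\d)|(?<=\d{2})|(?<=\d{3}))pattern' file.txt
# Use a capturing group instead (grep -o prints the whole match, so use a tool that can print the group, such as perl)
perl -ne 'print "$1\n" if /\d+(\w+)/' file.txt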
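The fixed-length rule and the capturing-group workaround can both be observed in Python's re module. A small sketch (the sample string and variable names are made up for illustration):

```python
import re

sample = "2026text 7text"

# Python's re rejects a lookbehind whose width is not fixed.
try:
    re.compile(r"(?<=\d+)text")
    variable_ok = True
except re.error as exc:
    variable_ok = False
    print("rejected:", exc)

# A fixed-width lookbehind compiles and matches only after four digits.
fixed = re.findall(r"(?<=\d{4})text", sample)
print(fixed)    # ['text']

# A capturing group copes with a prefix of any length.
grouped = re.findall(r"\d+(text)", sample)
print(grouped)  # ['text', 'text']
```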
Self-Test Exercises
Try each challenge FIRST. Only expand the answer after you’ve attempted it.
Setup Test Data
cat << 'EOF' > /tmp/lookaround.txt
Price: $100
Price: €50
Port: 443/tcp
Port: 80/tcp
server=192.168.1.1
port = 8080
timeout=30
Log: 2026-03-15 ERROR Connection refused
Log: 2026-03-15 INFO Server started
from 10.0.0.1 port 22
from 192.168.1.100 port 443
URL: https://api.example.com/v1
URL: http://localhost:8080/health
Password1 (valid)
weakpass (invalid)
ALLCAPS123 (invalid)
testing
test results
tested
{"name": "server-01", "ip": "10.0.0.1"}
EOF
Challenge 1: Extract Dollar Amounts
Goal: Extract just the number after $ (100, not the $ sign)
Answer
grep -oP '(?<=\$)\d+' /tmp/lookaround.txt
(?<=\$) is positive lookbehind - match position after $, then \d+ matches the digits.
Challenge 2: Extract Port Before /tcp
Goal: Extract port numbers (443, 80) that are followed by /tcp
Answer
grep -oP '\d+(?=/tcp)' /tmp/lookaround.txt
(?=/tcp) is positive lookahead - digits must be followed by /tcp.
Challenge 3: Extract Value After server=
Goal: Extract just the IP after "server=" (192.168.1.1)
Answer
# Using lookbehind
grep -oP '(?<=server=)\S+' /tmp/lookaround.txt
# Using \K (more flexible)
grep -oP 'server=\K\S+' /tmp/lookaround.txt
\K resets match start - everything before is not included in output.
Challenge 4: Extract ERROR Messages
Goal: Extract just the message after "ERROR " (Connection refused)
Answer
grep -oP '(?<=ERROR ).+' /tmp/lookaround.txt
(?<=ERROR ) matches position after "ERROR ", .+ captures the rest.
Challenge 5: Match "test" NOT Followed by "ing"
Goal: Find "test" in "test results" and "tested", but NOT "testing"
Answer
grep -P '\btest(?!ing)' /tmp/lookaround.txt
(?!ing) is negative lookahead - "test" must NOT be followed by "ing".
Challenge 6: Extract Domain from URLs
Goal: Extract domains (api.example.com, localhost)
Answer
grep -oP '(?<=://)[^:/]+' /tmp/lookaround.txt
(?<=://) matches position after ://, [^:/]+ captures until colon or slash.
Challenge 7: Extract JSON "name" Value
Goal: Extract just "server-01" from the JSON
Answer
grep -oP '(?<="name": ")[^"]+' /tmp/lookaround.txt
(?<="name": ") matches position after the key, [^"]+ captures until closing quote.
Challenge 8: Match Euro Amounts (NOT Dollar)
Goal: Extract amounts NOT preceded by $ (the €50)
Answer
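# The 50 after €
grep -oP '(?<=€)\d+' /tmp/lookaround.txt
# Any number NOT after $ (this also prints every other number in the file)
grep -oP '(?<!\$)(?<!\d)\d+' /tmp/lookaround.txt
(?<!\$) is negative lookbehind - number must NOT be preceded by $. The added (?<!\d) stops the match from starting at the "00" inside "$100".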
Challenge 9: Extract Port with \K
Goal: Extract port value using \K instead of lookbehind (8080 from "port = 8080")
Answer
grep -oP 'port\s*=\s*\K\d+' /tmp/lookaround.txt
\s* handles optional spaces around =. \K resets match.
Challenge 10: Extract IP After "from"
Goal: Extract IPs that appear after "from " (10.0.0.1, 192.168.1.100)
Answer
grep -oP '(?<=from )\d+\.\d+\.\d+\.\d+' /tmp/lookaround.txt
Lookbehind ensures "from " precedes the IP.
Challenge 11: Match URL Path
Goal: Extract just the path (/v1, /health) from URLs
Answer
# A lookbehind such as (?<=://[^/]+) is variable-length, so grep -P rejects it.
# Use \K instead:
grep -oP 'https?://[^/]+\K/\S+' /tmp/lookaround.txt
Challenge 12: JSON IP Value
Goal: Extract just the IP from the JSON "ip" field (10.0.0.1)
Answer
grep -oP '(?<="ip": ")[^"]+' /tmp/lookaround.txt
Same pattern as name - lookbehind for the key, capture until quote.
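The answers that rely only on lookarounds can also be cross-checked in a second engine. The sketch below runs a few of them through Python's re on a subset of the test data. The dictionary labels are illustrative, and the \K answers are left out because re does not support \K.

```python
import re

# A subset of /tmp/lookaround.txt, inlined so the sketch is self-contained.
sample = """Price: $100
Port: 443/tcp
Port: 80/tcp
server=192.168.1.1
from 10.0.0.1 port 22
URL: https://api.example.com/v1
URL: http://localhost:8080/health
"""

checks = {
    "dollar amount": r"(?<=\$)\d+",
    "port before /tcp": r"\d+(?=/tcp)",
    "value after server=": r"(?<=server=)\S+",
    "IP after from": r"(?<=from )\d+\.\d+\.\d+\.\d+",
    "domain": r"(?<=://)[^:/]+",
}

# Collect every match per pattern, as grep -o would print them.
results = {name: re.findall(pattern, sample) for name, pattern in checks.items()}
for name, found in results.items():
    print(f"{name}: {found}")
```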
Common Mistakes
Mistake 1: Including Lookaround in Match
# Wrong expectation
Pattern: foo(?=bar)
Match: foobar # WRONG - only matches "foo"
Remember: Lookarounds don’t consume characters.
Mistake 2: Variable-Length Lookbehind
# This fails in standard PCRE
grep -P '(?<=\d+)text' file.txt # ERROR
# Use \K instead
grep -oP '\d+\Ktext' file.txt # Works
Mistake 3: Wrong Lookaround Type
# Want: IP followed by port
# Wrong: uses lookbehind instead of lookahead
(?<=:\d+)\d+\.\d+\.\d+\.\d+
# Correct: lookahead or no lookaround needed
\d+\.\d+\.\d+\.\d+(?=:\d+)
Availability by Engine
| Feature | grep -P | Python | JavaScript | sed/awk |
|---|---|---|---|---|
| Positive Lookahead | Yes | Yes | Yes | No |
| Negative Lookahead | Yes | Yes | Yes | No |
| Positive Lookbehind | Yes | Yes | ES2018+ | No |
| Negative Lookbehind | Yes | Yes | ES2018+ | No |
| \K | Yes | No | No | No |
Key Takeaways
- Lookarounds match positions, not characters - zero-width
- (?=…) positive lookahead - must be followed by
- (?!…) negative lookahead - must NOT be followed by
- (?<=…) positive lookbehind - must be preceded by
- (?<!…) negative lookbehind - must NOT be preceded by
- \K is PCRE’s flexible lookbehind - resets match start
- Lookbehind usually requires fixed length - use \K for variable-length prefixes
Next Module
Regex Flavors - Understanding BRE, ERE, PCRE, and language differences.