Lookahead & Lookbehind

Lookaround assertions match a position based on what comes before or after, without including that text in the match. They’re essential for extracting text that’s surrounded by specific patterns.

Understanding Zero-Width

Lookarounds are "zero-width" - they match a position, not characters.

Pattern: foo(?=bar)
Text:    foobar
Matches: ^^^
         (matches "foo", not "foobar")

The (?=bar) asserts "bar" follows, but doesn't consume it.
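
One way to see the zero-width behavior is to inspect the match span. The sketch below assumes Python's re module as the engine and reuses the pattern and text from the example above; the variable names are illustrative only.

```python
import re

# Same pattern and text as the example above.
match = re.search(r"foo(?=bar)", "foobar")

matched_text = match.group()  # text included in the match
match_span = match.span()     # (start, end) offsets of the match

print(matched_text)  # foo
print(match_span)    # (0, 3)

# The lookahead consumed nothing, so "bar" is still there to be matched next.
rest = re.match(r"bar", "foobar"[match.end():])
print(rest is not None)  # True
```

The match ends at offset 3, right before "bar", even though the pattern could only succeed because "bar" was present.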

Types of Lookaround

Type                 Syntax   Meaning                       Example
-------------------  -------  ----------------------------  -------------------------------------
Positive Lookahead   (?=…)    What follows must match       foo(?=bar) matches "foo" in "foobar"
Negative Lookahead   (?!…)    What follows must NOT match   foo(?!bar) matches "foo" in "foobaz"
Positive Lookbehind  (?<=…)   What precedes must match      (?<=foo)bar matches "bar" in "foobar"
Negative Lookbehind  (?<!…)   What precedes must NOT match  (?<!foo)bar matches "bar" in "bazbar"

Positive Lookahead (?=…​)

Match only if followed by the pattern.

Pattern: \d+(?= dollars)
Text:    I have 100 dollars and 50 euros
Matches:         ^^^
         (only the "100" before "dollars")

Infrastructure Examples

# Match username before @domain
grep -oP '\w+(?=@example\.com)' emails.txt

# Match port number before /tcp
grep -oP '\d+(?=/tcp)' netstat.txt

# Match filename before .log extension
grep -oP '[^/]+(?=\.log)' paths.txt

Extract Value Before Unit

# Memory in MB
echo "Memory: 1024 MB" | grep -oP '\d+(?= MB)'
# Output: 1024

# Disk space in GB
echo "Disk: 500 GB available" | grep -oP '\d+(?= GB)'
# Output: 500
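
The same lookahead patterns carry over to other engines unchanged. As a minimal sketch, assuming Python's re module, the two extractions look like this (the sample strings are the ones used above):

```python
import re

text = "I have 100 dollars and 50 euros"

# Only digits immediately followed by " dollars" are returned.
dollars = re.findall(r"\d+(?= dollars)", text)
print(dollars)  # ['100']

# Value before a unit, same idea as the grep examples.
memory_mb = re.findall(r"\d+(?= MB)", "Memory: 1024 MB")
print(memory_mb)  # ['1024']
```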

Negative Lookahead (?!…​)

Match only if NOT followed by the pattern.

Pattern: \d+\b(?! dollars)
Text:    I have 100 dollars and 50 euros
Matches:                        ^^
         (matches "50", not followed by " dollars"; the \b stops \d+ from backtracking to "10")

Infrastructure Examples

# IPs NOT in the 192.168.x.x range
grep -oP '\b(?!192\.168\.)\d+\.\d+\.\d+\.\d+\b' access.log

# Files NOT ending in .bak
ls | grep -P '^(?!.*\.bak$)'

# Ports before /tcp on lines where "CLOSED" does NOT follow
grep -oP '\d+(?=/tcp)(?!.*CLOSED)' netstat.txt

Match Words NOT Followed By

# "test" not followed by "ing"
grep -P 'test(?!ing)' file.txt
# Matches: test, tested, tester
# Skips: testing

# "log" not followed by "in" or "out"
grep -P 'log(?!(in|out))' file.txt
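
A negative lookahead placed after a greedy quantifier can be satisfied by backtracking, which is easy to overlook. The sketch below, assuming Python's re module, compares a pattern without a boundary against one with \b after the digits; the variable names are illustrative.

```python
import re

text = "I have 100 dollars and 50 euros"

# Without a boundary the engine backtracks from "100" to "10",
# because "10" is followed by "0", not by " dollars".
naive = re.findall(r"\d+(?! dollars)", text)
print(naive)  # ['10', '50']

# \b after the digits forces the whole number to be considered.
anchored = re.findall(r"\d+\b(?! dollars)", text)
print(anchored)  # ['50']
```

When a negative lookahead follows \d+, \w+, or similar, anchoring the end of the quantified part (with \b or a second lookahead) keeps the engine from finding a shorter match that happens to pass.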

Positive Lookbehind (?<=…)

Match only if preceded by the pattern.

Pattern: (?<=\$)\d+
Text:    Price: $100 and €50
Matches:          ^^^
         (matches "100" after $)

Infrastructure Examples

# Extract value after "port="
grep -oP '(?<=port=)\d+' config.txt

# Extract IP after "from "
grep -oP '(?<=from )\d+\.\d+\.\d+\.\d+' sshd.log

# Extract hostname after "Host: "
grep -oP '(?<=Host: )[^\s]+' http_headers.txt
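
For comparison, here is a minimal sketch of the same lookbehind extractions in Python's re module. The log and config lines are made up for illustration and do not come from a real system.

```python
import re

# Hypothetical sample lines.
sshd_line = "Accepted password for admin from 10.0.0.5 port 22 ssh2"
config_line = "port=8080"

ip = re.search(r"(?<=from )\d+\.\d+\.\d+\.\d+", sshd_line)
port = re.search(r"(?<=port=)\d+", config_line)

print(ip.group())    # 10.0.0.5
print(port.group())  # 8080
```

In both cases the prefix ("from " or "port=") must be present but is not part of the returned text.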

The \K Shortcut (PCRE)

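\K resets the match start: everything matched before it is dropped from the reported match. It works like a positive lookbehind, but the prefix can be variable length.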

# Equivalent patterns:
grep -oP '(?<=port=)\d+' config.txt
grep -oP 'port=\K\d+' config.txt

# \K advantage: can use variable-length patterns
grep -oP 'port\s*=\s*\K\d+' config.txt
# (lookbehind requires fixed length in most engines)

IMPORTANT: \K is available in PCRE (grep -P), not in ERE or BRE.
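
Outside PCRE, a capture group is the usual substitute for \K. The sketch below assumes Python's built-in re module, which treats \K as an unknown escape; the sample line is illustrative.

```python
import re

line = "port = 8080"

# Python's built-in re module rejects \K as an unknown escape.
try:
    re.compile(r"port\s*=\s*\K\d+")
    k_supported = True
except re.error:
    k_supported = False
print(k_supported)  # False

# A capture group gives the same result: match the prefix, keep only group 1.
m = re.search(r"port\s*=\s*(\d+)", line)
print(m.group(1))  # 8080
```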

Negative Lookbehind (?<!…​)

Match only if NOT preceded by the pattern.

Pattern: (?<!\$)\b\d+
Text:    Price: $100 and 50 units
Matches:                 ^^
         (matches "50", not after $; the \b keeps the match from starting at the "00" inside "100")

Infrastructure Examples

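# Numbers NOT after "port="
grep -P '(?<!port=)\b\d+' config.txt

# "error" NOT after "no "
grep -P '(?<!no )error' log.txt

# MAC addresses NOT in comments
# (a lookbehind like (?<!#.*) is variable-length and rejected by PCRE,
#  so instead require that no # appears earlier on the line)
grep -P '^[^#]*([A-F0-9]{2}:){5}[A-F0-9]{2}' config.txt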
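
A negative lookbehind only checks the characters immediately before the current position, so a match can still start in the middle of a number. The sketch below, assuming Python's re module, shows the difference a word boundary makes; variable names are illustrative.

```python
import re

text = "Price: $100 and 50 units"

# Without a boundary the engine can start inside "100":
# the "00" is preceded by "1", not "$", so it matches.
naive = re.findall(r"(?<!\$)\d+", text)
print(naive)  # ['00', '50']

# Requiring a word boundary before the digits rules out a mid-number start.
whole = re.findall(r"(?<!\$)\b\d+", text)
print(whole)  # ['50']
```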

Combining Lookarounds

Multiple lookarounds can be combined for precise matching.

Extract Between Delimiters

# Extract value between quotes
echo 'name="value"' | grep -oP '(?<=")[^"]+(?=")'
# Output: value

# Same result with \K in place of the lookbehind
echo 'name="value"' | grep -oP '"\K[^"]+(?=")'
# Output: value
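
Because lookarounds do not consume the delimiters, a closing quote can serve as the opening quote of the next match. The sketch below, assuming Python's re module and a made-up two-attribute string, shows the effect and one way around it with a capture group.

```python
import re

line = 'name="value"'
value = re.search(r'(?<=")[^"]+(?=")', line)
print(value.group())  # value

# With two quoted strings, the text between them also sits "between quotes".
two = re.findall(r'(?<=")[^"]+(?=")', 'a="x" b="y"')
print(two)  # ['x', ' b=', 'y']

# Consuming both quotes and capturing the inside avoids that.
safe = re.findall(r'"([^"]*)"', 'a="x" b="y"')
print(safe)  # ['x', 'y']
```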

Password Validation

# Require: 8+ chars, uppercase, lowercase, digit
Pattern: ^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$

Breakdown:
(?=.*[A-Z])  - must contain uppercase
(?=.*[a-z])  - must contain lowercase
(?=.*\d)     - must contain digit
.{8,}        - at least 8 characters

import re

password_pattern = r'^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$'

for pw in ['weak', 'StrongPass1', 'ALLCAPS123', 'Short1']:
    if re.match(password_pattern, pw):
        print(f"'{pw}' - VALID")
    else:
        print(f"'{pw}' - INVALID")

# Output:
# 'weak' - INVALID
# 'StrongPass1' - VALID
# 'ALLCAPS123' - INVALID
# 'Short1' - INVALID

Practical Patterns

Extract Log Fields

# Timestamp at start
grep -oP '^\d{4}-\d{2}-\d{2}T[\d:]+' app.log

# IP after "from"
grep -oP '(?<=from )\d+\.\d+\.\d+\.\d+' sshd.log

# Error message after level
grep -oP '(?<=ERROR: ).+' app.log

Parse Configuration

# Value after key (handles spaces around =)
grep -oP 'server\s*=\s*\K\S+' config.ini

# Port numbers in listen directives
grep -oP '(?<=listen )\d+' nginx.conf

# Enabled settings only
grep -oP '^\s*(?!#)\w+\s*=\s*\K.+' config.ini

URL Parsing

# Domain from URL
echo "https://api.example.com:8443/v1" | grep -oP '(?<=://)[^:/]+'
# Output: api.example.com

# Path from URL ([^:/] skips the "//" after the scheme)
echo "https://api.example.com/v1/users" | grep -oP '(?<=[^:/])/[^?]+'
# Output: /v1/users

# Query parameters
echo "https://example.com?id=123&name=test" | grep -oP '(?<=\?).+'
# Output: id=123&name=test
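
The three URL extractions can be tried together on a single string. This is a minimal sketch assuming Python's re module and a made-up URL; the path pattern excludes both ":" and "/" in its lookbehind so the "//" after the scheme is not mistaken for the start of the path.

```python
import re

# Hypothetical URL combining host, port, path, and query.
url = "https://api.example.com:8443/v1/users?id=123&name=test"

domain = re.search(r"(?<=://)[^:/]+", url).group()
path = re.search(r"(?<=[^:/])/[^?]+", url).group()
query = re.search(r"(?<=\?).+", url).group()

print(domain)  # api.example.com
print(path)    # /v1/users
print(query)   # id=123&name=test
```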

JSON Value Extraction (Simple)

# Value for "name" key
echo '{"name": "value", "id": 123}' | grep -oP '(?<="name": ")[^"]+'
# Output: value

# Numeric value for "id"
echo '{"name": "value", "id": 123}' | grep -oP '(?<="id": )\d+'
# Output: 123

For complex JSON, use jq instead of regex.
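
To illustrate why a parser is safer, the sketch below compares the lookbehind approach with Python's standard json module (used here as a stand-in for jq, which is an assumption of this example). The compact variant of the JSON string is made up.

```python
import json
import re

raw = '{"name": "value", "id": 123}'

# Regex version, as above: depends on the exact spacing after the colon.
by_regex = re.search(r'(?<="name": ")[^"]+', raw).group()

# Parser version: unaffected by whitespace or key order.
doc = json.loads(raw)
print(by_regex, doc["name"], doc["id"])  # value value 123

# The lookbehind stops matching as soon as the spacing changes.
compact = '{"name":"value","id":123}'
compact_match = re.search(r'(?<="name": ")[^"]+', compact)
print(compact_match)  # None
print(json.loads(compact)["name"])  # value
```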

Fixed-Length Requirement

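Many regex engines, including Python's re module and most PCRE builds behind grep -P, require lookbehind patterns to be fixed-length. .NET and JavaScript (ES2018+) are among the engines that accept variable-length lookbehind.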

# Valid - fixed length
(?<=abc)    # 3 characters
(?<=\d{4})  # 4 digits

# Invalid in many engines
(?<=\d+)    # Variable length
(?<=\w*)    # Variable length (might be 0)
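
The rule is easy to probe by compiling candidate patterns and seeing which ones the engine accepts. The sketch below assumes Python's re module, which is stricter than PCRE in one respect: alternatives inside a lookbehind must all have the same width. The helper name compiles is hypothetical.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r"(?<=abc)x"))    # True  (3 characters)
print(compiles(r"(?<=\d{4})x"))  # True  (4 digits)
print(compiles(r"(?<=\d+)x"))    # False (variable length)
print(compiles(r"(?<=\w*)x"))    # False (variable length, may be 0)
print(compiles(r"(?<=ab|cd)x"))  # True  (both alternatives are 2 wide)
print(compiles(r"(?<=a|bc)x"))   # False (alternatives differ in width)
```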

Workarounds

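# Use \K instead (PCRE)
grep -oP '\d+\K\w+' file.txt

# Use alternatives of different fixed lengths
# (PCRE accepts these at the top level of a lookbehind)
grep -oP '(?<=port=|port: )\d+' file.txt

# Use a capturing group instead
# (grep -o cannot print a group, so use a tool that can)
perl -ne 'print "$1\n" if /\d+(\w+)/' file.txt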

Self-Test Exercises

Try each challenge first, then compare your attempt with the answer below it.

Setup Test Data

cat << 'EOF' > /tmp/lookaround.txt
Price: $100
Price: €50
Port: 443/tcp
Port: 80/tcp
server=192.168.1.1
port = 8080
timeout=30
Log: 2026-03-15 ERROR Connection refused
Log: 2026-03-15 INFO Server started
from 10.0.0.1 port 22
from 192.168.1.100 port 443
URL: https://api.example.com/v1
URL: http://localhost:8080/health
Password1 (valid)
weakpass (invalid)
ALLCAPS123 (invalid)
testing
test results
tested
{"name": "server-01", "ip": "10.0.0.1"}
EOF

Challenge 1: Extract Dollar Amounts

Goal: Extract just the number after $ (100, not the $ sign)

Answer
grep -oP '(?<=\$)\d+' /tmp/lookaround.txt

(?<=\$) is positive lookbehind - match position after $, then \d+ matches the digits.


Challenge 2: Extract Port Before /tcp

Goal: Extract port numbers (443, 80) that are followed by /tcp

Answer
grep -oP '\d+(?=/tcp)' /tmp/lookaround.txt

(?=/tcp) is positive lookahead - digits must be followed by /tcp.


Challenge 3: Extract Value After server=

Goal: Extract just the IP after "server=" (192.168.1.1)

Answer
# Using lookbehind
grep -oP '(?<=server=)\S+' /tmp/lookaround.txt

# Using \K (more flexible)
grep -oP 'server=\K\S+' /tmp/lookaround.txt

\K resets match start - everything before is not included in output.


Challenge 4: Extract ERROR Messages

Goal: Extract just the message after "ERROR " (Connection refused)

Answer
grep -oP '(?<=ERROR ).+' /tmp/lookaround.txt

(?<=ERROR ) matches position after "ERROR ", .+ matches the rest.


Challenge 5: Match "test" NOT Followed by "ing"

Goal: Find "test" in "test results" and "tested", but NOT "testing"

Answer
grep -P '\btest(?!ing)' /tmp/lookaround.txt

(?!ing) is negative lookahead - "test" must NOT be followed by "ing".


Challenge 6: Extract Domain from URLs

Goal: Extract domains (api.example.com, localhost)

Answer
grep -oP '(?<=://)[^:/]+' /tmp/lookaround.txt

(?<=://) matches position after ://, [^:/]+ matches until colon or slash.


Challenge 7: Extract JSON "name" Value

Goal: Extract just "server-01" from the JSON

Answer
grep -oP '(?<="name": ")[^"]+' /tmp/lookaround.txt

(?<="name": ") matches position after the key, [^"]+ matches until closing quote.


Challenge 8: Match Euro Amounts (NOT Dollar)

Goal: Extract amounts NOT preceded by $ (the €50)

Answer
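# The 50 after €
grep -oP '(?<=€)\d+' /tmp/lookaround.txt

# Any whole number NOT directly after $
# (in this file that also returns ports, IP octets, and dates)
grep -oP '(?<!\$)\b\d+' /tmp/lookaround.txt

(?<!\$) is negative lookbehind - number must NOT be preceded by $. The \b keeps the match from starting at the "00" inside "$100".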


Challenge 9: Extract Port with \K

Goal: Extract port value using \K instead of lookbehind (8080 from "port = 8080")

Answer
grep -oP 'port\s*=\s*\K\d+' /tmp/lookaround.txt

\s* handles optional spaces around =. \K resets match.


Challenge 10: Extract IP After "from"

Goal: Extract IPs that appear after "from " (10.0.0.1, 192.168.1.100)

Answer
grep -oP '(?<=from )\d+\.\d+\.\d+\.\d+' /tmp/lookaround.txt

Lookbehind ensures "from " precedes the IP.


Challenge 11: Match URL Path

Goal: Extract just the path (/v1, /health) from URLs

Answer
grep -oP 'https?://[^/]+\K/\S+' /tmp/lookaround.txt

\K drops the scheme and host from the match. A lookbehind such as (?<=://[^/]+) does not work here: it is variable-length, which PCRE rejects.

Challenge 12: JSON IP Value

Goal: Extract just the IP from the JSON "ip" field (10.0.0.1)

Answer
grep -oP '(?<="ip": ")[^"]+' /tmp/lookaround.txt

Same pattern as name - lookbehind for the key, capture until quote.

Common Mistakes

Mistake 1: Including Lookaround in Match

# Wrong expectation
Pattern: foo(?=bar)
Match: foobar  # WRONG - only matches "foo"

Remember: Lookarounds don’t consume characters.

Mistake 2: Variable-Length Lookbehind

# This fails in standard PCRE
grep -P '(?<=\d+)text' file.txt  # ERROR

# Use \K instead
grep -oP '\d+\Ktext' file.txt    # Works

Mistake 3: Wrong Lookaround Type

# Want: IP followed by port
# Wrong: uses lookbehind instead of lookahead
(?<=:\d+)\d+\.\d+\.\d+\.\d+

# Correct: lookahead or no lookaround needed
\d+\.\d+\.\d+\.\d+(?=:\d+)
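
As a quick check of the corrected pattern, the sketch below runs it through Python's re module against a made-up string containing one address with a port and one without.

```python
import re

# Hypothetical input: the first IP has a port, the second does not.
text = "connect 10.0.0.1:8080 then 10.0.0.2"

# Lookahead: keep the IP only when ":<digits>" comes right after it.
with_port = re.findall(r"\d+\.\d+\.\d+\.\d+(?=:\d+)", text)
print(with_port)  # ['10.0.0.1']
```

The port itself stays out of the result because the lookahead checks for it without consuming it.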

Availability by Engine

Feature                    grep -P  Python  JavaScript  sed/awk
-------------------------  -------  ------  ----------  -------
Positive Lookahead (?=)    Yes      Yes     Yes         No
Negative Lookahead (?!)    Yes      Yes     Yes         No
Positive Lookbehind (?<=)  Yes      Yes     ES2018+     No
Negative Lookbehind (?<!)  Yes      Yes     ES2018+     No
\K (reset match start)     Yes      No      No          No

Key Takeaways

  1. Lookarounds match positions, not characters - zero-width
  2. (?=…) positive lookahead - must be followed by
  3. (?!…) negative lookahead - must NOT be followed by
  4. (?<=…) positive lookbehind - must be preceded by
  5. (?<!…) negative lookbehind - must NOT be preceded by
  6. \K is PCRE's flexible lookbehind - resets match start
  7. Lookbehind usually requires fixed length - use \K for variable

Next Module

Regex Flavors - Understanding BRE, ERE, PCRE, and language differences.