Regex Session 04: Lookahead & Lookbehind

Lookaround lets you match based on context without including that context in your match. This is how you extract "just the data" from structured text.

Lookaround requires a PCRE-style engine: use grep -P or rg -P (ripgrep's --pcre2 mode), not grep -E.

The Problem Lookaround Solves

Goal: Extract only the IP address from "Server: 192.168.1.100"

Without lookaround:

# This includes "Server: " in the match
echo "Server: 192.168.1.100" | grep -oE 'Server: [0-9.]+'
# Output: Server: 192.168.1.100

With lookbehind:

# Match IP only IF preceded by "Server: "
echo "Server: 192.168.1.100" | grep -oP '(?<=Server: )[0-9.]+'
# Output: 192.168.1.100

Key insight: The lookbehind (?<=Server: ) asserts context but doesn’t consume it.
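One way to see that the lookbehind consumes nothing is to ask grep where each match begins. The sketch below assumes GNU grep built with PCRE support; with -o, the -b flag reports the byte offset of the matched text itself. The variable names are only illustrative.

```shell
# Assumes GNU grep with -P support.
line='Server: 192.168.1.100'

# Without lookaround the match starts at byte 0 and includes the prefix.
plain=$(printf '%s\n' "$line" | grep -obE 'Server: [0-9.]+')

# With lookbehind the match starts at byte 8, right after "Server: ".
lookbehind=$(printf '%s\n' "$line" | grep -obP '(?<=Server: )[0-9.]+')

printf '%s\n%s\n' "$plain" "$lookbehind"
# 0:Server: 192.168.1.100
# 8:192.168.1.100
```

The prefix still has to be present for the second pattern to match; it is simply left out of what grep reports.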

Test File Setup

cat << 'EOF' > /tmp/lookaround-practice.txt
# Configuration values
username=admin
password=secret123
api_key=sk_live_abc123xyz789
database_host=db.example.com
database_port=5432

# Log entries
2026-03-15 10:30:45 User: evanusmodestus logged in
2026-03-15 10:31:02 Error: Connection refused
2026-03-15 10:32:00 Warning: Disk space low

# Data with context
Price: $99.99 (discounted)
Original: $149.99
Tax: $12.50
Total: $112.49

# Network info
IP Address: 192.168.1.100
MAC Address: AA:BB:CC:DD:EE:FF
Gateway: 10.50.1.1
DNS Server: 10.50.1.90, 10.50.1.91

# Files with extensions
report.pdf
document.docx
image.png
script.sh
config.yaml
EOF

The Four Lookaround Types

Type                  Syntax    Meaning             Example
Positive Lookahead    (?=…)     Followed by X       foo(?=bar) matches "foo" in "foobar"
Negative Lookahead    (?!…)     NOT followed by X   foo(?!bar) matches "foo" in "foobaz"
Positive Lookbehind   (?<=…)    Preceded by X       (?<=foo)bar matches "bar" in "foobar"
Negative Lookbehind   (?<!…)    NOT preceded by X   (?<!foo)bar matches "bar" in "xyzbar"
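The four types can be checked side by side on the same sample words used in the examples above. This is a minimal sketch, assuming GNU grep with -P; the variable names are made up for illustration.

```shell
# Assumes GNU grep with -P support.
words=$(printf 'foobar\nfoobaz\nxyzbar\n')

# Each command prints the whole line that contains a match.
pos_ahead=$(printf '%s\n' "$words" | grep -P 'foo(?=bar)')     # foobar
neg_ahead=$(printf '%s\n' "$words" | grep -P 'foo(?!bar)')     # foobaz
pos_behind=$(printf '%s\n' "$words" | grep -P '(?<=foo)bar')   # foobar
neg_behind=$(printf '%s\n' "$words" | grep -P '(?<!foo)bar')   # xyzbar

printf '%s %s %s %s\n' "$pos_ahead" "$neg_ahead" "$pos_behind" "$neg_behind"
# foobar foobaz foobar xyzbar
```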

Lesson 1: Positive Lookbehind (?<=…)

Use case: Extract value AFTER a known prefix.

# Extract username value (after "username=")
grep -oP '(?<=username=)\w+' /tmp/lookaround-practice.txt
# Output: admin

# Extract all values after "="
grep -oP '(?<==)\S+' /tmp/lookaround-practice.txt
# Output: admin, secret123, sk_live_abc123xyz789, etc.

# Extract IP after "IP Address: "
grep -oP '(?<=IP Address: )[0-9.]+' /tmp/lookaround-practice.txt
# Output: 192.168.1.100

Exercise 1.1: Extract prices (number after $)

grep -oP '(?<=\$)[0-9.]+' /tmp/lookaround-practice.txt

Output:

99.99
149.99
12.50
112.49

Exercise 1.2: Extract username from log

grep -oP '(?<=User: )\w+' /tmp/lookaround-practice.txt

Output: evanusmodestus

Lesson 2: Positive Lookahead (?=…​)

Use case: Match something that’s FOLLOWED BY specific text.

# Match the name that is followed by ".pdf"
grep -oP '\w+(?=\.pdf)' /tmp/lookaround-practice.txt
# Output: report

# Match words before ": " (labels)
grep -oP '\w+(?=: )' /tmp/lookaround-practice.txt
# Output: User, Error, Warning, Price, Original, etc.

Exercise 2.1: Find config keys (before "=")

grep -oP '^\w+(?==)' /tmp/lookaround-practice.txt

Output:

username
password
api_key
database_host
database_port

Exercise 2.2: Find file names without extension

grep -oP '^\w+(?=\.\w+$)' /tmp/lookaround-practice.txt

Output:

report
document
image
script
config

Lesson 3: Negative Lookbehind (?<!…​)

Use case: Match something NOT preceded by specific text.

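# Match decimals NOT preceded by "$" (find non-price decimals).
# The digit in the lookbehind stops the match from starting mid-number
# (otherwise "9.99" would be pulled out of "$99.99").
echo -e "Price: \$99.99\nVersion: 1.99\nDiscount: \$0.99" | \
  grep -oP '(?<![$0-9])[0-9]+\.[0-9]+'
# Output: 1.99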

# Match "admin" NOT preceded by "user:"
echo -e "user:admin\nsudo admin\nadmin access" | \
  grep -P '(?<!user:)admin'
# Output: sudo admin, admin access
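A negative lookbehind is checked only at the position where a match starts, so the engine is free to start one character later. The sketch below (assuming GNU grep with -P; variable names are illustrative) shows a naive pattern leaking part of a price and a guarded variant that also forbids a preceding digit.

```shell
# Assumes GNU grep with -P support.
price='Price: $99.99'

# Naive: only the first digit is blocked by "$", so the match slides right.
naive=$(printf '%s\n' "$price" | grep -oP '(?<!\$)[0-9]+\.[0-9]+' || true)

# Guarded: a preceding digit is forbidden too, so no match can start mid-number.
guarded=$(printf '%s\n' "$price" | grep -oP '(?<![$0-9])[0-9]+\.[0-9]+' || true)

printf 'naive=[%s] guarded=[%s]\n' "$naive" "$guarded"
# naive=[9.99] guarded=[]
```

The same trap applies to any negative lookbehind placed in front of a repeated character class.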

Exercise 3.1: Find IPs that are NOT on the Gateway line

# Match IPs NOT after "Gateway: " (\b stops the match from starting mid-number)
grep -P '(?<!Gateway: )\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /tmp/lookaround-practice.txt

Lesson 4: Negative Lookahead (?!…​)

Use case: Match something NOT followed by specific text.

# Match files NOT ending in .sh or .yaml (non-code files).
# Anchors keep prices, IPs, and hostnames in the file from matching.
grep -oP '^\w+\.(?!(?:sh|yaml)$)\w+$' /tmp/lookaround-practice.txt
# Output: report.pdf, document.docx, image.png

Exercise 4.1: Match words NOT followed by colon

# Find words of 4+ characters that are not labels (not followed by ":")
grep -oP '\b\w{4,}\b(?!:)' /tmp/lookaround-practice.txt | head -10

Lesson 5: Combining Lookarounds

Power move: Use lookbehind AND lookahead together.

# Extract port number: digits right after "=" that run to end of line
grep -oP '(?<==)[0-9]+(?=$)' /tmp/lookaround-practice.txt
# Output: 5432

# Extract value between "=" and end of line
grep -oP '(?<==)[^=\n]+(?=$)' /tmp/lookaround-practice.txt
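Pairing a lookbehind with a lookahead is also a compact way to pull out text between two delimiters. A small sketch, assuming GNU grep with -P and using the "(discounted)" note from the practice data; the variable names are illustrative.

```shell
# Assumes GNU grep with -P support.
line='Price: $99.99 (discounted)'

# Lookbehind asserts the opening "(", lookahead asserts the closing ")".
note=$(printf '%s\n' "$line" | grep -oP '(?<=\()[^)]+(?=\))')

printf '%s\n' "$note"
# discounted
```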

Exercise 5.1: Extract MAC address components

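# Extract just the hex values (between colons or at boundaries).
# Filter to the MAC line first; otherwise timestamp fields like ":30:" also match.
grep 'MAC Address' /tmp/lookaround-practice.txt | grep -oP '(?<=: |:)[A-F0-9]{2}(?=:|$)'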

Practical Applications

Password/Secret Detection (Security Audit)

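# Find values after sensitive keys (password, secret, key, token).
# \K is used here because many PCRE builds reject a lookbehind that
# contains a group whose alternatives differ in length.
grep -oP '(password|secret|key|token)=\K\S+' /tmp/lookaround-practice.txt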

Output:

secret123
sk_live_abc123xyz789
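An audit report often needs to say which sensitive keys are set without echoing the secrets themselves. One possible approach, sketched here with a lookahead and assuming GNU grep with -P, matches the key name only when "=" follows. The sample lines are copied from the practice file and the variable names are illustrative.

```shell
# Assumes GNU grep with -P support.
config=$(printf 'username=admin\npassword=secret123\napi_key=sk_live_abc123xyz789\n')

# \w* picks up any prefix such as "api_"; (?==) requires "=" without matching it.
sensitive_keys=$(printf '%s\n' "$config" | grep -oP '\w*(password|secret|key|token)(?==)')

printf '%s\n' "$sensitive_keys"
# password
# api_key
```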

Log Field Extraction

# Create structured log data
cat << 'EOF' > /tmp/logs.txt
timestamp=2026-03-15T10:30:45 level=INFO msg="Server started"
timestamp=2026-03-15T10:31:00 level=ERROR msg="Connection failed"
timestamp=2026-03-15T10:32:00 level=WARN msg="Low disk space"
EOF

# Extract just the level field
grep -oP '(?<=level=)\w+' /tmp/logs.txt

# Extract just the message content
grep -oP '(?<=msg=")[^"]+' /tmp/logs.txt
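The two extractions can be combined per line to produce a short report. This is a rough sketch rather than a recommended log parser: it assumes GNU grep with -P, uses a temporary file in place of /tmp/logs.txt, and starts two grep processes per line, which is fine for small files only.

```shell
# Assumes GNU grep with -P support; the temp file stands in for /tmp/logs.txt.
logs=$(mktemp)
cat << 'EOF' > "$logs"
timestamp=2026-03-15T10:30:45 level=INFO msg="Server started"
timestamp=2026-03-15T10:31:00 level=ERROR msg="Connection failed"
EOF

# Read each line, pull out both fields, and print them as "LEVEL: message".
report=$(while IFS= read -r line; do
  level=$(printf '%s\n' "$line" | grep -oP '(?<=level=)\w+')
  msg=$(printf '%s\n' "$line" | grep -oP '(?<=msg=")[^"]+')
  printf '%s: %s\n' "$level" "$msg"
done < "$logs")

printf '%s\n' "$report"
# INFO: Server started
# ERROR: Connection failed
rm -f "$logs"
```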

URL Parameter Extraction

# Extract key=value parameter pairs from URLs
echo "https://api.example.com/users?id=123&name=admin" | \
  grep -oP '(?<=\?|&)[^=]+=[^&]+'

# Extract just the id value
echo "https://api.example.com/users?id=123&name=admin" | \
  grep -oP '(?<=id=)[0-9]+'
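The id lookup generalizes to any parameter name. The helper below, get_param, is hypothetical and not part of any tool; it assumes GNU grep with -P and a parameter name without regex metacharacters. Because the name is a literal, the lookbehind stays fixed length.

```shell
# get_param is a hypothetical helper; NAME must not contain regex metacharacters.
get_param() {
  url=$1
  name=$2
  # [?&] ensures "id" does not match inside a longer name such as "uid".
  printf '%s\n' "$url" | grep -oP "(?<=[?&]${name}=)[^&#]+"
}

url='https://api.example.com/users?id=123&name=admin'
get_param "$url" id     # 123
get_param "$url" name   # admin
```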

Config File Parsing

# Extract database connection info
grep -oP '(?<=database_)\w+=\K\S+' /tmp/lookaround-practice.txt

Note: \K resets the match start position - another way to "not include" earlier text.

The \K Escape (Alternative to Lookbehind)

\K means "keep": the text matched before it is kept out of the reported match, which starts at this position.

# These are equivalent:
grep -oP '(?<=password=)\S+' /tmp/lookaround-practice.txt
grep -oP 'password=\K\S+' /tmp/lookaround-practice.txt
# Both output: secret123

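Advantage of \K: Works with variable-length prefixes. A PCRE lookbehind needs a known length: older builds require each top-level alternative to be fixed length, and unbounded quantifiers such as + are rejected.

# Unbounded lookbehind (fails with a "lookbehind assertion is not fixed length" style error)
# grep -oP '(?<=\w+=)\S+' file.txt

# Using \K (works with any prefix pattern)
grep -oP '(password|api_key)=\K\S+' /tmp/lookaround-practice.txt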
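The difference is easy to observe from grep's exit status. This sketch assumes GNU grep with -P; the exact error text varies between PCRE versions, so only success or failure is checked. The variable names are illustrative.

```shell
# Assumes GNU grep with -P support.
line='database_host=db.example.com'

# Unbounded lookbehind: the pattern fails to compile, so grep exits non-zero.
if printf '%s\n' "$line" | grep -oP '(?<=\w+=)\S+' >/dev/null 2>&1; then
  lookbehind_status=accepted
else
  lookbehind_status=rejected
fi

# \K version: same intent, and the prefix may be any length.
value=$(printf '%s\n' "$line" | grep -oP '^\w+=\K\S+')

printf '%s %s\n' "$lookbehind_status" "$value"
# rejected db.example.com
```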

Summary: Lookaround Cheat Sheet

Syntax     Meaning               Example Use
(?<=X)Y    Y preceded by X       Extract value after key
Y(?=X)     Y followed by X       Match before extension
(?<!X)Y    Y NOT preceded by X   Exclude specific contexts
Y(?!X)     Y NOT followed by X   Exclude certain patterns
X\KY       Start match at Y      Variable-length "lookbehind"

Exercises to Complete

  1. [ ] Extract all values from key=value pairs

  2. [ ] Find prices between $10 and $100 (dollar amounts with lookaround)

  3. [ ] Extract the extension from filenames

  4. [ ] Find IP addresses NOT in the Gateway line

  5. [ ] Extract hostnames from URLs (between :// and /)

Self-Check

Solutions
# 1. Extract all values
grep -oP '(?<==)[^\s]+' /tmp/lookaround-practice.txt

# 2. Prices $10-$100 (2-digit numbers after $)
grep -oP '(?<=\$)[0-9]{2}\.[0-9]{2}' /tmp/lookaround-practice.txt

# 3. Extract extensions (anchored so only bare filename lines match)
grep -oP '^\w+\.\K\w+$' /tmp/lookaround-practice.txt

# 4. IPs NOT in Gateway
grep -v 'Gateway' /tmp/lookaround-practice.txt | grep -oP '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'

# 5. Hostnames from URLs (the practice file has no URLs, so pipe one in)
echo "https://api.example.com/users?id=123&name=admin" | grep -oP '(?<=://)[^/:]+'

Post-Session Reflection

What clicked:

  • <Write what made sense>

What’s still fuzzy:

  • <Write what needs more practice>

Connection to work:

  • <How will you use this?>

Next Session

Session 05: sed Mastery - Transform text with regex substitutions.