Regex Session 04: Lookahead & Lookbehind
Lookaround lets you match based on context without including that context in your match. This is how you extract "just the data" from structured text.
Lookaround requires PCRE: use grep -P or rg -P (ripgrep's --pcre2 mode), not grep -E. ripgrep's default regex engine does not support lookaround.
The Problem Lookaround Solves
Goal: Extract only the IP address from "Server: 192.168.1.100"
Without lookaround:
# This includes "Server: " in the match
echo "Server: 192.168.1.100" | grep -oE 'Server: [0-9.]+'
# Output: Server: 192.168.1.100
With lookbehind:
# Match IP only IF preceded by "Server: "
echo "Server: 192.168.1.100" | grep -oP '(?<=Server: )[0-9.]+'
# Output: 192.168.1.100
Key insight: The lookbehind (?<=Server: ) asserts context but doesn’t consume it.
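One way to see that the lookbehind consumes nothing is to ask grep where each match starts. This is a minimal sketch, assuming GNU grep built with PCRE support; with -o, the -b flag reports the byte offset of the matched text itself.

```shell
# Assumes GNU grep with -P support.
# -b combined with -o prints the 0-based byte offset of the matched text.
line="Server: 192.168.1.100"

with_prefix=$(echo "$line" | grep -obP 'Server: [0-9.]+')
lookbehind=$(echo "$line" | grep -obP '(?<=Server: )[0-9.]+')

echo "$with_prefix"   # 0:Server: 192.168.1.100
echo "$lookbehind"    # 8:192.168.1.100
```

The second match starts at offset 8, immediately after the 8-character prefix "Server: ", which was checked but never became part of the match.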
Test File Setup
cat << 'EOF' > /tmp/lookaround-practice.txt
# Configuration values
username=admin
password=secret123
api_key=sk_live_abc123xyz789
database_host=db.example.com
database_port=5432
# Log entries
2026-03-15 10:30:45 User: evanusmodestus logged in
2026-03-15 10:31:02 Error: Connection refused
2026-03-15 10:32:00 Warning: Disk space low
# Data with context
Price: $99.99 (discounted)
Original: $149.99
Tax: $12.50
Total: $112.49
# Network info
IP Address: 192.168.1.100
MAC Address: AA:BB:CC:DD:EE:FF
Gateway: 10.50.1.1
DNS Server: 10.50.1.90, 10.50.1.91
# Files with extensions
report.pdf
document.docx
image.png
script.sh
config.yaml
EOF
The Four Lookaround Types
| Type | Syntax | Meaning | Example |
|---|---|---|---|
| Positive Lookahead | (?=X) | Followed by X | \w+(?=\.pdf) |
| Negative Lookahead | (?!X) | NOT followed by X | foo(?!bar) |
| Positive Lookbehind | (?<=X) | Preceded by X | (?<=username=)\w+ |
| Negative Lookbehind | (?<!X) | NOT preceded by X | (?<!user:)admin |
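To compare the four types side by side, the sketch below runs each one against the same string and prints byte offsets so it is clear which occurrence matched. It assumes GNU grep with PCRE support; the sample string is made up for illustration.

```shell
# Assumes GNU grep with -P support; the sample string is hypothetical.
text="foobar foobaz quxbar"

pos_ahead=$(echo "$text" | grep -obP 'foo(?=bar)')    # foo followed by bar
neg_ahead=$(echo "$text" | grep -obP 'foo(?!bar)')    # foo NOT followed by bar
pos_behind=$(echo "$text" | grep -obP '(?<=foo)ba.')  # ba + one char, preceded by foo
neg_behind=$(echo "$text" | grep -obP '(?<!foo)ba.')  # ba + one char, NOT preceded by foo

echo "$pos_ahead"    # 0:foo
echo "$neg_ahead"    # 7:foo
echo "$pos_behind"   # 3:bar then 10:baz, one per line
echo "$neg_behind"   # 17:bar
```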
Lesson 1: Positive Lookbehind (?<=…)
Use case: Extract value AFTER a known prefix.
# Extract username value (after "username=")
grep -oP '(?<=username=)\w+' /tmp/lookaround-practice.txt
# Output: admin
# Extract all values after "="
grep -oP '(?<==)\S+' /tmp/lookaround-practice.txt
# Output: admin, secret123, sk_live_abc123xyz789, etc.
# Extract IP after "IP Address: "
grep -oP '(?<=IP Address: )[0-9.]+' /tmp/lookaround-practice.txt
# Output: 192.168.1.100
Exercise 1.1: Extract prices (number after $)
grep -oP '(?<=\$)[0-9.]+' /tmp/lookaround-practice.txt
Output:
99.99 149.99 12.50 112.49
Exercise 1.2: Extract username from log
grep -oP '(?<=User: )\w+' /tmp/lookaround-practice.txt
Output: evanusmodestus
Lesson 2: Positive Lookahead (?=…)
Use case: Match something that’s FOLLOWED BY specific text.
# Match a word that is followed by ".pdf"
grep -oP '\w+(?=\.pdf)' /tmp/lookaround-practice.txt
# Output: report
# Match labels before ": " (letters only, so timestamps and MAC octets are skipped)
grep -oP '[A-Za-z]+(?=: )' /tmp/lookaround-practice.txt
# Output: User, Error, Warning, Price, Original, etc.
Exercise 2.1: Find config keys (before "=")
grep -oP '^\w+(?==)' /tmp/lookaround-practice.txt
Output:
username password api_key database_host database_port
Exercise 2.2: Find file names without extension
grep -oP '^\w+(?=\.\w+$)' /tmp/lookaround-practice.txt
Output:
report document image script config
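Because a lookahead consumes nothing, several can be stacked at the same position, each checking a different condition. The sketch below assumes GNU grep with PCRE support. The candidate strings and the rule itself (at least one digit, at least one lowercase letter, 8 or more characters) are made up for illustration.

```shell
# Assumes GNU grep with -P support; the candidate strings are hypothetical.
# Each (?=...) is tested from the start of the line and consumes nothing,
# so both conditions must hold before .{8,} does the actual matching.
strong=$(printf '%s\n' secret123 password 12345678 abc1 |
  grep -P '^(?=.*[0-9])(?=.*[a-z]).{8,}$')

echo "$strong"   # secret123
```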
Lesson 3: Negative Lookbehind (?<!…)
Use case: Match something NOT preceded by specific text.
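# Match decimals NOT preceded by "$" (find non-price decimals)
# The class also excludes digits; with (?<!\$) alone the match could start one digit later and return 9.99
echo -e "Price: \$99.99\nVersion: 1.99\nDiscount: \$0.99" | \
grep -oP '(?<![$0-9])[0-9]+\.[0-9]+'
# Output: 1.99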
# Match "admin" NOT preceded by "user:"
echo -e "user:admin\nsudo admin\nadmin access" | \
grep -P '(?<!user:)admin'
# Output: sudo admin, admin access
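Exercise 3.1: Find IPs that are NOT on the Gateway line
# Match IPs NOT after "Gateway: "
# (?<![0-9.]) stops the match from starting mid-address, which would slip past the first lookbehind
grep -P '(?<!Gateway: )(?<![0-9.])[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /tmp/lookaround-practice.txt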
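A negative lookbehind is tested at every candidate start position, so a match rejected at one position can still succeed one character later. The sketch below, assuming GNU grep with PCRE support, shows this on the Gateway line and one possible guard against it.

```shell
# Assumes GNU grep with -P support; the input is the Gateway line from the practice file.
line="Gateway: 10.50.1.1"

# Naive: the lookbehind fails at "1" but passes one character later, at "0".
naive=$(echo "$line" | grep -oP '(?<!Gateway: )[0-9]{1,3}(\.[0-9]{1,3}){3}' || true)

# Guarded: a second lookbehind forbids starting in the middle of an address.
guarded=$(echo "$line" | grep -oP '(?<!Gateway: )(?<![0-9.])[0-9]{1,3}(\.[0-9]{1,3}){3}' || true)

echo "naive:   '$naive'"     # naive:   '0.50.1.1'
echo "guarded: '$guarded'"   # guarded: ''
```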
Lesson 4: Negative Lookahead (?!…)
Use case: Match something NOT followed by specific text.
# Match file names NOT ending in .sh or .yaml (non-code files)
# Anchors keep prices, IPs, and hostnames elsewhere in the file from matching
grep -oP '^\w+\.(?!(sh|yaml)$)\w+$' /tmp/lookaround-practice.txt
# Output: report.pdf, document.docx, image.png
Exercise 4.1: Match words NOT followed by colon
# Find words of 4+ characters that are not labels (a label ends in ":")
grep -oP '\b\w{4,}(?!:)\b' /tmp/lookaround-practice.txt | head -10
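A negative lookahead only tests the text immediately after its position, so an alternation such as sh|yaml also rejects any extension that merely starts with those letters. The sketch below assumes GNU grep with PCRE support and uses two hypothetical file names to contrast a prefix check with a whole-extension check.

```shell
# Assumes GNU grep with -P support; both file names are hypothetical.
files=$(printf '%s\n' page.shtml script.sh)

# Prefix check only: "shtml" starts with "sh", so page.shtml is rejected too.
loose=$(echo "$files" | grep -oP '^\w+\.(?!sh|yaml)\w+$' || true)

# Whole-extension check: only an extension that is exactly sh or yaml is rejected.
strict=$(echo "$files" | grep -oP '^\w+\.(?!(sh|yaml)$)\w+$' || true)

echo "loose:  '$loose'"    # loose:  ''
echo "strict: '$strict'"   # strict: 'page.shtml'
```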
Lesson 5: Combining Lookarounds
Power move: Use lookbehind AND lookahead together.
# Extract port number: digits after "=" that run to the end of the line
grep -oP '(?<==)[0-9]+(?=$)' /tmp/lookaround-practice.txt
# Output: 5432
# Extract value between "=" and end of line
grep -oP '(?<==)[^=\n]+(?=$)' /tmp/lookaround-practice.txt
Exercise 5.1: Extract MAC address components
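# Extract just the hex values (between colons or at boundaries)
# Pre-filter to the MAC line; otherwise timestamp minutes such as the 30 in 10:30:45 also match
grep 'MAC Address' /tmp/lookaround-practice.txt | grep -oP '(?<=: |:)[A-F0-9]{2}(?=:|$)'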
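Another common pairing is a lookbehind for an opening delimiter and a lookahead for the closing one, which returns only the text in between. A minimal sketch, assuming GNU grep with PCRE support, on a line copied from the practice file:

```shell
# Assumes GNU grep with -P support; the input line comes from the practice file.
line='Price: $99.99 (discounted)'

with_parens=$(echo "$line" | grep -oP '\([^)]+\)')        # delimiters included
inner=$(echo "$line" | grep -oP '(?<=\()[^)]+(?=\))')     # delimiters asserted only

echo "$with_parens"   # (discounted)
echo "$inner"         # discounted
```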
Practical Applications
Password/Secret Detection (Security Audit)
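# Find values after sensitive keys (password, secret, key, token)
# Alternatives sit at the top level of the lookbehind; older PCRE builds reject different-length alternatives nested in a group
grep -oP '(?<=password=|secret=|key=|token=)\S+' /tmp/lookaround-practice.txt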
Output:
secret123 sk_live_abc123xyz789
Log Field Extraction
# Create structured log data
cat << 'EOF' > /tmp/logs.txt
timestamp=2026-03-15T10:30:45 level=INFO msg="Server started"
timestamp=2026-03-15T10:31:00 level=ERROR msg="Connection failed"
timestamp=2026-03-15T10:32:00 level=WARN msg="Low disk space"
EOF
# Extract just the level field
grep -oP '(?<=level=)\w+' /tmp/logs.txt
# Extract just the message content
grep -oP '(?<=msg=")[^"]+' /tmp/logs.txt
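The two extractions above can be combined per line to build a compact record. The sketch below assumes GNU grep with PCRE support. To stay self-contained it reads two sample lines from a here-document instead of /tmp/logs.txt, and the `|` separator is an arbitrary choice.

```shell
# Assumes GNU grep with -P support; sample lines are copied from the log data above.
rows=$(
  while IFS= read -r line; do
    level=$(printf '%s\n' "$line" | grep -oP '(?<=level=)\w+')
    msg=$(printf '%s\n' "$line" | grep -oP '(?<=msg=")[^"]+')
    printf '%s|%s\n' "$level" "$msg"
  done <<'EOF'
timestamp=2026-03-15T10:30:45 level=INFO msg="Server started"
timestamp=2026-03-15T10:31:00 level=ERROR msg="Connection failed"
EOF
)

echo "$rows"
# INFO|Server started
# ERROR|Connection failed
```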
URL Parameter Extraction
# Extract name=value pairs from a URL query string
echo "https://api.example.com/users?id=123&name=admin" | \
grep -oP '(?<=\?|&)[^=]+=[^&]+'
# Extract just the id value
echo "https://api.example.com/users?id=123&name=admin" | \
grep -oP '(?<=id=)[0-9]+'
Config File Parsing
# Extract database connection info
grep -oP '(?<=database_)\w+=\K\S+' /tmp/lookaround-practice.txt
\K resets the match start position - another way to "not include" earlier text.
The \K Escape (Alternative to Lookbehind)
\K means "keep" - start the match from this position.
# These are equivalent:
grep -oP '(?<=password=)\S+' /tmp/lookaround-practice.txt
grep -oP 'password=\K\S+' /tmp/lookaround-practice.txt
# Both output: secret123
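Advantage of \K: It works after variable-length patterns. A PCRE lookbehind must have a bounded length: alternatives of different fixed lengths are allowed at its top level, but unbounded quantifiers such as + or * are not.
# Variable-length lookbehind (rejected when the pattern is compiled)
# grep -oP '(?<=\w+=)\S+' file.txt # Fails: \w+ has no fixed length
# Using \K (works with any prefix pattern)
grep -oP '^\w+=\K\S+' /tmp/lookaround-practice.txt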
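The difference can be checked directly by comparing exit statuses. This sketch assumes GNU grep with PCRE support, where an invalid pattern makes grep exit with a non-zero status (2 for errors). The sample line is copied from the practice file.

```shell
# Assumes GNU grep with -P support.
line="database_port=5432"

# A quantifier inside a lookbehind has no bounded length, so the pattern is
# rejected at compile time and grep exits with an error status.
status=0
echo "$line" | grep -oP '(?<=\w+=)\S+' 2>/dev/null || status=$?
echo "lookbehind exit status: $status"

# \K has no such restriction: match the key normally, then drop it from the result.
value=$(echo "$line" | grep -oP '^\w+=\K\S+')
echo "$value"   # 5432
```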
Summary: Lookaround Cheat Sheet
| Syntax | Meaning | Example Use |
|---|---|---|
| (?<=X)Y | Y preceded by X | Extract value after key |
| Y(?=X) | Y followed by X | Match before extension |
| (?<!X)Y | Y NOT preceded by X | Exclude specific contexts |
| Y(?!X) | Y NOT followed by X | Exclude certain patterns |
| X\KY | Start match at Y | Variable-length "lookbehind" |
Exercises to Complete
- [ ] Extract all values from key=value pairs
- [ ] Find prices between $10 and $100 (dollar amounts with lookaround)
- [ ] Extract the extension from filenames
- [ ] Find IP addresses NOT in the Gateway line
- [ ] Extract hostnames from URLs (between :// and /)
Self-Check
Solutions
# 1. Extract all values
grep -oP '(?<==)[^\s]+' /tmp/lookaround-practice.txt
# 2. Prices $10-$100 (2-digit numbers after $)
grep -oP '(?<=\$)[0-9]{2}\.[0-9]{2}' /tmp/lookaround-practice.txt
# 3. Extract extensions (pre-filter to bare file names so prices, IPs, and hostnames are skipped)
grep -P '^\w+\.\w+$' /tmp/lookaround-practice.txt | grep -oP '(?<=\.)\w+$'
# 4. IPs NOT in Gateway
grep -v 'Gateway' /tmp/lookaround-practice.txt | grep -oP '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
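# 5. Hostnames from URLs (the practice file contains no URLs, so test with a sample)
echo "https://api.example.com/users?id=123&name=admin" | grep -oP '(?<=://)[^/:]+'
# Output: api.example.com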
Post-Session Reflection
What clicked:
- <Write what made sense>
What’s still fuzzy:
- <Write what needs more practice>
Connection to work:
- <How will you use this?>
Next Session
Session 05: sed Mastery - Transform text with regex substitutions.