Infrastructure Patterns
A curated library of battle-tested regex patterns for infrastructure engineering. Each pattern includes variations for different strictness levels and tool compatibility.
Network Patterns
IPv4 Address
Basic (Structure Only):
[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}
Strict (Valid Range 0-255):
(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}
With Word Boundaries:
# PCRE
grep -P '\b(?:\d{1,3}\.){3}\d{1,3}\b' file.txt
# ERE (\b is a GNU grep extension, not POSIX)
grep -E '\b([0-9]{1,3}\.){3}[0-9]{1,3}\b' file.txt
IPv6 Address
Full Format:
([0-9A-Fa-f]{1,4}:){7}[0-9A-Fa-f]{1,4}
With Compression (::):
(([0-9A-Fa-f]{1,4}:){1,7}:|([0-9A-Fa-f]{1,4}:){1,6}:[0-9A-Fa-f]{1,4}|([0-9A-Fa-f]{1,4}:){1,5}(:[0-9A-Fa-f]{1,4}){1,2}|([0-9A-Fa-f]{1,4}:){1,4}(:[0-9A-Fa-f]{1,4}){1,3}|([0-9A-Fa-f]{1,4}:){1,3}(:[0-9A-Fa-f]{1,4}){1,4}|([0-9A-Fa-f]{1,4}:){1,2}(:[0-9A-Fa-f]{1,4}){1,5}|[0-9A-Fa-f]{1,4}:((:[0-9A-Fa-f]{1,4}){1,6})|:((:[0-9A-Fa-f]{1,4}){1,7}|:))
IPv6 validation is complex. For production, consider using a library.
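As one possible way to follow the library advice, the sketch below uses Python's standard `ipaddress` module to decide whether a string is a valid IPv4 or IPv6 address. The helper name `classify_ip` is an assumption made for this illustration, not part of the pattern library.

```python
import ipaddress

def classify_ip(text):
    """Return 4 or 6 for a valid address, or None if the text does not parse."""
    try:
        return ipaddress.ip_address(text).version
    except ValueError:
        return None

print(classify_ip("192.168.1.100"))    # 4
print(classify_ip("2001:db8::1"))      # 6 (compressed form)
print(classify_ip("999.999.999.999"))  # None (octets out of range)
```

A common split is to use the regex to find candidates in free text and the library to validate each candidate.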
Private IP Ranges
# 10.0.0.0/8
grep -E '^10\.' file.txt
# 172.16.0.0/12
grep -E '^172\.(1[6-9]|2[0-9]|3[0-1])\.' file.txt
# 192.168.0.0/16
grep -E '^192\.168\.' file.txt
# Combined
grep -E '^(10\.|172\.(1[6-9]|2[0-9]|3[0-1])\.|192\.168\.)' file.txt
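The combined prefix pattern can be cross-checked against the three networks it is meant to cover. The sketch below is one way to do that in Python; `PRIVATE_RE`, `PRIVATE_NETS`, `in_private_nets` and the sample addresses are assumptions chosen for this example.

```python
import ipaddress
import re

# Same expression as the combined grep pattern above.
PRIVATE_RE = re.compile(r"^(10\.|172\.(1[6-9]|2[0-9]|3[0-1])\.|192\.168\.)")

PRIVATE_NETS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

def in_private_nets(text):
    """True when the address belongs to one of the three private networks."""
    addr = ipaddress.ip_address(text)
    return any(addr in net for net in PRIVATE_NETS)

samples = ["10.50.1.20", "172.15.0.1", "172.31.255.254",
           "172.32.0.1", "192.168.1.100", "8.8.8.8"]

# Collect any address where the regex and the network check disagree.
mismatches = [s for s in samples
              if bool(PRIVATE_RE.match(s)) != in_private_nets(s)]
print(mismatches)  # []
```

The regex only inspects the leading octets, so it would also accept a malformed string such as `10.999.1.1`; the network check rejects that with a `ValueError`.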
CIDR Notation
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/\d{1,2}
grep -oP '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/\d{1,2}' routes.txt
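The CIDR pattern checks shape only, so it also accepts prefixes such as `/99`. A minimal sketch of a follow-up check with Python's `ipaddress` module is shown below; `parse_cidr` is a hypothetical helper name.

```python
import ipaddress

def parse_cidr(text):
    """Return an ip_network object, or None when the block is not valid."""
    try:
        # strict=False accepts host bits (192.168.1.7/24) and normalizes them away.
        return ipaddress.ip_network(text, strict=False)
    except ValueError:
        return None

print(parse_cidr("192.168.1.0/24"))  # 192.168.1.0/24
print(parse_cidr("192.168.1.7/24"))  # 192.168.1.0/24
print(parse_cidr("10.0.0.0/99"))     # None
```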
MAC Address
Colon-Separated (Unix):
([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}
Hyphen-Separated (Windows):
([0-9A-Fa-f]{2}-){5}[0-9A-Fa-f]{2}
Cisco Format:
[0-9A-Fa-f]{4}\.[0-9A-Fa-f]{4}\.[0-9A-Fa-f]{4}
Any Format:
grep -oP '([0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}|[0-9A-Fa-f]{4}\.[0-9A-Fa-f]{4}\.[0-9A-Fa-f]{4}' file.txt
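Once a MAC address has been matched in any of the three notations, it is often convenient to compare it in a single canonical form. The sketch below reuses the "any format" expression and rewrites the match as lowercase, colon-separated pairs; `MAC_ANY` and `normalize_mac` are names assumed for this example.

```python
import re

MAC_ANY = re.compile(
    r"([0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}"
    r"|[0-9A-Fa-f]{4}\.[0-9A-Fa-f]{4}\.[0-9A-Fa-f]{4}"
)

def normalize_mac(text):
    """Return the MAC as lowercase colon-separated pairs, or None if it does not match."""
    if not MAC_ANY.fullmatch(text):
        return None
    digits = re.sub(r"[^0-9A-Fa-f]", "", text).lower()  # keep the 12 hex digits
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))

print(normalize_mac("AA-BB-CC-DD-EE-FF"))  # aa:bb:cc:dd:ee:ff
print(normalize_mac("aabb.ccdd.eeff"))     # aa:bb:cc:dd:ee:ff
print(normalize_mac("AA:BB:CC"))           # None
```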
Port Numbers
Any Port (1-65535):
(6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|[1-9][0-9]{0,3})
Well-Known Ports (1-1023):
(102[0-3]|10[0-1][0-9]|[1-9][0-9]{0,2})
Simple (No Validation):
grep -oP ':\K\d{1,5}(?=\s|$)' netstat.txt
VLAN ID (1-4094)
(409[0-4]|40[0-8][0-9]|[1-3][0-9]{3}|[1-9][0-9]{0,2})
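The range-encoding patterns for ports and VLAN IDs are hard to read and easy to get wrong. When a scripting language is available, an alternative is to capture the digits with a simple pattern and compare them as integers. The sketch below illustrates that approach; `numbers_in_range` is a hypothetical helper, not part of the pattern library.

```python
import re

def numbers_in_range(text, low, high):
    """Return the standalone integers in text that fall inside [low, high]."""
    # Lookarounds stop the pattern from matching a slice of a longer digit run.
    candidates = re.findall(r"(?<!\d)\d{1,5}(?!\d)", text)
    return [int(c) for c in candidates if low <= int(c) <= high]

print(numbers_in_range("Port: 443 Port: 65535 Port: 99999", 1, 65535))  # [443, 65535]
print(numbers_in_range("VLAN: 100 VLAN: 4094 VLAN: 4095", 1, 4094))     # [100, 4094]
```

The regex alternations remain the right tool when only grep is available.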
Security Patterns
Common Secrets
AWS Access Key:
AKIA[0-9A-Z]{16}
AWS Secret Key:
[A-Za-z0-9/+=]{40}
GitHub Token:
(ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9]{36}
Generic API Key:
[Aa][Pp][Ii][-_]?[Kk][Ee][Yy]\s*[:=]\s*['"]?[A-Za-z0-9_-]{20,}['"]?
Password in URL:
://[^:]+:[^@]+@
Scan for Secrets
# AWS keys
grep -rP 'AKIA[0-9A-Z]{16}' .
# Private keys
grep -rl 'BEGIN.*PRIVATE KEY' .
# Generic secrets
grep -rP '(password|passwd|pwd|secret|token|api[_-]?key)\s*[:=]\s*\S+' .
# Base64 encoded (potential secrets)
grep -oP '[A-Za-z0-9+/]{40,}={0,2}' config.txt
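The same patterns can be combined into a small scanner that reports where a hit occurred without echoing the full secret. The following is a minimal sketch under stated assumptions: `SECRET_PATTERNS` and `scan_text` are illustrative names, the input is a string already read from a file, and directory walking is left out.

```python
import re

SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github_token": re.compile(r"(?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9]{36}"),
    "private_key": re.compile(r"BEGIN.*PRIVATE KEY"),
}

def scan_text(text):
    """Yield (line_number, label, redacted_match) for every pattern hit."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            for match in pattern.finditer(line):
                value = match.group(0)
                # Keep the first four characters and mask the rest.
                yield lineno, label, value[:4] + "*" * (len(value) - 4)

sample = "key = AKIAIOSFODNN7EXAMPLE\nnothing here\ntoken: ghp_" + "x" * 36
for hit in scan_text(sample):
    print(hit)
```

Redacting the match keeps scan reports from becoming a second copy of the secret.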
Hash Patterns
MD5 (32 hex):
[a-fA-F0-9]{32}
SHA-1 (40 hex):
[a-fA-F0-9]{40}
SHA-256 (64 hex):
[a-fA-F0-9]{64}
SHA-512 (128 hex):
[a-fA-F0-9]{128}
Any Hash (Standalone):
grep -oP '\b[a-f0-9]{32,128}\b' hashes.txt
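Because the four patterns differ only in length, a single pass can extract candidates and label them by digit count. The sketch below assumes the names `HASH_LENGTHS` and `classify_hashes`; note that length is only a hint, since other digest algorithms produce hex strings of the same sizes.

```python
import re

HASH_LENGTHS = {32: "MD5", 40: "SHA-1", 64: "SHA-256", 128: "SHA-512"}

def classify_hashes(text):
    """Return (label, value) pairs for hex runs whose length matches a known digest."""
    found = []
    for candidate in re.findall(r"\b[a-fA-F0-9]{32,128}\b", text):
        label = HASH_LENGTHS.get(len(candidate))
        if label:  # skip hex runs with an unrecognized length
            found.append((label, candidate))
    return found

print(classify_hashes("Hash: 5d41402abc4b2a76b9719d911017c592"))
# [('MD5', '5d41402abc4b2a76b9719d911017c592')]
```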
Log Parsing Patterns
Syslog Format
BSD Format:
^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+(\S+)\[(\d+)\]:\s+(.*)$
Groups: timestamp, hostname, program, pid, message
# Extract timestamp
grep -oP '^\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}' /var/log/syslog
# Extract hostname
grep -oP '^\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\K\S+' /var/log/syslog
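To use all five groups at once, the BSD pattern can be given named groups and applied per line. The sketch below is one way to do that in Python; `SYSLOG_RE`, `parse_syslog` and the sample line are assumptions made for illustration.

```python
import re

# Same structure as the BSD pattern above, with names on the five groups.
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<hostname>\S+)\s+(?P<program>\S+)\[(?P<pid>\d+)\]:\s+(?P<message>.*)$"
)

def parse_syslog(line):
    """Return a dict of the five fields, or None when the line does not match."""
    match = SYSLOG_RE.match(line)
    return match.groupdict() if match else None

entry = parse_syslog("Mar 15 14:30:00 web-01 sshd[1234]: Connection closed by 10.0.0.5")
print(entry["program"], entry["pid"])  # sshd 1234
```

Lines without a `[pid]` part, such as typical kernel messages, do not match this pattern and come back as `None`.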
ISO 8601 Timestamp
Date Only:
\d{4}-\d{2}-\d{2}
Date and Time:
\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}
With Timezone:
\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}([+-]\d{2}:\d{2}|Z)
With Milliseconds:
\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}([+-]\d{2}:\d{2}|Z)?
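These patterns check digit positions only, so a string like `2026-13-45T99:99:99` still matches. A minimal sketch of a second validation step with Python's `datetime` module follows; `parse_iso` is a hypothetical helper name.

```python
from datetime import datetime, timezone

def parse_iso(text):
    """Parse an ISO 8601 timestamp; return None when it is not a real date/time."""
    # Some Python versions reject a trailing "Z", so map it to an explicit offset.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    try:
        return datetime.fromisoformat(text)
    except ValueError:
        return None

print(parse_iso("2026-03-15T14:30:00Z"))  # 2026-03-15 14:30:00+00:00
print(parse_iso("2026-13-45T99:99:99"))   # None
```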
Apache/Nginx Access Log
Combined Format:
^(\S+)\s+\S+\s+\S+\s+\[([^\]]+)\]\s+"(\S+)\s+(\S+)\s+\S+"\s+(\d+)\s+(\d+)\s+"([^"]*)"\s+"([^"]*)"
Groups: IP, timestamp, method, path, status, size, referer, user-agent
# Extract IPs
awk '{print $1}' access.log
# Extract status codes
grep -oP '"\s+\K\d{3}(?=\s)' access.log
# Find 5xx errors
grep -P '"\s+5\d{2}\s' access.log
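For anything beyond a single field, it can be easier to parse each line into a record first and filter afterwards. The sketch below applies the combined-format pattern with named groups; `COMBINED_RE`, `parse_access_line` and the two sample lines are assumptions made for this example.

```python
import re

# Combined-format pattern from above, with names on the eight groups.
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+)\s+\S+\s+\S+\s+\[(?P<timestamp>[^\]]+)\]\s+'
    r'"(?P<method>\S+)\s+(?P<path>\S+)\s+\S+"\s+(?P<status>\d+)\s+(?P<size>\d+)\s+'
    r'"(?P<referer>[^"]*)"\s+"(?P<agent>[^"]*)"'
)

def parse_access_line(line):
    """Return a dict of the eight fields, or None when the line does not match."""
    match = COMBINED_RE.match(line)
    return match.groupdict() if match else None

lines = [
    '203.0.113.7 - - [15/Mar/2026:14:30:00 +0000] "GET /health HTTP/1.1" 200 2 "-" "curl/8.0"',
    '198.51.100.9 - - [15/Mar/2026:14:30:01 +0000] "POST /v1/users HTTP/1.1" 502 157 "-" "curl/8.0"',
]
entries = [e for e in map(parse_access_line, lines) if e]
server_errors = [(e["ip"], e["path"]) for e in entries if e["status"].startswith("5")]
print(server_errors)  # [('198.51.100.9', '/v1/users')]
```

The size group requires digits, so a line that logs the size as `-` does not match this pattern.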
JSON Log Lines
# Extract level field
grep -oP '"level"\s*:\s*"\K[^"]+' jsonlog.txt
# Extract message field
grep -oP '"message"\s*:\s*"\K[^"]+' jsonlog.txt
# Extract timestamp
grep -oP '"timestamp"\s*:\s*"\K[^"]+' jsonlog.txt
For complex JSON, use jq instead of regex.
SSH Auth Log
Failed Password:
Failed password for (?:invalid user )?(\S+) from (\d+\.\d+\.\d+\.\d+) port (\d+)
Accepted:
Accepted (password|publickey) for (\S+) from (\d+\.\d+\.\d+\.\d+) port (\d+)
# Find failed logins
grep -P 'Failed password' /var/log/auth.log
# Extract attacking IPs
grep -oP 'Failed password.*from \K\d+\.\d+\.\d+\.\d+' /var/log/auth.log | sort | uniq -c | sort -rn
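The `sort | uniq -c | sort -rn` pipeline has a direct equivalent in Python's `collections.Counter`. The sketch below reuses the "Failed Password" pattern; `FAILED_RE`, `count_failed_logins` and the sample log lines are assumptions for illustration.

```python
import re
from collections import Counter

FAILED_RE = re.compile(
    r"Failed password for (?:invalid user )?(\S+) from (\d+\.\d+\.\d+\.\d+) port (\d+)"
)

def count_failed_logins(lines):
    """Return (ip, count) pairs, most frequent source first."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group(2)] += 1  # group 2 is the source IP
    return counts.most_common()

log = [
    "Mar 15 14:30:00 web-01 sshd[1234]: Failed password for root from 203.0.113.7 port 50022 ssh2",
    "Mar 15 14:30:02 web-01 sshd[1234]: Failed password for invalid user admin from 203.0.113.7 port 50023 ssh2",
    "Mar 15 14:30:05 web-01 sshd[1240]: Failed password for root from 198.51.100.9 port 40000 ssh2",
    "Mar 15 14:31:00 web-01 sshd[1250]: Accepted publickey for deploy from 192.0.2.4 port 51000 ssh2",
]
print(count_failed_logins(log))  # [('203.0.113.7', 2), ('198.51.100.9', 1)]
```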
Configuration Patterns
INI File
Key-Value Pairs:
^(\w+)\s*=\s*(.+)$
Section Headers:
^\[([^\]]+)\]$
Comments:
^[#;].*$
# Extract all settings (no comments)
grep -P '^\w+\s*=' config.ini
# Extract specific value
grep -oP '^port\s*=\s*\K\d+' config.ini
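A line-oriented pattern such as `^port\s*=` cannot tell which section a key belongs to. When that matters, a parser is a better fit. The sketch below uses Python's standard `configparser`; the INI content in `SAMPLE_INI` is invented for this example.

```python
import configparser

# Hypothetical config with the same key in two sections.
SAMPLE_INI = """\
# global settings
[server]
host = 0.0.0.0
port = 8080
; legacy comment
[database]
port = 5432
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_INI)

print(parser.sections())                 # ['server', 'database']
print(parser.getint("server", "port"))   # 8080
print(parser.getint("database", "port")) # 5432
```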
YAML Key-Value
# Top-level keys
grep -oP '^\w+(?=:)' config.yaml
# Specific value
grep -oP '^server:\s*\K.+' config.yaml
# Find all port definitions
grep -oP 'port:\s*\K\d+' config.yaml
Environment Variables
Definition:
^([A-Z_][A-Z0-9_]*)\s*=\s*(.*)$
Expansion:
\$\{?([A-Z_][A-Z0-9_]*)\}?
# Find env var usage
grep -oP '\$\{?[A-Z_][A-Z0-9_]*\}?' script.sh
# Find exported variables
grep -oP '^export\s+\K[A-Z_][A-Z0-9_]*' .bashrc
URL and Email Patterns
URL
HTTP/HTTPS:
https?://[^\s<>"{}|\\^`\[\]]+
With Named Groups:
(?P<protocol>https?):\/\/(?P<host>[^\/:]+)(?::(?P<port>\d+))?(?P<path>\/[^?#]*)?(?:\?(?P<query>[^#]*))?(?:#(?P<fragment>.*))?
# Extract URLs
grep -oP 'https?://\S+' document.txt
# Extract domains
grep -oP 'https?://\K[^/:]+' document.txt
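The named-group pattern splits a URL into the same parts that Python's `urllib.parse` exposes. As a sketch of the library route, the helper below (the name `describe_url` is an assumption) returns those parts under the group names used above.

```python
from urllib.parse import parse_qs, urlsplit

def describe_url(url):
    """Split a URL into the parts named in the regex above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,          # None when the URL has no explicit port
        "path": parts.path,
        "query": parse_qs(parts.query),
        "fragment": parts.fragment,
    }

info = describe_url("https://api.example.com:8443/v1/users?id=123")
print(info["host"], info["port"], info["query"])
# api.example.com 8443 {'id': ['123']}
```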
Email
Basic:
[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}
RFC 5322 (More Strict):
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
The basic email regex covers the large majority of real-world addresses.
Domain Name
([a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}
Identifiers
UUID
[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}
grep -oiP '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' log.txt
Kubernetes Pod Names
[a-z0-9]([a-z0-9-]*[a-z0-9])?(-[a-z0-9]{5})?(-[a-z0-9]{5})?
Validation Patterns
Hostname (RFC 1123)
^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)*$
Username (Linux)
^[a-z_]([a-z0-9_-]{0,31}|[a-z0-9_-]{0,30}\$)$
Semver Version
^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$
Simplified:
\d+\.\d+\.\d+
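Matching a version is usually a step toward comparing versions, and plain string comparison sorts `1.10.0` before `1.9.3`. The sketch below turns the three core numbers into an integer tuple for sorting. It assumes the names `SEMVER_CORE` and `version_key`, and it deliberately ignores pre-release and build metadata, which have their own precedence rules.

```python
import re

# Core MAJOR.MINOR.PATCH part of the strict pattern above.
SEMVER_CORE = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")

def version_key(text):
    """Return (major, minor, patch) as integers for numeric sorting."""
    match = SEMVER_CORE.match(text)
    if not match:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return tuple(int(part) for part in match.groups())

versions = ["1.10.0", "1.9.3", "1.2.10"]
print(sorted(versions))                   # ['1.10.0', '1.2.10', '1.9.3']
print(sorted(versions, key=version_key))  # ['1.2.10', '1.9.3', '1.10.0']
```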
Quick Reference Card
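| Pattern | Use Case |
|---|---|
| `[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}` | IPv4 address (basic) |
| `([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}` | MAC address (colon) |
| `\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}` | ISO timestamp |
| `[a-fA-F0-9]{32}` | MD5 hash |
| `[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}` | UUID |
| `AKIA[0-9A-Z]{16}` | AWS access key |
| `https?://\S+` | URL (simple) |
| `[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}` | Email (basic) |
| `"level"\s*:\s*"\K[^"]+` | Log level |
| `Failed password.*from \K\d+\.\d+\.\d+\.\d+` | SSH failed login IP |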
Self-Test Exercises
Try each challenge first. Only read the answer after you’ve attempted it.
Setup Test Data
cat << 'EOF' > /tmp/patterns.txt
IP: 192.168.1.100
IP: 10.50.1.20
IP: 255.255.255.255
IP: 999.999.999.999
MAC: AA:BB:CC:DD:EE:FF
MAC: aa:bb:cc:dd:ee:ff
MAC: AA-BB-CC-DD-EE-FF
MAC: aabb.ccdd.eeff
Hash: 5d41402abc4b2a76b9719d911017c592
Hash: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
UUID: 550e8400-e29b-41d4-a716-446655440000
CIDR: 192.168.1.0/24
CIDR: 10.0.0.0/8
Email: admin@example.com
Email: user.name+tag@sub.domain.org
URL: https://api.example.com:8443/v1/users?id=123
URL: http://localhost:8080/health
AWS: AKIAIOSFODNN7EXAMPLE
GitHub: ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Date: 2026-03-15T14:30:00Z
Date: Mar 15 14:30:00
VLAN: 100
VLAN: 4094
Port: 443
Port: 65535
Port: 99999
EOF
cat << 'EOF' > /tmp/api.json
{
"hosts": [
{"name": "web-01", "ip": "192.168.1.10", "status": "running"},
{"name": "web-02", "ip": "192.168.1.11", "status": "stopped"},
{"name": "db-01", "ip": "192.168.1.20", "status": "running", "port": 5432}
],
"network": {
"vlan": 100,
"gateway": "192.168.1.1",
"cidr": "192.168.1.0/24"
},
"errors": [
{"code": 500, "message": "Internal server error"},
{"code": 404, "message": "Not found"}
]
}
EOF
Challenge 1: Valid IPv4 Only
Goal: Match only valid IPs (0-255 per octet), not 999.999.999.999
Answer
grep -oP '\b(25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)){3}\b' /tmp/patterns.txt
This validates each octet is 0-255. Complex but accurate.
Challenge 2: All MAC Formats
Goal: Match MACs in any format (colon, hyphen, or Cisco dot notation)
Answer
grep -oiP '([0-9a-f]{2}[:-]){5}[0-9a-f]{2}|[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}' /tmp/patterns.txt
-i makes the match case-insensitive. Alternation covers the different formats.
Challenge 3: SHA-256 Hash (64 chars)
Goal: Extract only the SHA-256 hash (64 hex chars), not MD5 (32)
Answer
grep -oP '\b[a-f0-9]{64}\b' /tmp/patterns.txt
{64} ensures exactly 64 characters. Word boundary prevents partial match.
Challenge 4: Extract UUID
Goal: Extract the UUID in standard format
Answer
grep -oP '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' /tmp/patterns.txt
UUID format: 8-4-4-4-12 hex digits.
Challenge 5: Valid Port Only (1-65535)
Goal: Match valid port numbers, not 99999
Answer
grep -oP '(?<=Port: )(6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{0,3})\b' /tmp/patterns.txt
Complex but validates 1-65535 range.
Challenge 6: AWS Access Key Pattern
Goal: Find AWS access key ID
Answer
grep -oP 'AKIA[0-9A-Z]{16}' /tmp/patterns.txt
Long-term AWS access key IDs start with AKIA, followed by 16 uppercase letters or digits.
Challenge 7: GitHub Token Pattern
Goal: Find GitHub personal access token
Answer
grep -oP 'ghp_[A-Za-z0-9]{36}' /tmp/patterns.txt
GitHub PATs start with ghp_ followed by 36 chars.
Challenge 8: ISO 8601 Timestamp
Goal: Extract the ISO 8601 date with timezone
Answer
grep -oP '\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z?' /tmp/patterns.txt
Format: YYYY-MM-DDTHH:MM:SS with optional Z.
Challenge 9: jq - Extract All IPs
Goal: Use jq to extract all IP addresses from the JSON
Answer
jq -r '.hosts[].ip, .network.gateway' /tmp/api.json
Output: 192.168.1.10, 192.168.1.11, 192.168.1.20, 192.168.1.1
Challenge 10: jq - Filter Running Hosts
Goal: Use jq to get names of hosts with status "running"
Answer
jq -r '.hosts[] | select(.status == "running") | .name' /tmp/api.json
Output: web-01, db-01
Challenge 11: jq - Extract Error Messages
Goal: Get all error messages from the JSON
Answer
jq -r '.errors[].message' /tmp/api.json
Output: Internal server error, Not found
Challenge 12: jq - Build Host Table
Goal: Create a formatted table of hostname and IP
Answer
jq -r '.hosts[] | "\(.name)\t\(.ip)"' /tmp/api.json
Output: tab-separated name and IP for each host.
Challenge 13: jq + grep Combined
Goal: Extract IPs from JSON, then grep for 192.168.x.x range
Answer
jq -r '.. | strings' /tmp/api.json | grep -oE '192\.168\.[0-9]+\.[0-9]+'
.. | strings recursively extracts all string values.
Challenge 14: curl + jq API Pattern
Goal: Parse a real API response pattern (simulated)
Answer
# Simulate API response processing
echo '{"users":[{"id":1,"email":"admin@example.com"},{"id":2,"email":"user@test.org"}]}' | \
jq -r '.users[].email' | \
grep -oP '[^@]+(?=@)'
This extracts usernames (part before @) from email addresses.
Challenge 15: CIDR Notation
Goal: Extract network CIDR blocks
Answer
grep -oP '\d+\.\d+\.\d+\.\d+/\d{1,2}' /tmp/patterns.txt
IP followed by / and 1-2 digit prefix.
Challenge 16: jq - Conditional Field Selection
Goal: Get port only for hosts that have it defined
Answer
jq -r '.hosts[] | select(.port) | "\(.name): \(.port)"' /tmp/api.json
select(.port) keeps hosts whose port value is neither null nor false, which here means the hosts that define a port.
Key Takeaways
- Start simple, add strictness as needed: basic patterns cover most cases
- Use word boundaries: prevent partial matches
- Named groups improve readability: especially for complex patterns
- Test with real data: edge cases appear in production
- Document your patterns: regex can be cryptic months later
- Consider libraries for complex validation: IP, email, URL parsing
- Combine jq and grep: jq for JSON structure, grep for pattern matching
Next Module
Tool Integration - Using regex with grep, sed, awk, ripgrep, Python, and JavaScript.