Regex Session 06: awk Regex Power
awk combines the power of regex pattern matching with field-based data processing. When grep isn’t enough and sed is too limited, awk shines.
awk Basics
Syntax: awk 'pattern { action }' file
- If the pattern matches, the action executes
- The default action is { print }
- The default pattern is "match all lines"
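The three forms (pattern only, action only, and both together) can be sketched with a few lines of inline input. The sample words and variable names are invented for illustration.

```shell
# Sample input, invented for this sketch.
input='alpha
beta
gamma'

# Pattern only: the default action { print } prints each matching line.
pattern_only=$(printf '%s\n' "$input" | awk '/^b/')

# Action only: with no pattern, the action runs on every line.
action_only=$(printf '%s\n' "$input" | awk '{ print NR }')

# Pattern and action together.
both=$(printf '%s\n' "$input" | awk '/^g/ { print "found:", $0 }')

printf '%s\n' "$pattern_only" "$action_only" "$both"
```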
Field Extraction
awk automatically splits lines into fields:
- $0 = entire line
- $1 = first field
- $2 = second field
- NF = number of fields
- $NF = last field
# Print second field (space-delimited by default)
echo "hello world today" | awk '{print $2}'
# Output: world
# Print last field
echo "one two three four" | awk '{print $NF}'
# Output: four
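NF is an ordinary number, so it can be printed or used in arithmetic to address fields counted from the end. A small sketch, with the variable name summary chosen only for this example:

```shell
# NF holds the field count; $(NF-1) is the second-to-last field.
summary=$(echo "one two three four" | awk '{ print NF, $(NF-1) }')
echo "$summary"
# Output: 4 three
```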
Test File Setup
cat << 'EOF' > /tmp/awk-practice.txt
# Access Log
192.168.1.100 GET /api/users 200 145ms
10.50.1.20 POST /api/login 200 89ms
172.16.0.5 GET /api/products 404 23ms
192.168.1.100 GET /api/orders 500 2341ms
10.50.1.50 DELETE /api/users/123 403 45ms
# User Data
admin:x:1000:1000:Administrator:/home/admin:/bin/bash
developer:x:1001:1001:Developer User:/home/dev:/bin/zsh
service:x:999:999:Service Account:/var/lib/service:/sbin/nologin
# Network Config
interface=eth0 ip=192.168.1.100 netmask=255.255.255.0 gateway=192.168.1.1
interface=eth1 ip=10.50.1.20 netmask=255.255.255.0 gateway=10.50.1.1
interface=lo ip=127.0.0.1 netmask=255.0.0.0
# CSV Data
Name,Email,Department,Salary
John Doe,john@example.com,Engineering,85000
Jane Smith,jane@example.com,Marketing,72000
Bob Wilson,bob@example.com,Engineering,92000
EOF
Lesson 1: Pattern Matching
# Print lines containing "GET"
awk '/GET/' /tmp/awk-practice.txt
# Print lines containing 192.168 anywhere (the dot is escaped to match literally)
awk '/192\.168/' /tmp/awk-practice.txt
# Print lines NOT matching pattern
awk '!/^#/' /tmp/awk-practice.txt # Exclude comments
Lesson 2: Field + Pattern Combinations
# Print specific fields from matching lines
awk '/GET/ {print $1, $4}' /tmp/awk-practice.txt
# Output: IP and status code
# Conditional on field value
awk '$4 == 500 {print $1, $3}' /tmp/awk-practice.txt
# Output: IP and path of requests that returned 500
# Numeric comparison
awk '$5 > 100 {print $0}' /tmp/awk-practice.txt
# Note: "145ms" isn't numeric, so awk compares it as a string; extract the number first
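The pitfall in that note can be shown side by side. Adding 0 to a field makes awk convert its leading digits to a number, which is one simple way to do the extraction. The two log lines are copied from the practice file; the variable names are made up for this sketch.

```shell
log='192.168.1.100 GET /api/users 200 145ms
10.50.1.20 POST /api/login 200 89ms'

# String comparison: "89ms" sorts after "100", so both lines pass.
naive=$(printf '%s\n' "$log" | awk '$5 > 100 { print $5 }')

# Adding 0 converts "145ms" to 145 and "89ms" to 89: only 145 passes.
numeric=$(printf '%s\n' "$log" | awk '$5 + 0 > 100 { print $5 }')

echo "naive:   $naive"
echo "numeric: $numeric"
```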
Lesson 3: Field Separators
# Colon-separated (like /etc/passwd)
awk -F: 'NF == 7 {print $1, $NF}' /tmp/awk-practice.txt
# Prints: username and shell (NF == 7 skips lines that are not passwd-style)
# Comma-separated (CSV)
awk -F, '/Engineering/ {print $1, $4}' /tmp/awk-practice.txt
# Multiple separators
awk -F'[=:]' '{print $1, $2}' /tmp/awk-practice.txt
Lesson 4: Regex in Field Matching
# Match field against regex using ~
awk '$1 ~ /^192/' /tmp/awk-practice.txt
# Negated match using !~
awk '$1 !~ /^#/' /tmp/awk-practice.txt
# Match multiple fields (status anchored so only three-digit 4xx/5xx codes match)
awk '$3 ~ /\/api\// && $4 ~ /^[45][0-9]{2}$/' /tmp/awk-practice.txt
Lesson 5: Built-in Regex Functions
match() - Find and extract
# Find position of match
awk '{
if (match($0, /[0-9]+ms/)) {
print substr($0, RSTART, RLENGTH)
}
}' /tmp/awk-practice.txt
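match() returns the start position of the match (0 if there is none) and sets RSTART and RLENGTH as a side effect. A short sketch with made-up input shows the values in both cases:

```shell
# "145ms" starts at character 6 and is 5 characters long.
hit=$(echo "took 145ms" | awk '{ if (match($0, /[0-9]+ms/)) print RSTART, RLENGTH }')

# No match: return value 0, RSTART 0, RLENGTH -1.
miss=$(echo "no timing" | awk '{ print match($0, /[0-9]+ms/), RSTART, RLENGTH }')

echo "$hit"
echo "$miss"
```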
gsub() - Global substitution
# Replace all occurrences (192.168.1.100 becomes 10.0.1.100)
awk '{gsub(/192\.168/, "10.0"); print}' /tmp/awk-practice.txt
# Replace in specific field (set OFS too, or the rebuilt line loses its commas)
awk -F, -v OFS=, '$2 ~ /@/ {gsub(/@.*/, "@COMPANY.COM", $2)} {print}' /tmp/awk-practice.txt
sub() - Single substitution
# Replace first occurrence only
awk '{sub(/GET/, "REQUEST"); print}' /tmp/awk-practice.txt
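The difference between sub() and gsub() is easiest to see on a line with repeated matches. Both functions return the number of substitutions made, which the second command prints. The input string and variable names are illustrative only.

```shell
# sub() replaces only the first match.
first=$(echo "GET GET GET" | awk '{ sub(/GET/, "REQ"); print }')

# gsub() replaces every match and returns how many it replaced.
all=$(echo "GET GET GET" | awk '{ n = gsub(/GET/, "REQ"); print n, $0 }')

echo "$first"   # REQ GET GET
echo "$all"     # 3 REQ REQ REQ
```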
split() - Split by regex
# Split field by pattern (restricted to access-log lines, which start with a digit)
awk '/^[0-9]/ {
    n = split($5, arr, /[^0-9]+/)
    print "Duration:", arr[1], "ms"
}' /tmp/awk-practice.txt
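split() returns the number of pieces it produced and fills the array starting at index 1. A sketch with an invented timestamp, split on a bracket expression that lists three separator characters:

```shell
# Pieces: 2024, 01, 15, 10, 30 (five elements, indexed from 1).
parts=$(echo "2024-01-15T10:30" | awk '{ n = split($0, p, /[-T:]/); print n, p[1], p[4] }')
echo "$parts"
# Output: 5 2024 10
```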
Lesson 6: Practical Patterns
Log Analysis
# Count requests per IP
awk '/^[0-9]/ {count[$1]++} END {for (ip in count) print ip, count[ip]}' /tmp/awk-practice.txt
# Find slow requests (>100ms)
awk '{
    if (match($5, /[0-9]+/)) {
        # substr() returns a string; adding 0 forces a numeric comparison
        ms = substr($5, RSTART, RLENGTH) + 0
        if (ms > 100) print $1, $3, $5
    }
}' /tmp/awk-practice.txt
# Count status codes
awk '/^[0-9]/ {status[$4]++} END {for (s in status) print s, status[s]}' /tmp/awk-practice.txt
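awk does not define the order in which for (key in array) visits keys, so the counting commands here may list results in a different order from run to run or between awk implementations. Piping through sort is one simple way to get stable output. The three input records are invented for this sketch.

```shell
# Count by second field, then sort so the output order is predictable.
counts=$(printf '%s\n' 'a 200' 'b 404' 'c 200' |
    awk '{ status[$2]++ } END { for (s in status) print s, status[s] }' |
    sort)
echo "$counts"
```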
Data Extraction
# Extract emails from lines
awk 'match($0, /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+/) {
print substr($0, RSTART, RLENGTH)
}' /tmp/awk-practice.txt
# Extract IP addresses
awk 'match($0, /[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/) {
print substr($0, RSTART, RLENGTH)
}' /tmp/awk-practice.txt
Data Transformation
# Convert key=value to JSON
awk -F'[ =]' '/interface/ {
printf "{ \"interface\": \"%s\", \"ip\": \"%s\" }\n", $2, $4
}' /tmp/awk-practice.txt
# CSV to JSON-like (only rows whose second field is an email, which skips the header)
awk -F, '$2 ~ /@/ {
    printf "{ \"name\": \"%s\", \"email\": \"%s\", \"salary\": %s }\n", $1, $2, $4
}' /tmp/awk-practice.txt
Lesson 7: awk vs grep vs sed
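| Task | grep | sed | awk |
|---|---|---|---|
| Find lines | grep 'pattern' | sed -n '/pattern/p' | awk '/pattern/' |
| Extract matches | grep -o 'pattern' | Complex with hold space | match() with substr() |
| Replace | grep can’t replace | sed 's/old/new/g' | gsub(/old/, "new") |
| Field extraction | grep can’t do this | sed can’t do this | awk '{print $2}' |
| Calculations | Not possible | Not possible | awk '{sum += $1} END {print sum}' |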
Rule of thumb:
- grep: Find lines (simple patterns)
- sed: Replace/transform text
- awk: Field extraction, calculations, complex logic
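As a rough illustration of that division of labor, the sketch below runs one task per tool against the same three-line log. The log lines and variable names are made up for this example.

```shell
log='10.0.0.1 GET /a 200
10.0.0.2 POST /b 500
10.0.0.1 GET /c 404'

# grep: find (here, count) matching lines.
found=$(printf '%s\n' "$log" | grep -c 'GET')

# sed: replace text; only the first output line is kept for brevity.
replaced=$(printf '%s\n' "$log" | sed 's/GET/FETCH/' | head -n 1)

# awk: compare a field numerically and count; n + 0 prints 0 if nothing matched.
errors=$(printf '%s\n' "$log" | awk '$4 >= 400 { n++ } END { print n + 0 }')

echo "$found"      # 2
echo "$replaced"   # 10.0.0.1 FETCH /a 200
echo "$errors"     # 2
```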
Complex Example: Log Analysis Report
awk '
BEGIN {
    print "=== API Request Analysis ==="
    print ""
}
/^[0-9]/ {
    # Extract response time: strip "ms", then add 0 to force a number
    gsub(/ms/, "", $5)
    ms = $5 + 0
    # Count by status
    status[$4]++
    # Track slow requests
    if (ms > 100) slow++
    # Sum for average
    total += ms
    count++
}
END {
    print "Status Code Distribution:"
    for (s in status) printf " %s: %d requests\n", s, status[s]
    print ""
    printf "Slow requests (>100ms): %d\n", slow
    # Guard against division by zero when no request lines matched
    if (count > 0) printf "Average response time: %.1f ms\n", total/count
}
' /tmp/awk-practice.txt
Exercises to Complete
- [ ] Print usernames and shells from passwd-style lines
- [ ] Find all requests that returned 4xx or 5xx
- [ ] Calculate average salary from CSV
- [ ] Extract all IP addresses and count occurrences
- [ ] Convert CSV to tab-separated
Self-Check
Solutions
# 1. Usernames and shells (passwd-style lines have exactly 7 colon-separated fields)
awk -F: 'NF == 7 {print $1, $NF}' /tmp/awk-practice.txt
# 2. 4xx/5xx requests
awk '$4 ~ /^[45][0-9]{2}$/ {print}' /tmp/awk-practice.txt
# 3. Average salary
awk -F, 'NR>1 && $4 ~ /[0-9]/ {sum+=$4; n++} END {print "Avg:", sum/n}' /tmp/awk-practice.txt
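# 4. IP count (loops so every address on a line is counted, not just the first)
awk '{
    line = $0
    while (match(line, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)) {
        count[substr(line, RSTART, RLENGTH)]++
        line = substr(line, RSTART + RLENGTH)
    }
} END {
    for (ip in count) print ip, count[ip]
}' /tmp/awk-practice.txt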
# 5. CSV to TSV
awk -F, 'BEGIN {OFS="\t"} {$1=$1; print}' /tmp/awk-practice.txt
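The $1=$1 assignment in solution 5 matters because awk only rebuilds $0 with OFS when a field is assigned. Without it, print emits the original line unchanged. A minimal sketch on a single made-up record:

```shell
# No field assignment: $0 is printed as read, commas intact.
unchanged=$(echo "a,b,c" | awk -F, 'BEGIN { OFS = "\t" } { print }')

# Assigning any field forces $0 to be rebuilt with OFS between fields.
rebuilt=$(echo "a,b,c" | awk -F, 'BEGIN { OFS = "\t" } { $1 = $1; print }')

printf '%s\n' "$unchanged" "$rebuilt"
```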
Next Session
Session 07: vim Regex - Search and replace in your editor.