Drill 06: Alternation

The alternation operator | provides OR logic in regex. Combined with grouping, it allows matching multiple alternative patterns. Understanding precedence and efficiency is key to using it correctly.

Core Concepts

Syntax      Meaning                      Example

a|b         Match a OR b                 cat|dog matches "cat" or "dog"

(a|b)c      Grouped alternation          (Mon|Tues)day matches "Monday" or "Tuesday"

(?:a|b)     Non-capturing alternation    (?:http|https):// groups without capturing

[ab]        Single character OR          [aeiou] is clearer, and often faster, than a|e|i|o|u

Alternation vs Character Class

Use Case               Alternation                Character Class

Single characters      a|b|c (can be slower)      [abc] (often faster)

Multiple characters    cat|dog (required)         N/A

Ranges                 0|1|2|...|9 (bad)          [0-9] (correct)

Negation               Complex                    [^abc] (simple)

Rule: Use character classes for single-character alternatives. Use alternation for multi-character patterns.
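
The rule above can be checked with a minimal Python sketch. The sample string is made up for illustration; the point is only that, for single characters, the alternation and the character class find the same matches.

```python
import re

sample = "alternation"  # hypothetical sample text

# Single-character alternation and the equivalent character class
by_alternation = re.findall(r'a|e|i|o|u', sample)
by_class = re.findall(r'[aeiou]', sample)

print(by_alternation)
print(by_class)
print(by_alternation == by_class)  # same matches, shorter pattern
```

The character class states the intent ("one of these characters") directly, which is the main reason to prefer it here.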

Interactive CLI Drill

bash ~/atelier/_bibliotheca/domus-captures/docs/modules/ROOT/examples/regex-drills/06-alternation.sh

Exercise Set 1: Basic Alternation

cat << 'EOF' > /tmp/ex-alt.txt
ERROR: Connection failed
Warning: Low disk space
INFO: Process started
error: lowercase error
WARNING: Memory low
WARN: Config outdated
success: Operation completed
FATAL: System crash
EOF

Ex 1.1: Match ERROR or FATAL lines

Solution
grep -E 'ERROR|FATAL' /tmp/ex-alt.txt

Output: "ERROR: Connection failed" and "FATAL: System crash" (the lowercase "error:" line is not matched)

Ex 1.2: Match any log level (case-sensitive)

Solution
grep -E 'ERROR|WARNING|WARN|INFO|FATAL' /tmp/ex-alt.txt

Ex 1.3: Match log levels case-insensitively

Solution
grep -Ei 'error|warning|warn|info|fatal|success' /tmp/ex-alt.txt
# Or more compactly (warn(ing)? covers both "warn" and "warning"):
grep -Ei '(error|warn(ing)?|info|fatal|success)' /tmp/ex-alt.txt

Ex 1.4: Match ERROR but not error

Solution
grep -E '^ERROR:' /tmp/ex-alt.txt
# grep is case-sensitive by default; ^ anchors the literal to the start of the line

Exercise Set 2: Grouped Alternation

cat << 'EOF' > /tmp/ex-grouped.txt
Monday meeting
Tuesday review
Wednesday standup
Thursday demo
Friday retrospective
Saturday off
Sunday off
Mondays are tough
The Monday blues
EOF

Ex 2.1: Match weekdays only (Mon-Fri)

Solution
grep -E '(Mon|Tues|Wednes|Thurs|Fri)day' /tmp/ex-grouped.txt

Grouping (Mon|Tues|Wednes|Thurs|Fri) followed by literal day.

Ex 2.2: Match weekend days

Solution
grep -E '(Satur|Sun)day' /tmp/ex-grouped.txt

Ex 2.3: Match any day with optional 's' (Mondays)

Solution
grep -E '(Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)days?' /tmp/ex-grouped.txt

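The s? makes the trailing 's' optional. When filtering whole lines the result is the same as without it; the difference shows up when extracting matches with -o.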

Exercise Set 3: Protocol and Format Patterns

cat << 'EOF' > /tmp/ex-protocols.txt
http://example.com
https://secure.example.com
ftp://files.example.com
ssh://server.example.com
file:///local/path
http://10.50.1.100/api
https://api.example.com:443/v1
EOF

Ex 3.1: Match HTTP or HTTPS URLs

Solution
grep -E 'https?://' /tmp/ex-protocols.txt
# Or explicitly:
grep -E '(http|https)://' /tmp/ex-protocols.txt

The s? makes 's' optional, which is more concise than alternation when the alternatives differ by a single optional character.

Ex 3.2: Match common protocols (http, https, ftp, ssh)

Solution
grep -E '(https?|ftp|ssh)://' /tmp/ex-protocols.txt

Ex 3.3: Extract domain from URL

Solution
grep -oP '(https?|ftp|ssh)://\K[^/:]+' /tmp/ex-protocols.txt

\K discards the text matched so far, so -o prints only the host. The -P option needs a grep built with PCRE support (GNU grep). Output: example.com, secure.example.com, etc.
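
Python's built-in re module has no \K, so a rough equivalent is to put the scheme in a non-capturing group and capture only the host. The sketch below reuses the sample URLs from this exercise; the variable names are illustrative, and \s is added to the negated class because the input here is one multi-line string.

```python
import re

urls = """http://example.com
https://secure.example.com
ftp://files.example.com
ssh://server.example.com
file:///local/path
http://10.50.1.100/api
https://api.example.com:443/v1"""

# (?:...) groups the scheme alternatives without capturing them, so the
# only capturing group is the host and findall() returns just that part.
host_pattern = re.compile(r'(?:https?|ftp|ssh)://([^/:\s]+)')
hosts = host_pattern.findall(urls)

for host in hosts:
    print(host)
```

The file:/// line is skipped because "file" is not one of the listed schemes.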

Exercise Set 4: File Extensions

cat << 'EOF' > /tmp/ex-files.txt
document.pdf
report.docx
data.xlsx
image.png
photo.jpg
script.sh
config.yaml
settings.json
archive.tar.gz
backup.zip
notes.txt
code.py
EOF

Ex 4.1: Match document files (pdf, docx, xlsx)

Solution
grep -E '\.(pdf|docx?|xlsx?)$' /tmp/ex-files.txt

docx? matches doc or docx, xlsx? matches xls or xlsx.

Ex 4.2: Match image files

Solution
grep -Ei '\.(png|jpe?g|gif|bmp|svg)$' /tmp/ex-files.txt

jpe?g matches jpg or jpeg.

Ex 4.3: Match config files

Solution
grep -Ei '\.(ya?ml|json|ini|conf|cfg|toml)$' /tmp/ex-files.txt

ya?ml matches yml or yaml.

Ex 4.4: Match scripts

Solution
grep -Ei '\.(sh|bash|py|rb|pl|js|ts)$' /tmp/ex-files.txt

Exercise Set 5: Network Patterns

cat << 'EOF' > /tmp/ex-network.txt
interface GigabitEthernet0/1
interface FastEthernet0/24
interface TenGigabitEthernet1/0/1
interface Ethernet1
VLAN 10 - Data
VLAN 20 - Voice
VLAN 100 - Management
permit tcp any host 10.50.1.50 eq 443
permit udp any host 10.50.1.50 eq 53
deny tcp any any eq 23
EOF

Ex 5.1: Match interface types

Solution
grep -E '(Gigabit|Fast|TenGigabit)?Ethernet' /tmp/ex-network.txt

Ex 5.2: Match permit or deny lines

Solution
grep -E '^(permit|deny)' /tmp/ex-network.txt

Ex 5.3: Match TCP or UDP rules

Solution
grep -E '(permit|deny) (tcp|udp)' /tmp/ex-network.txt

Ex 5.4: Match common ports

Solution
grep -E 'eq (22|23|53|80|443|3389)' /tmp/ex-network.txt
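
One thing to watch with port lists is that an alternative such as 22 also matches the start of a longer number like 2222. A minimal Python sketch of the issue and one possible fix with a word boundary follows; the extra rules for ports 2222 and 8080 are hypothetical and not part of the exercise file.

```python
import re

rules = [
    "permit tcp any host 10.50.1.50 eq 443",
    "deny tcp any any eq 23",
    "permit tcp any any eq 2222",   # hypothetical rule
    "permit tcp any any eq 8080",   # hypothetical rule
]

loose = re.compile(r'eq (?:22|23|53|80|443|3389)')
strict = re.compile(r'eq (?:22|23|53|80|443|3389)\b')  # \b: number must end here

loose_hits = [r for r in rules if loose.search(r)]
strict_hits = [r for r in rules if strict.search(r)]

print(len(loose_hits))   # also counts 2222 and 8080
print(strict_hits)
```

The same idea should carry over to the grep version by appending \b (GNU grep) or using -w where appropriate.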

Real-World Applications

Professional: ISE Log Analysis

# Match authentication outcomes
grep -E '(Passed|Failed)-(Authentication|Attempt)' /var/log/ise-psc.log

# Match ISE event types
grep -Ei '(authentication|authorization|accounting|profiling)' /var/log/ise-psc.log

# Match error codes
grep -E 'Error (11|12|13|24|27)' /var/log/ise-psc.log

Professional: Network Device Logs

# Match interface status messages
grep -E '(up|down|changed state to)' /var/log/switch.log

# Match Cisco severity levels 0-3 (emergency, alert, critical, error)
grep -E '%[A-Z]+-([0-3])-' /var/log/cisco.log

# Match spanning tree events
grep -Ei '(STP|RSTP|MSTP|BPDU)' /var/log/switch.log

Professional: Service Status

# Match systemd unit states
systemctl list-units | grep -E '(failed|inactive|activating)'

# Match running services
systemctl --type=service | grep -E '(running|exited)'

# Match critical services
systemctl status | grep -E '(sshd|nginx|docker|kubelet)'

Personal: Note Organization

# Find TODO markers
grep -rEi '(TODO|FIXME|HACK|XXX):' ~/notes/

# Find priority tags
grep -Ei '#(urgent|important|priority|deadline)' ~/notes/*.md

# Find date formats
grep -rE '(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) [0-9]{1,2}' ~/journal/

Personal: Financial Tracking

# Match expense categories
grep -Ei '(groceries|utilities|rent|insurance|entertainment)' ~/budget/*.csv

# Match transaction types
grep -Ei '(debit|credit|transfer|payment|deposit)' ~/bank/*.txt

# Match currencies
grep -rE '\$(USD|EUR|GBP|CAD|AUD)' ~/receipts/

Personal: Calendar/Schedule

# Match days of week
grep -Ei '(monday|tuesday|wednesday|thursday|friday|saturday|sunday)' ~/calendar.txt

# Match time periods
grep -Ei '(morning|afternoon|evening|night)' ~/schedule.txt

# Match meeting types
grep -Ei '(meeting|call|standup|review|demo)' ~/calendar/*.ics

Tool Variants

grep: Alternation Patterns

# Basic alternation
grep -E 'cat|dog' file.txt

# With word boundaries
grep -wE 'cat|dog' file.txt

# Count matching lines per alternative (grep -c counts lines, not occurrences)
echo "Cats: $(grep -c 'cat' file.txt), Dogs: $(grep -c 'dog' file.txt)"

# Case-insensitive
grep -Ei 'error|warning|critical' logs.txt

sed: Alternation in Substitution

# Replace multiple patterns with same replacement
sed -E 's/(ERROR|FATAL|CRITICAL)/[ALERT]/g' file.txt

# Add prefix to alternatives
sed -E 's/(Mon|Tues|Wednes|Thurs|Fri)day/Weekday: &/g' file.txt

# Remove any of several patterns
sed -E 's/(DEBUG|TRACE):.*//' file.txt

# Convert alternatives
sed -E 's/(http|https)/SECURE/g' urls.txt

awk: Alternation Matching

# Match lines with alternatives
awk '/(ERROR|WARN|FATAL)/' file.txt

# Count alternatives separately
awk '/ERROR/{e++} /WARN/{w++} /FATAL/{f++} END{print "E:",e,"W:",w,"F:",f}' file.txt

# Process based on match
awk '/(ERROR|FATAL)/{print "CRITICAL:", $0} /(WARN|INFO)/{print "NORMAL:", $0}' file.txt

# Field-specific matching
awk '$1 ~ /(permit|deny)/ {print}' acl.txt

vim: Alternation Patterns

" Find ERROR or FATAL
/\v(ERROR|FATAL)

" Replace log levels
:%s/\v(DEBUG|TRACE)/VERBOSE/g

" Find function or class definitions
/\v(function|class|def)\s+\w+

" Find common typos
:%s/\v(teh|hte)/the/g

" Highlight alternatives (very magic mode)
/\v(Monday|Tuesday|Wednesday|Thursday|Friday)

Python: Alternation

import re

text = """ERROR: Connection failed
WARNING: Low memory
INFO: Process started
FATAL: System crash"""

# Basic alternation
pattern = re.compile(r'(ERROR|FATAL)')
matches = pattern.findall(text)
print(matches)  # ['ERROR', 'FATAL']

# Named groups with alternation
pattern = re.compile(r'(?P<level>ERROR|WARNING|INFO|FATAL): (?P<msg>.+)')
for match in pattern.finditer(text):
    print(f"{match.group('level')}: {match.group('msg')}")

# Case-insensitive alternation
pattern = re.compile(r'error|warning|fatal', re.IGNORECASE)

# Alternation with grouping
days = re.compile(r'(Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day')
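
A point worth knowing when using grouped alternation with findall(): if the pattern contains a capturing group, findall() returns the group contents rather than the whole match. The short sketch below (sample text invented for illustration) contrasts a capturing group with the non-capturing (?:...) form from the Core Concepts table.

```python
import re

text = "Monday, Tuesday and Sunday"  # hypothetical sample text

capturing = re.compile(r'(Mon|Tues|Sun)day')
non_capturing = re.compile(r'(?:Mon|Tues|Sun)day')

prefixes = capturing.findall(text)       # contents of the group only
full_days = non_capturing.findall(text)  # whole matches

print(prefixes)
print(full_days)
```

Using (?:...) is usually the simpler choice when the group exists only to scope the alternation.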

Precedence and Grouping

Alternation Has Lowest Precedence

# Without grouping, | splits the whole pattern: this is "gray" OR "grey",
# not "gr" + (a OR e) + "y"
echo "gray grey" | grep -oE 'gray|grey'

# Understanding precedence:
# 'ab|cd' means 'ab' OR 'cd', NOT 'a' + 'b|c' + 'd'

# Use grouping for partial alternation:
echo "gray grey" | grep -oE 'gr(a|e)y'
# Output: gray, grey
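
Low precedence matters most when anchors are involved, because each anchor then belongs to only one alternative. A small Python sketch of that effect, using made-up test words:

```python
import re

loose = re.compile(r'^cat|dog$')        # parsed as (^cat) OR (dog$)
strict = re.compile(r'^(?:cat|dog)$')   # both anchors apply to both alternatives

words = ["cat", "dog", "catalog", "bulldog"]  # hypothetical test words

loose_hits = [w for w in words if loose.search(w)]
strict_hits = [w for w in words if strict.search(w)]

print(loose_hits)   # includes "catalog" and "bulldog"
print(strict_hits)  # only the exact words
```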

Common Grouping Patterns

# Optional prefix
grep -E '(un)?happy' file.txt      # "happy" or "unhappy"

# Optional suffix
grep -E 'run(ning|s)?' file.txt    # "run", "running", or "runs"

# Multiple alternatives in sequence
grep -E '(Mon|Tues|Wednes)day (morning|afternoon)' file.txt

# Nested grouping
grep -E '((http|https)://)?www\.' file.txt

Ordering Alternatives

Most Specific First

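# Order matters in backtracking engines (PCRE, Python, Perl), which take the
# first alternative that matches. POSIX grep -E returns the longest match
# regardless of order, so these examples use -P to show the effect.

# WRONG: "light" is never reached ("li" matches first)
echo "light flight" | grep -oP 'li|light'
# Output: li, li

# CORRECT: Longer/specific patterns first
echo "light flight" | grep -oP 'light|li'
# Output: light, light

# For file extensions:
# WRONG: .doc matches first in .docx
grep -oP '\.(doc|docx)' files.txt

# CORRECT: More specific first
grep -oP '\.(docx|doc)' files.txt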
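
Python's re module is a backtracking engine, so it takes the first alternative that matches at a given position. The sketch below shows that behaviour and one possible way to build an alternation from a word list with the longest entries first; the word list and sample strings are invented for illustration.

```python
import re

text = "light flight"

short_first = re.findall(r'li|light', text)   # "li" wins at each position
long_first = re.findall(r'light|li', text)    # "light" is tried first

print(short_first)
print(long_first)

# Building an alternation from a list: sort longest-first and escape each
# entry so that characters like "." or "|" are treated literally.
words = ["doc", "docx", "li", "light"]        # hypothetical word list
ordered = sorted(words, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(w) for w in ordered))

built = pattern.findall("light report.docx")
print(built)
```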

Most Common First (Performance)

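In backtracking engines (grep -P, Python, Perl), alternatives are tried left to right, so putting the most likely match first can save work. With grep -E the order rarely makes a measurable difference:

# If INFO is most common in logs:
grep -P '(INFO|WARN|ERROR)' huge.log  # INFO tried first

# vs
grep -P '(ERROR|WARN|INFO)' huge.log  # ERROR tried first (rarely matches)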
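
Whether ordering makes a measurable difference depends on the engine and the data, so it is worth timing rather than assuming. The following is a rough Python sketch using timeit on a synthetic log (the line counts and messages are made up); the printed timings will vary by machine, and the gap may well be negligible.

```python
import re
import timeit

# Synthetic log in which INFO lines dominate (assumed distribution)
lines = ["INFO: ok"] * 9000 + ["WARN: hmm"] * 900 + ["ERROR: bad"] * 100

common_first = re.compile(r'INFO|WARN|ERROR')
rare_first = re.compile(r'ERROR|WARN|INFO')

def count(pattern):
    """Count lines that start with one of the alternatives."""
    return sum(1 for line in lines if pattern.match(line))

# Both orderings match the same lines; only the work per line may differ.
print(count(common_first), count(rare_first))
print("common first:", timeit.timeit(lambda: count(common_first), number=20))
print("rare first:  ", timeit.timeit(lambda: count(rare_first), number=20))
```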

Gotchas

Forgetting to Group

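# WRONG: without a group, each \b belongs to only one alternative:
# (\bgrey) OR (gray\b)
echo "The greyish cat" | grep -oE '\bgrey|gray\b'
# Output: grey (matched inside "greyish")

# CORRECT: group so both boundaries apply to both alternatives
echo "The grey cat" | grep -oE '\b(grey|gray)\b'
# Output: grey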

Alternation vs Character Class

# INEFFICIENT: Alternation for single chars
grep -E 'a|e|i|o|u' file.txt

# EFFICIENT: Character class
grep -E '[aeiou]' file.txt

# The character class is shorter and clearer, and can be faster in some engines

Escaping the Pipe in Different Contexts

# Shell quoting required
grep -E 'cat|dog' file.txt       # Works (single quotes)
grep -E "cat|dog" file.txt       # Works (double quotes, no $ in pattern)
grep -E cat\|dog file.txt        # Works (escaped)
grep -E cat|dog file.txt         # FAILS (pipe goes to shell)

# BRE requires escaping the pipe (\| is a GNU extension, not part of POSIX BRE)
grep 'cat\|dog' file.txt         # BRE (no -E)
grep -E 'cat|dog' file.txt       # ERE (with -E)
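
The reverse situation also comes up: matching a literal pipe character. In Python there is no shell quoting or BRE/ERE split to worry about, but the pipe still has to be escaped or placed in a character class inside the pattern. A minimal sketch with an invented pipe-delimited string:

```python
import re

line = "user|admin|guest"  # hypothetical pipe-delimited record

by_escape = re.split(r'\|', line)   # backslash makes the pipe literal
by_class = re.split(r'[|]', line)   # inside [...] the pipe is literal too

print(by_escape)
print(by_class)

# re.escape() is handy when the literal text comes from a variable
literal = re.compile(re.escape("a|b"))
print(bool(literal.fullmatch("a|b")), bool(literal.fullmatch("a")))
```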

Empty Alternatives

# WRONG: An empty alternative matches the empty string, so every line matches
echo "test" | grep -E '|test'
# The empty alternative succeeds at every position; "test" adds nothing

# If matching every line is intended, make it explicit
echo "test" | grep -E 'test|$'  # Match "test" or end of line
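
The same pitfall can be observed in Python, where the match object shows exactly what was matched. In this small sketch (sample strings invented), the empty alternative wins immediately at position 0, so the search never gets as far as "test".

```python
import re

m = re.search(r'|test', "a test")
print(repr(m.group()), m.span())  # empty match at the very start

# Because the empty alternative always succeeds, every input "matches"
inputs = ["", "abc", "test"]      # hypothetical inputs
all_match = all(re.search(r'|test', s) is not None for s in inputs)
print(all_match)
```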

Key Takeaways

Concept          Remember

a|b              Basic OR - match a or b

(a|b)c           Grouped alternation - match "ac" or "bc"

[ab]             Character class - preferred over a|b for single chars

Order matters    More specific/longer patterns first (backtracking engines)

Precedence       Alternation is lowest - use () to group

Performance      Most common match first in long lists

BRE vs ERE       BRE: \| (GNU extension) / ERE: | (no escape with -E)

Self-Test

  1. What’s the difference between cat|dog and [cd][ao][tg]?

  2. Why does gray|grey work but you might prefer gr(a|e)y?

  3. What does (Mon|Tues)day? match?

  4. When should you use [abc] instead of a|b|c?

  5. Why put longer patterns first in alternation?

Answers
  1. cat|dog matches "cat" OR "dog"; [cd][ao][tg] matches any combination like "cat", "dog", "cot", "dag", etc.

  2. gray|grey works, but gr(a|e)y (or gr[ae]y) is more explicit about which part varies and avoids repeating the shared text

  3. "Monday", "Tuesday", "Monda", "Tuesda" (the ? makes final y optional - probably not intended!)

  4. For single-character alternatives - the character class is shorter, clearer, and often faster

  5. Because backtracking engines try alternatives left to right; in "li|light", "li" matches first and "light" is never reached (POSIX tools such as grep -E pick the longest match instead)
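
Answer 1 can be verified with a short Python sketch that enumerates every string the three character classes allow and checks which of them the alternation accepts. The variable names are illustrative only.

```python
import re
from itertools import product

class_pattern = re.compile(r'[cd][ao][tg]')
alt_pattern = re.compile(r'cat|dog')

# Every combination of one character from each class: 2 * 2 * 2 = 8 strings
candidates = ["".join(p) for p in product("cd", "ao", "tg")]

class_hits = [w for w in candidates if class_pattern.fullmatch(w)]
alt_hits = [w for w in candidates if alt_pattern.fullmatch(w)]

print(class_hits)  # all eight combinations
print(alt_hits)    # only the two intended words
```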

Next Drill

Drill 07: Lookahead - Master (?=...) positive and (?!...) negative lookahead assertions.