Alternation & Conditionals
Alternation allows matching one of several alternatives. Combined with grouping, it enables powerful pattern branching for matching varied input formats.
Basic Alternation |
The pipe | means "OR" - match the expression on the left OR the right.
Pattern: cat|dog
Text:    The cat and dog played.
Matches:     ^^^     ^^^
Order Matters (Sometimes)
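Backtracking engines (PCRE, Perl, Python's re, grep -P) try alternatives left to right and take the first one that lets the overall match succeed. POSIX ERE tools such as grep -E instead return the longest match at a given position, so the order of alternatives does not change their result.
Pattern: cat|caterpillar
Text:    caterpillar
Matches: ^^^ (matches "cat", stops there)
Pattern: caterpillar|cat
Text:    caterpillar
Matches: ^^^^^^^^^^^ (matches full word)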
| Put longer alternatives first when one is a prefix of another. |
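To see the ordering rule in a backtracking engine, here is a minimal sketch using Python's standard re module (the choice of engine is an assumption made for illustration). It runs both patterns from above against the same text.

```python
import re

text = "caterpillar"

# Python's re takes the first alternative that lets the match succeed.
short_first = re.search(r"cat|caterpillar", text).group()
long_first = re.search(r"caterpillar|cat", text).group()

print(short_first)  # cat
print(long_first)   # caterpillar
```

The second alternative in `cat|caterpillar` is never reached at this position, because `cat` already produces a successful match.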
Alternation Scope
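The | operator has the lowest precedence of any regex operator, so without grouping each alternative extends all the way to the start or end of the pattern: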
Pattern: gray|grey
Matches: "gray" OR "grey"
Pattern: gr(a|e)y
Matches: "gray" OR "grey" (more efficient)
# These match the same text, but the second also creates three capture groups:
cat|dog|bird
(cat)|(dog)|(bird)
Grouping Limits Scope
Pattern: (Mr|Mrs|Ms)\.?\s+\w+
Matches: Mr. Smith, Mrs Smith, Ms. Jones
Without grouping:
Pattern: Mr|Mrs|Ms\.?\s+\w+
Matches: "Mr" OR "Mrs" OR "Ms. Smith" (wrong!)
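The difference in scope can be checked directly. The sketch below uses Python's re module (an assumption; any backtracking engine should behave the same way) and the sample name "Mrs Smith" to compare the grouped and ungrouped patterns.

```python
import re

grouped = re.compile(r"(Mr|Mrs|Ms)\.?\s+\w+")
ungrouped = re.compile(r"Mr|Mrs|Ms\.?\s+\w+")

text = "Mrs Smith"

# Grouped: the alternation covers only the title, the rest applies to every branch.
grouped_match = grouped.search(text).group()
# Ungrouped: the first alternative is the bare string "Mr", which matches immediately.
ungrouped_match = ungrouped.search(text).group()

print(grouped_match)    # Mrs Smith
print(ungrouped_match)  # Mr
```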
Infrastructure Examples
Match Log Levels
# Match any log level
grep -E '^(DEBUG|INFO|WARN|ERROR|FATAL)' application.log
# With the level wrapped in square brackets
grep -E '^\[(DEBUG|INFO|WARN|ERROR|FATAL)\]' application.log
Match IP Address Types
# Private IP ranges
grep -E '^(10\.|172\.(1[6-9]|2[0-9]|3[0-1])\.|192\.168\.)' hosts.txt
# Loopback or link-local
grep -E '^(127\.|169\.254\.)' interfaces.txt
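The same private-range pattern can be reused outside grep. The following is a rough sketch in Python; the helper name `is_private` and the sample addresses are hypothetical, and the pattern only checks the leading octets rather than validating the whole address.

```python
import re

# Same alternation as the grep example: 10.x, 172.16-31.x, 192.168.x
PRIVATE = re.compile(r"^(10\.|172\.(1[6-9]|2[0-9]|3[0-1])\.|192\.168\.)")


def is_private(ip: str) -> bool:
    """Return True if the address starts with a private-range prefix."""
    return PRIVATE.match(ip) is not None


samples = ["10.0.0.1", "172.16.0.1", "172.32.0.1", "192.168.1.100", "8.8.8.8"]
results = {ip: is_private(ip) for ip in samples}

for ip, private in results.items():
    print(f"{ip:15} {'private' if private else 'public'}")
```

Note how 172.32.0.1 falls outside the nested group `(1[6-9]|2[0-9]|3[0-1])` and is therefore reported as public.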
Match File Extensions
# Configuration files
ls | grep -E '\.(conf|cfg|ini|yaml|yml|json)$'
# Log files
ls | grep -E '\.(log|out|err)$'
# Certificate files
ls | grep -E '\.(pem|crt|cer|key|p12|pfx)$'
Match Protocols
# Any protocol URL
grep -E '^(https?|ftp|ssh|ldaps?)://' urls.txt
# Database connection strings
grep -E '(mysql|postgres|mongodb|redis)://' config.txt
Multiple Alternatives
You can have any number of alternatives:
Pattern: (Mon|Tue|Wed|Thu|Fri|Sat|Sun)
Matches: Any day abbreviation
Pattern: (January|February|March|April|May|June|July|August|September|October|November|December)
Matches: Any month name
Optimized with Character Classes
When alternatives differ by one character, use character classes:
# Less efficient
gr(a|e)y
# More efficient
gr[ae]y
# Less efficient
(0|1|2|3|4|5|6|7|8|9)
# More efficient
[0-9]
Alternation with Quantifiers
Pattern: (ab|cd)+
Matches: ab, cd, abab, cdcd, abcd, cdab, ababcdab
Pattern: (cat|dog)s?
Matches: cat, cats, dog, dogs
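A quantifier applied to a group repeats the whole alternation, so each repetition may pick a different branch. As a small sketch in Python (the test strings are made up for illustration), `fullmatch` shows which strings consist entirely of "ab" and "cd" units:

```python
import re

repeated = re.compile(r"(ab|cd)+")

candidates = ["ab", "abcd", "ababcdab", "abc", "ba"]
results = {s: bool(repeated.fullmatch(s)) for s in candidates}
print(results)
# {'ab': True, 'abcd': True, 'ababcdab': True, 'abc': False, 'ba': False}

# A repeated capture group keeps only the text from its last repetition.
last_unit = repeated.fullmatch("abcd").group(1)
print(last_unit)  # cd
```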
Optional Variants
# Color/colour
grep -E 'colou?r' document.txt
# Gray/grey
grep -E 'gr[ae]y' document.txt
# Analyze/analyse
grep -E 'analy[sz]e' document.txt
Non-Capturing Alternation (?:)
When you only need to match but not capture:
# Capturing (creates group)
Pattern: (https?|ftp)://
# Non-capturing (just matching)
Pattern: (?:https?|ftp)://
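# Match the protocol without capturing it. grep -o prints the whole match, not group 1,
# so use \K to drop the protocol from the output and print only the domain
grep -oP '(?:https?|ftp)://\K[^/]+' urls.txt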
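The effect on group numbering is easiest to see in a language that exposes the groups. A minimal sketch in Python, using a made-up URL:

```python
import re

url = "https://example.com/path"

capturing = re.match(r"(https?|ftp)://([^/]+)", url)
non_capturing = re.match(r"(?:https?|ftp)://([^/]+)", url)

# With a capturing group the protocol takes slot 1 and the domain slot 2.
print(capturing.groups())      # ('https', 'example.com')
# With (?:...) the domain becomes group 1.
print(non_capturing.groups())  # ('example.com',)
```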
Conditional Patterns (Advanced)
PCRE supports conditional matching based on whether a group matched.
Syntax: (?(condition)yes|no)
# If group 1 matched, require "b", else require "c"
Pattern: (a)?(?(1)b|c)
Matches: "ab" (a present, requires b)
"c" (a absent, requires c)
Does not match as a whole string: "ac", "b" (an unanchored search still finds the "c" inside "ac")
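Python's built-in re module supports the same conditional syntax, so the pattern can be tried out directly. The sketch below checks whole-string matches with `fullmatch` and then shows what an unanchored search does with "ac".

```python
import re

conditional = re.compile(r"(a)?(?(1)b|c)")

# Whole-string matching: only "ab" and "c" satisfy the condition.
results = {s: bool(conditional.fullmatch(s)) for s in ["ab", "c", "ac", "b"]}
print(results)  # {'ab': True, 'c': True, 'ac': False, 'b': False}

# Unanchored search: the engine skips the "a" and matches the lone "c".
found = conditional.search("ac").group()
print(found)  # c
```

If a pattern like this must reject "ac" entirely, anchor it (for example with `^` and `$`) or use `fullmatch`.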
Practical Example: Optional Quotes
# Match quoted or unquoted value
Pattern: (")?value(?(1)"|)
Matches: value
"value"
Conditional with Named Groups
Pattern: (?<quote>")?value(?(quote)"|)
import re
# Match optionally quoted values
pattern = r'(?P<quote>")?(?P<value>\w+)(?(quote)"|)'
for test in ['"hello"', 'hello', '"test']:
    match = re.match(pattern, test)
    if match:
        print(f"'{test}' -> '{match.group('value')}'")
    else:
        print(f"'{test}' -> no match")
# Output:
# '"hello"' -> 'hello'
# 'hello' -> 'hello'
# '"test' -> no match
Branch Reset Groups (?|) (PCRE)
Branch reset makes all alternatives share the same group numbers.
The Problem
Normal alternation creates separate groups:
Pattern: (a)|(b)|(c)
Text: b
Group 1: None (didn't match)
Group 2: b
Group 3: None (didn't match)
The Solution
Branch reset (?|) resets numbering for each branch:
Pattern: (?|(a)|(b)|(c))
Text: b
Group 1: b (all alternatives use group 1)
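Python's built-in re module does not implement `(?|...)`, so the sketch below shows the problem with ordinary alternation and one possible workaround: picking whichever group actually participated in the match. This is an illustration under that assumption, not a replacement for real branch reset support.

```python
import re

match = re.fullmatch(r"(a)|(b)|(c)", "b")

# Each alternative has its own group number; unused groups are None.
print(match.groups())   # (None, 'b', None)
print(match.lastindex)  # 2

# Emulate branch reset by taking the first group that is not None.
value = next(g for g in match.groups() if g is not None)
print(value)  # b
```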
Practical Example
# Without branch reset - messy
Pattern: (\d{4})-(\d{2})-(\d{2})|(\d{2})/(\d{2})/(\d{4})
# Group 1-3 for YYYY-MM-DD, Group 4-6 for MM/DD/YYYY
# With branch reset - clean
Pattern: (?|(\d{4})-(\d{2})-(\d{2})|(\d{2})/(\d{2})/(\d{4}))
# Groups 1-3 for both formats (group 1 is the year in the first branch but the month in the second)
import regex  # Requires 'regex' module, not 're'
pattern = r'(?|(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})|(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4}))'
for date in ['2026-03-15', '03/15/2026']:
    match = regex.match(pattern, date)
    if match:
        print(f"{date} -> Year: {match.group('year')}, Month: {match.group('month')}, Day: {match.group('day')}")
Self-Test Exercises
| Try each challenge FIRST. Only read the answer after you’ve attempted it. |
Setup Test Data
cat << 'EOF' > /tmp/alternation.txt
DEBUG: Starting service
INFO: Service started
WARN: Low memory
ERROR: Connection failed
FATAL: System crash
Status: OK
Status: FAIL
Status: PENDING
http://example.com
https://secure.example.com
ftp://files.example.com
Color: gray
Colour: grey
File: config.yaml
File: settings.json
File: app.conf
Date: 2026-03-15
Date: 03/15/2026
IP: 192.168.1.100
IP: 10.0.0.1
IP: 172.16.0.1
cat
dog
caterpillar
EOF
Challenge 1: Match ERROR or FATAL
Goal: Find lines starting with ERROR or FATAL
Answer
grep -E '^(ERROR|FATAL)' /tmp/alternation.txt
| means OR. Group with () to limit scope to just the words.
Challenge 2: Match All Log Levels
Goal: Find lines starting with any log level (DEBUG, INFO, WARN, ERROR, FATAL)
Answer
grep -E '^(DEBUG|INFO|WARN|ERROR|FATAL)' /tmp/alternation.txt
Multiple alternatives separated by |.
Challenge 3: Match Status Values
Goal: Find lines with Status: followed by OK, FAIL, or PENDING
Answer
grep -E 'Status: (OK|FAIL|PENDING)' /tmp/alternation.txt
The literal "Status: " prefix (including the space after the colon) must appear in the line; the group limits the alternation to the three status words.
Challenge 4: Match HTTP or HTTPS URLs
Goal: Match URLs starting with http:// or https://
Answer
# Using alternation
grep -E '(http|https)://' /tmp/alternation.txt
# Using ? for optional s
grep -E 'https?://' /tmp/alternation.txt
https? is more concise - ? makes the s optional.
Challenge 5: Match Any Protocol URL
Goal: Match http://, https://, or ftp:// URLs
Answer
grep -E '(https?|ftp)://' /tmp/alternation.txt
https? handles both http and https, then |ftp adds the third option.
Challenge 6: Match Gray/Grey
Goal: Match both American "gray" and British "grey" spellings
Answer
# Using alternation
grep -E 'gr(a|e)y' /tmp/alternation.txt
# Using character class (more efficient)
grep -E 'gr[ae]y' /tmp/alternation.txt
Character class [ae] is more efficient than alternation for single characters.
Challenge 7: Match Config File Extensions
Goal: Find files ending with .yaml, .json, or .conf
Answer
grep -E '\.(yaml|json|conf)$' /tmp/alternation.txt
\. escapes the dot, $ anchors to end of line.
Challenge 8: Match Private IP Ranges
Goal: Match IPs in private ranges (10.x.x.x, 172.16-31.x.x, 192.168.x.x)
Answer
grep -E '^IP: (10\.|172\.(1[6-9]|2[0-9]|3[0-1])\.|192\.168\.)' /tmp/alternation.txt
Complex alternation with nested groups for the 172.16-31 range.
Challenge 9: Order Matters - Cat vs Caterpillar
Goal: Match "caterpillar" fully (not just "cat" part of it)
Answer
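# grep -E (POSIX ERE) returns the longest match, so "caterpillar" matches in full in either order
grep -oE 'cat|caterpillar' /tmp/alternation.txt
# grep -P (PCRE) tries alternatives left to right - "cat" matches first, stops there
grep -oP 'cat|caterpillar' /tmp/alternation.txt
# Correct for PCRE - longer alternative first
grep -oP 'caterpillar|cat' /tmp/alternation.txt
# Or use word boundaries
grep -oP '\b(cat|caterpillar)\b' /tmp/alternation.txt
In backtracking engines, put longer alternatives FIRST when one is a prefix of another.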
Challenge 10: Match Date Formats
Goal: Match dates in either YYYY-MM-DD or MM/DD/YYYY format
Answer
grep -E '([0-9]{4}-[0-9]{2}-[0-9]{2}|[0-9]{2}/[0-9]{2}/[0-9]{4})' /tmp/alternation.txt
Two alternatives for two date formats.
Challenge 11: Match Cat or Dog
Goal: Match whole words "cat" or "dog" (not "caterpillar")
Answer
grep -E '\b(cat|dog)\b' /tmp/alternation.txt
\b word boundaries ensure whole word match.
Challenge 12: Non-Capturing Alternation
Goal: Match URLs but don’t capture the protocol (use non-capturing group)
Answer
grep -oP '(?:https?|ftp)://\S+' /tmp/alternation.txt
(?:…) groups without capturing. Output includes full URL.
Common Patterns
Boolean Values
Pattern: (true|false|yes|no|on|off|1|0)
Case-insensitive: (?i)(true|false|yes|no|on|off|1|0)
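One way this pattern might be used is to parse configuration values. The sketch below is in Python; the helper `parse_bool` and the `TRUTHY` set are hypothetical names introduced for illustration, and `fullmatch` is used so that strings such as "1000" or "nope" are rejected rather than partially matched.

```python
import re

# Pattern taken from the text above; (?i) makes it case-insensitive.
BOOLEAN = re.compile(r"(?i)(true|false|yes|no|on|off|1|0)")
TRUTHY = {"true", "yes", "on", "1"}  # hypothetical split into truthy words


def parse_bool(text: str) -> bool:
    """Convert a config-style boolean string to bool (hypothetical helper)."""
    match = BOOLEAN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a boolean value: {text!r}")
    return match.group(1).lower() in TRUTHY


print(parse_bool("Yes"))  # True
print(parse_bool("off"))  # False
```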
HTTP Methods
Pattern: (GET|POST|PUT|DELETE|PATCH|HEAD|OPTIONS)
Status Codes
Pattern: (2[0-9]{2}|3[0-9]{2}|4[0-9]{2}|5[0-9]{2})
# Or simpler: [2-5][0-9]{2}
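The claim that the two status-code patterns are interchangeable can be verified by brute force. A short sketch in Python that compares them on every three-digit string:

```python
import re

verbose = re.compile(r"(2[0-9]{2}|3[0-9]{2}|4[0-9]{2}|5[0-9]{2})")
compact = re.compile(r"[2-5][0-9]{2}")

# Every three-digit string from "000" to "999".
codes = [f"{n:03d}" for n in range(1000)]

same = all(bool(verbose.fullmatch(c)) == bool(compact.fullmatch(c)) for c in codes)
accepted = [c for c in codes if compact.fullmatch(c)]

print(same)                       # True
print(accepted[0], accepted[-1])  # 200 599
```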
Common Protocols
Pattern: (https?|ftp|ssh|telnet|ldaps?|smb|nfs)://
Day Names
Short: (Mon|Tue|Wed|Thu|Fri|Sat|Sun)
Full: (Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)
Common Mistakes
Mistake 1: Forgetting Grouping
# Wrong - matches "cat" or "dog food"
Pattern: cat|dog food
# Correct - matches "cat food" or "dog food"
Pattern: (cat|dog) food
Mistake 2: Inefficient Alternatives
# Inefficient
grep -E '(a|b|c|d|e)' file.txt
# Efficient
grep -E '[a-e]' file.txt
Mistake 3: Wrong Alternative Order
# Wrong in backtracking engines - "Jan" matches before "January" is tried
Pattern: (Jan|January)
# Correct - longer first
Pattern: (January|Jan)
# Or use word boundary
Pattern: \b(Jan|January)\b
Key Takeaways
- | means OR - match left or right expression
- Use () to group alternatives - control scope
- Order matters in backtracking engines - put longer alternatives first
- Use (?:) for non-capturing - when you don’t need the captured text
- Character classes are more efficient - [abc] vs (a|b|c)
- Branch reset (?|) shares group numbers - PCRE advanced feature
Next Module
Lookahead & Lookbehind - Zero-width assertions for conditional matching.