Alternation & Conditionals

Alternation allows matching one of several alternatives. Combined with grouping, it enables powerful pattern branching for matching varied input formats.

Basic Alternation |

The pipe | means "OR" - match the expression on the left OR the right.

Pattern: cat|dog
Text:    The cat and dog played.
Matches:     ^^^     ^^^
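
The same match can be reproduced in Python as a quick check. This is a minimal sketch using the standard re module; the variable names are illustrative only.

```python
import re

text = "The cat and dog played."

# findall returns every non-overlapping match of either alternative
matches = re.findall(r"cat|dog", text)
print(matches)  # ['cat', 'dog']

# finditer also exposes where each match starts
positions = [(m.group(), m.start()) for m in re.finditer(r"cat|dog", text)]
print(positions)  # [('cat', 4), ('dog', 12)]
```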

Order Matters (Sometimes)

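Backtracking engines (PCRE, Perl, Python, JavaScript) try alternatives left-to-right and stop at the first one that lets the overall match succeed. POSIX tools such as grep -E instead return the longest match at a given position, so ordering only changes the result in backtracking engines.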

Pattern: cat|caterpillar
Text:    caterpillar
Matches: ^^^ (matches "cat", stops there)

Pattern: caterpillar|cat
Text:    caterpillar
Matches: ^^^^^^^^^^^ (matches full word)

Put longer alternatives first when one is a prefix of another.
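
A short sketch of the ordering effect in Python's re module, which is a backtracking engine and therefore takes the first alternative that succeeds. The variable names are illustrative.

```python
import re

text = "caterpillar"

# Shorter alternative first: "cat" succeeds, so the longer branch is never tried
short_first = re.search(r"cat|caterpillar", text).group()

# Longer alternative first: the whole word is matched
long_first = re.search(r"caterpillar|cat", text).group()

print(short_first)  # cat
print(long_first)   # caterpillar
```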

Alternation Scope

Without grouping, | has the lowest precedence:

Pattern: gray|grey
Matches: "gray" OR "grey"

Pattern: gr(a|e)y
Matches: "gray" OR "grey" (alternation limited to the one differing letter)

# These match the same text (the second form also creates capture groups 1-3):
cat|dog|bird
(cat)|(dog)|(bird)

Grouping Limits Scope

Pattern: (Mr|Mrs|Ms)\.?\s+\w+
Matches: Mr. Smith, Mrs Smith, Ms. Jones

Without grouping:
Pattern: Mr|Mrs|Ms\.?\s+\w+
Matches: "Mr" OR "Mrs" OR "Ms. Smith" (wrong!)
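
To see the difference concretely, the two patterns can be run against the same text. This is a sketch in Python; the sample sentence is made up for illustration.

```python
import re

# Made-up sample text containing all three titles
text = "Mr. Smith met Mrs Jones and Ms. Lee"

# Grouped: the alternation covers only the title
grouped = [m.group() for m in re.finditer(r"(Mr|Mrs|Ms)\.?\s+\w+", text)]

# Ungrouped: the alternatives are "Mr", "Mrs", and "Ms\.?\s+\w+"
ungrouped = [m.group() for m in re.finditer(r"Mr|Mrs|Ms\.?\s+\w+", text)]

print(grouped)    # ['Mr. Smith', 'Mrs Jones', 'Ms. Lee']
print(ungrouped)  # ['Mr', 'Mr', 'Ms. Lee']
```

In the ungrouped run, "Mrs" is reported as "Mr" because the first alternative already succeeds on its first two letters.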

Infrastructure Examples

Match Log Levels

# Match any log level
grep -E '^(DEBUG|INFO|WARN|ERROR|FATAL)' application.log

# With the level wrapped in square brackets, e.g. [ERROR]
grep -E '^\[(DEBUG|INFO|WARN|ERROR|FATAL)\]' application.log

Match IP Address Types

# Private IP ranges
grep -E '^(10\.|172\.(1[6-9]|2[0-9]|3[0-1])\.|192\.168\.)' hosts.txt

# Loopback or link-local
grep -E '^(127\.|169\.254\.)' interfaces.txt
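
The private-range pattern can be checked against a few addresses before running it on real data. The sketch below reuses the pattern unchanged in Python; the address list is invented and merely stands in for the contents of hosts.txt.

```python
import re

# Same pattern as the grep example for private ranges
PRIVATE = re.compile(r"^(10\.|172\.(1[6-9]|2[0-9]|3[0-1])\.|192\.168\.)")

# Made-up sample addresses (assumption, not taken from any real file)
addresses = ["10.0.0.1", "100.64.0.1", "172.16.0.1", "172.32.0.1",
             "192.168.1.100", "8.8.8.8"]

private = [a for a in addresses if PRIVATE.search(a)]
print(private)  # ['10.0.0.1', '172.16.0.1', '192.168.1.100']
```

The escaped dot after 10 is what keeps 100.64.0.1 out of the result, and the nested group rejects 172.32.0.1.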

Match File Extensions

# Configuration files
ls | grep -E '\.(conf|cfg|ini|yaml|yml|json)$'

# Log files
ls | grep -E '\.(log|out|err)$'

# Certificate files
ls | grep -E '\.(pem|crt|cer|key|p12|pfx)$'

Match Protocols

# Any protocol URL
grep -E '^(https?|ftp|ssh|ldaps?)://' urls.txt

# Database connection strings
grep -E '(mysql|postgres|mongodb|redis)://' config.txt

Multiple Alternatives

You can have any number of alternatives:

Pattern: (Mon|Tue|Wed|Thu|Fri|Sat|Sun)
Matches: Any day abbreviation

Pattern: (January|February|March|April|May|June|July|August|September|October|November|December)
Matches: Any month name

Optimized with Character Classes

When every alternative is a single character, use a character class:

# Less efficient
gr(a|e)y

# More efficient
gr[ae]y

# Less efficient
(0|1|2|3|4|5|6|7|8|9)

# More efficient
[0-9]

Alternation with Quantifiers

Pattern: (ab|cd)+
Matches: ab, cd, abab, cdcd, abcd, cdab, ababcdab

Pattern: (cat|dog)s?
Matches: cat, cats, dog, dogs
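
A quantifier applied to a group repeats the whole alternation, so each repetition may pick a different branch. A minimal Python sketch, with an illustrative candidate list:

```python
import re

pattern = re.compile(r"(ab|cd)+")

# fullmatch requires the entire string to consist of "ab"/"cd" repetitions
candidates = ["ab", "abcd", "ababcdab", "abc", "ba", ""]
accepted = [s for s in candidates if pattern.fullmatch(s)]
print(accepted)  # ['ab', 'abcd', 'ababcdab']

# A repeated capturing group keeps only its last iteration
last_iteration = pattern.fullmatch("abcd").group(1)
print(last_iteration)  # cd
```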

Optional Variants

# Color/colour
grep -E 'colou?r' document.txt

# Gray/grey
grep -E 'gr[ae]y' document.txt

# Analyze/analyse
grep -E 'analy[sz]e' document.txt

Non-Capturing Alternation (?:)

When you only need to match but not capture:

# Capturing (creates group)
Pattern: (https?|ftp)://

# Non-capturing (just matching)
Pattern: (?:https?|ftp)://

# Match the protocol without capturing it.
# grep -o always prints the whole match, so \K is used to print only the domain
grep -oP '(?:https?|ftp)://\K[^/]+' urls.txt
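
The effect on group numbering is easiest to see in a language that exposes captures. A sketch in Python, using an example URL chosen for illustration:

```python
import re

url = "https://example.com/path"

# Capturing: the protocol takes group 1, the domain becomes group 2
capturing = re.match(r"(https?|ftp)://([^/]+)", url)
print(capturing.groups())  # ('https', 'example.com')

# Non-capturing: the domain is the only group
non_capturing = re.match(r"(?:https?|ftp)://([^/]+)", url)
print(non_capturing.groups())  # ('example.com',)
```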

Conditional Patterns (Advanced)

PCRE (and Python's re module) support conditional matching based on whether an earlier group took part in the match.

Syntax: (?(condition)yes|no)

# If group 1 matched, require "b", else require "c"
Pattern: (a)?(?(1)b|c)
Matches: "ab" (a present, requires b)
         "c"  (a absent, requires c)
Does not match as a whole string: "ac", "b"
(an unanchored search still finds the "c" inside "ac")
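
Python's re module accepts the same conditional syntax, which makes the behaviour easy to verify. A minimal sketch; the test strings mirror the ones listed above.

```python
import re

pattern = re.compile(r"(a)?(?(1)b|c)")

# fullmatch checks the whole string against the pattern
results = {s: bool(pattern.fullmatch(s)) for s in ["ab", "c", "ac", "b"]}
print(results)  # {'ab': True, 'c': True, 'ac': False, 'b': False}

# Without anchoring, search skips the "a" and matches the trailing "c"
found = pattern.search("ac").group()
print(found)  # c
```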

Practical Example: Optional Quotes

# Match quoted or unquoted value
Pattern: (")?value(?(1)"|)
Matches: value
         "value"

Conditional with Named Groups

Pattern: (?<quote>")?value(?(quote)"|)

The same idea in Python, which spells named groups as (?P<name>...):

import re

# Match optionally quoted values
pattern = r'(?P<quote>")?(?P<value>\w+)(?(quote)"|)'

for test in ['"hello"', 'hello', '"test']:
    match = re.match(pattern, test)
    if match:
        print(f"'{test}' -> '{match.group('value')}'")
    else:
        print(f"'{test}' -> no match")

# Output:
# '"hello"' -> 'hello'
# 'hello' -> 'hello'
# '"test' -> no match

Branch Reset Groups (?|) (PCRE)

Branch reset makes all alternatives share the same group numbers.

The Problem

Normal alternation creates separate groups:

Pattern: (a)|(b)|(c)
Text:    b

Group 1: None (didn't match)
Group 2: b
Group 3: None (didn't match)

The Solution

Branch reset (?|) resets numbering for each branch:

Pattern: (?|(a)|(b)|(c))
Text:    b

Group 1: b (all alternatives use group 1)
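
Python's built-in re module does not support (?|...). Where branch reset is unavailable, one possible workaround is to read whichever group participated via the match object's lastindex attribute. This is a sketch of that workaround, not a substitute for branch reset in general, since it only fits patterns with one group per branch.

```python
import re

# Standard alternation: each branch gets its own group number
m = re.fullmatch(r"(a)|(b)|(c)", "b")
print(m.groups())   # (None, 'b', None)

# lastindex is the number of the last group that matched
print(m.lastindex)  # 2

# Fetch the matched text without knowing which branch was taken
value = m.group(m.lastindex)
print(value)        # b
```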

Practical Example

# Without branch reset - messy
Pattern: (\d{4})-(\d{2})-(\d{2})|(\d{2})/(\d{2})/(\d{4})
# Group 1-3 for YYYY-MM-DD, Group 4-6 for MM/DD/YYYY

# With branch reset - clean
Pattern: (?|(\d{4})-(\d{2})-(\d{2})|(\d{2})/(\d{2})/(\d{4}))
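# Groups 1-3 for both formats, but their meaning differs per branch:
# group 1 is the year in the first branch and the month in the second

Named groups remove that ambiguity. In Python this needs the third-party regex module:

import regex  # Requires the third-party 'regex' module, not 're'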

pattern = r'(?|(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})|(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4}))'

for date in ['2026-03-15', '03/15/2026']:
    match = regex.match(pattern, date)
    if match:
        print(f"{date} -> Year: {match.group('year')}, Month: {match.group('month')}, Day: {match.group('day')}")

Self-Test Exercises

Try each challenge FIRST. Only read the answer after you’ve attempted it.

Setup Test Data

cat << 'EOF' > /tmp/alternation.txt
DEBUG: Starting service
INFO: Service started
WARN: Low memory
ERROR: Connection failed
FATAL: System crash
Status: OK
Status: FAIL
Status: PENDING
http://example.com
https://secure.example.com
ftp://files.example.com
Color: gray
Colour: grey
File: config.yaml
File: settings.json
File: app.conf
Date: 2026-03-15
Date: 03/15/2026
IP: 192.168.1.100
IP: 10.0.0.1
IP: 172.16.0.1
cat
dog
caterpillar
EOF

Challenge 1: Match ERROR or FATAL

Goal: Find lines starting with ERROR or FATAL

Answer
grep -E '^(ERROR|FATAL)' /tmp/alternation.txt

| means OR. Group with () to limit scope to just the words.


Challenge 2: Match All Log Levels

Goal: Find lines starting with any log level (DEBUG, INFO, WARN, ERROR, FATAL)

Answer
grep -E '^(DEBUG|INFO|WARN|ERROR|FATAL)' /tmp/alternation.txt

Multiple alternatives separated by |.


Challenge 3: Match Status Values

Goal: Find lines with Status: followed by OK, FAIL, or PENDING

Answer
grep -E 'Status: (OK|FAIL|PENDING)' /tmp/alternation.txt

The literal "Status: " prefix, including the space after the colon, restricts matches to status lines.


Challenge 4: Match HTTP or HTTPS URLs

Goal: Match URLs starting with http:// or https://

Answer
# Using alternation
grep -E '(http|https)://' /tmp/alternation.txt

# Using ? for optional s
grep -E 'https?://' /tmp/alternation.txt

https? is more concise - ? makes the s optional.


Challenge 5: Match Any Protocol URL

Goal: Match http://, https://, or ftp:// URLs

Answer
grep -E '(https?|ftp)://' /tmp/alternation.txt

https? handles both http and https, then |ftp adds the third option.


Challenge 6: Match Gray/Grey

Goal: Match both American "gray" and British "grey" spellings

Answer
# Using alternation
grep -E 'gr(a|e)y' /tmp/alternation.txt

# Using character class (more efficient)
grep -E 'gr[ae]y' /tmp/alternation.txt

Character class [ae] is more efficient than alternation for single characters.


Challenge 7: Match Config File Extensions

Goal: Find files ending with .yaml, .json, or .conf

Answer
grep -E '\.(yaml|json|conf)$' /tmp/alternation.txt

\. escapes the dot, $ anchors to end of line.


Challenge 8: Match Private IP Ranges

Goal: Match IPs in private ranges (10.x.x.x, 172.16-31.x.x, 192.168.x.x)

Answer
grep -E '^IP: (10\.|172\.(1[6-9]|2[0-9]|3[0-1])\.|192\.168\.)' /tmp/alternation.txt

Complex alternation with nested groups for the 172.16-31 range.


Challenge 9: Order Matters - Cat vs Caterpillar

Goal: Match "caterpillar" fully (not just "cat" part of it)

Answer
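# Wrong order with a PCRE-style engine (-P): "cat" matches first, stops there
grep -oP 'cat|caterpillar' /tmp/alternation.txt

# Correct - longer alternative first
grep -oP 'caterpillar|cat' /tmp/alternation.txt

# Or use word boundary
grep -oP '\b(cat|caterpillar)\b' /tmp/alternation.txt

Put longer alternatives FIRST when one is a prefix of another. Note that grep -E follows the POSIX leftmost-longest rule and prints "caterpillar" in either order, so the difference only shows with -P.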


Challenge 10: Match Date Formats

Goal: Match dates in either YYYY-MM-DD or MM/DD/YYYY format

Answer
grep -E '([0-9]{4}-[0-9]{2}-[0-9]{2}|[0-9]{2}/[0-9]{2}/[0-9]{4})' /tmp/alternation.txt

Two alternatives for two date formats.


Challenge 11: Match Cat or Dog

Goal: Match whole words "cat" or "dog" (not "caterpillar")

Answer
grep -E '\b(cat|dog)\b' /tmp/alternation.txt

\b word boundaries ensure whole word match.


Challenge 12: Non-Capturing Alternation

Goal: Match URLs but don’t capture the protocol (use non-capturing group)

Answer
grep -oP '(?:https?|ftp)://\S+' /tmp/alternation.txt

(?:...) groups without capturing. Output includes full URL.

Common Patterns

Boolean Values

Pattern: (true|false|yes|no|on|off|1|0)
Case-insensitive: (?i)(true|false|yes|no|on|off|1|0)

HTTP Methods

Pattern: (GET|POST|PUT|DELETE|PATCH|HEAD|OPTIONS)

Status Codes

Pattern: (2[0-9]{2}|3[0-9]{2}|4[0-9]{2}|5[0-9]{2})
# Or simpler: [2-5][0-9]{2}
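
The claim that the two status-code patterns are interchangeable can be checked exhaustively over all three-digit strings. A small Python sketch with illustrative names:

```python
import re

long_form = re.compile(r"(2[0-9]{2}|3[0-9]{2}|4[0-9]{2}|5[0-9]{2})")
short_form = re.compile(r"[2-5][0-9]{2}")

# Compare both patterns on every three-digit string from 000 to 999
candidates = [f"{n:03d}" for n in range(1000)]
agree = all(bool(long_form.fullmatch(c)) == bool(short_form.fullmatch(c))
            for c in candidates)
print(agree)  # True

# fullmatch rejects codes outside 200-599
valid = [c for c in ["200", "404", "503", "199", "600"] if short_form.fullmatch(c)]
print(valid)  # ['200', '404', '503']
```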

Common Protocols

Pattern: (https?|ftp|ssh|telnet|ldaps?|smb|nfs)://

Day Names

Short: (Mon|Tue|Wed|Thu|Fri|Sat|Sun)
Full:  (Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)

Common Mistakes

Mistake 1: Forgetting Grouping

# Wrong - matches "cat" or "dog food"
Pattern: cat|dog food

# Correct - matches "cat food" or "dog food"
Pattern: (cat|dog) food

Mistake 2: Inefficient Alternatives

# Inefficient
grep -E '(a|b|c|d|e)' file.txt

# Efficient
grep -E '[a-e]' file.txt

Mistake 3: Wrong Alternative Order

# Wrong - "Jan" matches before "January"
Pattern: (Jan|January)

# Correct - longer first
Pattern: (January|Jan)

# Or use word boundary
Pattern: \b(Jan|January)\b
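
When the alternatives come from a list, sorting them longest-first avoids this mistake automatically. The helper below is hypothetical (build_alternation is not part of any library) and is a sketch for a backtracking engine such as Python's re.

```python
import re

def build_alternation(words):
    """Hypothetical helper: join literal words into one group, longest first."""
    ordered = sorted(words, key=len, reverse=True)
    # re.escape guards against words containing regex metacharacters
    return "(?:" + "|".join(re.escape(w) for w in ordered) + ")"

pattern = build_alternation(["Jan", "January", "Feb", "February"])
print(pattern)  # (?:February|January|Jan|Feb)

found = re.findall(pattern, "Jan and January")
print(found)  # ['Jan', 'January']
```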

Key Takeaways

  1. | means OR - match left or right expression

  2. Use () to group alternatives - control scope

  3. Order matters in backtracking engines - put longer alternatives first

  4. Use (?:) for non-capturing - when you don’t need the captured text

  5. Character classes are more efficient - [abc] vs (a|b|c)

  6. Branch reset (?|) shares group numbers - PCRE advanced feature

Next Module

Lookahead & Lookbehind - Zero-width assertions for conditional matching.