Character Classes
Character classes define a set of characters, any one of which can match at that position. They are fundamental to matching variable content like digits, letters, or specific character sets.
Basic Character Classes
Syntax: Square Brackets
A character class is enclosed in square brackets []. It matches exactly one character: any one of the characters listed inside the brackets.
Pattern: [aeiou]
Matches: Any single vowel (a, e, i, o, or u)
Pattern: [0123456789]
Matches: Any single digit
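You can see that a class consumes exactly one character per match by running both patterns through grep -o, which prints every match on its own line. The sample words and variable names below are made up for illustration.

```shell
# Made-up sample input, one word per line
words='sky
cat
route66'

# [aeiou]: each match is exactly one character, any lowercase vowel
vowels=$(printf '%s\n' "$words" | grep -o '[aeiou]')

# [0123456789]: each match is exactly one digit
digits=$(printf '%s\n' "$words" | grep -o '[0123456789]')

printf 'vowels:\n%s\n' "$vowels"   # a, o, u, e (one per line; "sky" has none)
printf 'digits:\n%s\n' "$digits"   # 6, 6 (one per line)
```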
Character Ranges
Use a hyphen (-) between two characters to specify an inclusive range.
| Pattern | Matches | Description |
|---|---|---|
| [a-z] | a through z | Lowercase letters |
| [A-Z] | A through Z | Uppercase letters |
| [0-9] | 0 through 9 | Digits |
| [a-zA-Z] | a-z, A-Z | All letters |
| [a-zA-Z0-9] | a-z, A-Z, 0-9 | Alphanumeric |
| [A-Fa-f0-9] | Hex characters (A-F, a-f, 0-9) | Hexadecimal digits |
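Here is a minimal sketch of these ranges in action, assuming GNU or POSIX grep. It sets LC_ALL=C because range order can depend on the locale. It also borrows the + quantifier (one or more, covered in the next module) so that grep -o prints whole runs instead of single characters. The sample string is hypothetical.

```shell
# Ranges follow character order; LC_ALL=C pins that order to ASCII
export LC_ALL=C
sample='Build 42-beta'

lower=$(printf '%s\n' "$sample" | grep -oE '[a-z]+')       # uild, beta
upper=$(printf '%s\n' "$sample" | grep -oE '[A-Z]+')       # B
alnum=$(printf '%s\n' "$sample" | grep -oE '[a-zA-Z0-9]+') # Build, 42, beta

printf '%s\n' "$lower" "$upper" "$alnum"
```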
Combining Ranges and Literals
Pattern: [a-zA-Z_][a-zA-Z0-9_]*
Meaning: Valid identifier (starts with letter/underscore, followed by alphanumeric/underscore)
Matches: myVar, _private, count123, MAX_VALUE
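The identifier pattern describes a whole identifier only if something forces it to cover the whole string. This sketch applies grep's -x option (match the entire line) to a hypothetical list of names, then shows what happens without it.

```shell
export LC_ALL=C
# Hypothetical candidate names, one per line
candidates='myVar
_private
count123
MAX_VALUE
9lives
my-var'

# -x: the pattern must match the entire line
valid=$(printf '%s\n' "$candidates" | grep -xE '[a-zA-Z_][a-zA-Z0-9_]*')
printf '%s\n' "$valid"    # myVar, _private, count123, MAX_VALUE

# Without -x, every line contains a matching substring ("lives", "my", ...)
loose=$(printf '%s\n' "$candidates" | grep -cE '[a-zA-Z_][a-zA-Z0-9_]*')
printf 'lines matched without -x: %s\n' "$loose"    # 6
```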
Negated Character Classes
Caret ^ at the START of a class negates it.
| Pattern | Matches |
|---|---|
| [^0-9] | Any character EXCEPT digits |
| [^a-zA-Z] | Any character except a letter |
| [^aeiou] | Any character except a lowercase vowel |
| [^[:space:]] | Any non-whitespace character |
Important: ^ only negates when it’s the FIRST character inside [].
[^abc] → NOT a, b, or c
[a^bc] → a, ^, b, or c (^ is literal here)
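Two details from this section are easy to check with grep. A negated class still has to consume one character, and a caret in any position other than first is just another member of the set. The inputs are made-up strings.

```shell
# [^abc]: a line matches only if it has at least one character outside a, b, c
neg=$(printf '%s\n' abc xyz | grep '[^abc]')
printf '%s\n' "$neg"    # xyz  ("abc" has no character outside the set)

# [a^bc]: ^ is not first, so it is just another member of the set
lit=$(printf '%s\n' 'x^y' xyz | grep '[a^bc]')
printf '%s\n' "$lit"    # x^y
```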
Special Characters Inside Classes
Most metacharacters lose their special meaning inside []:
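| Character | Inside [] | Notes |
|---|---|---|
| . | Literal dot | No need to escape |
| * | Literal asterisk | No need to escape |
| + | Literal plus | No need to escape |
| ? | Literal question mark | No need to escape |
| ^ | Negation (if first) or literal | Place anywhere but first (escaping also works in PCRE-style engines) |
| - | Range operator | Place first or last (escaping also works in PCRE-style engines) |
| ] | Ends class | Place first, after ^ if negated (PCRE-style engines also accept \]) |
| \ | Escape character in PCRE, Python, JavaScript | Literal in POSIX grep, grep -E, sed |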
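Metacharacters are literal inside brackets, so wrapping one in [] is a portable alternative to escaping it with a backslash. Here is a small sketch, assuming grep and made-up version strings:

```shell
versions='v1.2
v1x2'

# Outside brackets, . matches any character, so both lines match
any=$(printf '%s\n' "$versions" | grep -c 'v1.2')   # 2

# Inside brackets, [.] is a literal dot, so only "v1.2" matches
dot=$(printf '%s\n' "$versions" | grep 'v1[.]2')    # v1.2

# *, + and ? are literal inside brackets too
ops=$(printf '%s\n' 'a+b*c?' | grep -o '[*+?]')     # +, *, ? (one per line)

printf '%s\n' "$any" "$dot" "$ops"
```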
Matching Literal Hyphen, Caret, Bracket
# Hyphen at end (literal)
[a-z-]
# Hyphen at start (literal)
[-a-z]
# Hyphen escaped (PCRE, Python, JavaScript; POSIX grep and sed treat \ as literal inside [])
[a-z\-0-9]
# Caret not first (literal)
[a-z^]
# Closing bracket first (literal; not in JavaScript, where [] is an empty class)
[]a-z]
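The placements that avoid backslashes (hyphen last, caret not first, bracket first) work in POSIX grep -E as well as in PCRE-style engines. Here is a quick sketch with made-up inputs, using + (one or more) to print whole tokens:

```shell
export LC_ALL=C

# Hyphen last: literal, so "well-known" stays one token
hy=$(printf '%s\n' 'well-known 42' | grep -oE '[a-z-]+')   # well-known

# Caret not first: literal
ca=$(printf '%s\n' 'x^2' | grep -oE '[a-z^]+')            # x^

# Closing bracket first: literal
br=$(printf '%s\n' 'x]y [z' | grep -oE '[]a-z]+')         # x]y, z (one per line)

printf '%s\n' "$hy" "$ca" "$br"
```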
Shorthand Character Classes
PCRE and most modern engines provide shorthand:
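| Shorthand | Longhand | Matches | Notes |
|---|---|---|---|
| \d | [0-9] | Digit | PCRE-style engines only |
| \D | [^0-9] | Non-digit | PCRE-style engines only |
| \w | [a-zA-Z0-9_] | Word character | PCRE-style engines; also GNU grep/sed |
| \W | [^a-zA-Z0-9_] | Non-word character | PCRE-style engines; also GNU grep/sed |
| \s | [[:space:]] | Whitespace | PCRE-style engines; also GNU grep/sed |
| \S | [^[:space:]] | Non-whitespace | PCRE-style engines; also GNU grep/sed |
Availability:
- grep -P, ripgrep, Python, JavaScript, Perl: YES (all six)
- grep, grep -E, sed, awk: NO for \d and \D (use longhand); GNU grep and GNU sed accept \w, \W, \s, \S as extensions
- Python 3, Perl, and ripgrep are Unicode-aware by default, so \d and \w can also match non-ASCII digits and letters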
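When only ERE is available, the longhand forms do the same job. This sketch applies the [0-9], [a-zA-Z0-9_], and [^[:space:]] equivalents to a made-up interface status line with grep -oE:

```shell
export LC_ALL=C
# Made-up status line (note the double spaces)
line='eth0  up  mtu 1500'

# \d longhand: [0-9]
nums=$(printf '%s\n' "$line" | grep -oE '[0-9]+')              # 0, 1500

# \w longhand: [a-zA-Z0-9_]
words=$(printf '%s\n' "$line" | grep -oE '[a-zA-Z0-9_]+')      # eth0, up, mtu, 1500

# \S longhand: [^[:space:]]
fields=$(printf '%s\n' "$line" | grep -oE '[^[:space:]]+')     # same four fields here

printf '%s\n' "$nums" "$words" "$fields"
```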
Using Shorthand in Practice
# PCRE: Match an IPv4-shaped address (values up to 999 per octet also pass)
grep -P '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}' file.txt
# ERE equivalent (no shorthand)
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file.txt
POSIX Character Classes
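POSIX defines named classes. They are only valid inside a bracket expression, so each needs its own outer [] ([[:digit:]], not [:digit:]). The equivalents below hold in the C locale; other locales may add characters such as accented letters.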
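| POSIX Class | Equivalent | Matches |
|---|---|---|
| [:alpha:] | [a-zA-Z] | Letters |
| [:digit:] | [0-9] | Digits |
| [:alnum:] | [a-zA-Z0-9] | Alphanumeric |
| [:space:] | [ \t\n\r\f\v] | Whitespace |
| [:upper:] | [A-Z] | Uppercase |
| [:lower:] | [a-z] | Lowercase |
| [:punct:] | (no single range) | Punctuation |
| [:xdigit:] | [0-9A-Fa-f] | Hex digits |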
Usage:
# Note: Double brackets required
grep '[[:digit:]]' file.txt
grep '[[:alpha:]][[:digit:]]' file.txt   # A letter followed by a digit
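Here is a short sketch of the named classes with grep -oE. It runs under LC_ALL=C so the classes line up with the ASCII equivalents in the table. The input string is hypothetical.

```shell
export LC_ALL=C
sample='eth0 mtu=1500; state:UP'

digits=$(printf '%s\n' "$sample" | grep -oE '[[:digit:]]+')          # 0, 1500
pair=$(printf '%s\n' "$sample" | grep -oE '[[:alpha:]][[:digit:]]')  # h0
punct=$(printf '%s\n' "$sample" | grep -oE '[[:punct:]]')            # =, ;, :
upper=$(printf '%s\n' "$sample" | grep -oE '[[:upper:]]+')           # UP

printf '%s\n' "$digits" "$pair" "$punct" "$upper"
```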
Infrastructure Patterns
Hex Characters (MAC addresses, hashes)
Pattern: [A-Fa-f0-9]
Use: Single hex digit
Pattern: [A-Fa-f0-9]{2}
Use: Hex byte (AA, FF, 0a)
Pattern: [A-Fa-f0-9]{32}
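Use: MD5 hash (also matches any 32-character stretch of a longer hex string, such as a SHA-256 hash; add grep -w or anchors to require exactly 32)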
Pattern: [A-Fa-f0-9]{64}
Use: SHA-256 hash
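The MD5 pattern only counts hex digits, so it also fires inside a SHA-256 hash. This sketch shows the problem and one fix, grep's -w (whole-word) option. The sample values are the MD5 of "hello" and the SHA-256 of the empty string.

```shell
# Sample values: md5("hello") and sha256("")
hashes='md5 5d41402abc4b2a76b9719d911017c592
sha256 e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'

# Unrestricted: also cuts two 32-character chunks out of the SHA-256 value
loose=$(printf '%s\n' "$hashes" | grep -oE '[A-Fa-f0-9]{32}')
printf '%s\n' "$loose" | wc -l      # 3

# -w: the match must be a whole word, so only the real MD5 survives
strict=$(printf '%s\n' "$hashes" | grep -owE '[A-Fa-f0-9]{32}')
printf '%s\n' "$strict"             # 5d41402abc4b2a76b9719d911017c592
```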
Port Numbers
Pattern: [0-9]{1,5}
Use: 1-5 digit number (0-99999)
Note: Doesn't validate range (65535 max)
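The class only checks that a port looks like one to five digits. The 65535 limit is easier to enforce numerically. This sketch assumes that split and passes the shape-checked values to awk for the range test. The port list is made up.

```shell
ports='443
80
99999'

# Character class: shape only (1 to 5 digits), so 99999 passes
shaped=$(printf '%s\n' "$ports" | grep -xE '[0-9]{1,5}')

# Range check done numerically, outside the regex
valid=$(printf '%s\n' "$shaped" | awk '$1 <= 65535')

printf '%s\n' "$valid"    # 443, 80
```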
Usernames
Pattern: [a-z][a-z0-9_-]{2,31}
Use: A common Linux username convention (stricter than what Linux itself enforces)
- Starts with lowercase letter
- 3-32 characters
- Lowercase, digits, underscore, hyphen
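With -x, grep applies the username pattern to the whole line, which is how the rules above are meant to be read. The sample names are invented.

```shell
export LC_ALL=C
names='alice
bob_2
Al
x1
9user
deploy-bot'

# -x applies the pattern to the whole line.
# Rejected: Al (uppercase), x1 (only 2 characters), 9user (starts with a digit)
valid=$(printf '%s\n' "$names" | grep -xE '[a-z][a-z0-9_-]{2,31}')
printf '%s\n' "$valid"    # alice, bob_2, deploy-bot
```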
Log Levels
Pattern: [DIWEF][A-Z]+
Use: Matches DEBUG, INFO, WARN, ERROR, FATAL
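- The first letter narrows the candidates, but other uppercase words starting with D, I, W, E, or F also match (IP, DD, EOF); use DEBUG|INFO|WARN|ERROR|FATAL for an exact match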
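The first-letter trick is loose: any uppercase word starting with D, I, W, E, or F matches. This sketch compares it with an explicit alternation (|, an ERE operator) on a made-up log.

```shell
export LC_ALL=C
log='INFO service started
ERROR disk full
IP 10.0.0.1 blocked'

# Loose: first letter from [DIWEF], then uppercase letters
loose=$(printf '%s\n' "$log" | grep -oE '[DIWEF][A-Z]+')               # INFO, ERROR, IP

# Exact: only the five level names
exact=$(printf '%s\n' "$log" | grep -oE 'DEBUG|INFO|WARN|ERROR|FATAL')  # INFO, ERROR

printf 'loose:\n%s\nexact:\n%s\n' "$loose" "$exact"
```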
Self-Test Exercises
Try each challenge FIRST. Only look at the answer after you’ve attempted it.
Setup Test Data
cat << 'EOF' > /tmp/classes.txt
User: admin123
User: _system
User: 123invalid
MAC: AA:BB:CC:DD:EE:FF
MAC: aa:bb:cc:dd:ee:ff
MAC: GG:HH:II:JJ:KK:LL
Hash: 5d41402abc4b2a76b9719d911017c592
Port: 443
Port: 80
Port: 99999
Level: INFO
Level: ERROR
Level: DEBUG
Temperature: -5C
Temperature: 25C
VLAN: 10
VLAN: 999
IP: 192.168.1.100
IP: 10.50.1.20
EOF
Challenge 1: Match Any Digit
Goal: Find lines containing any digit 0-9
Answer
grep '[0-9]' /tmp/classes.txt
[0-9] matches any single digit
Challenge 2: Match Hex Characters Only
Goal: Extract valid hex octets (AA, BB, aa, ff, etc.) but NOT invalid ones (GG, HH)
Answer
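grep '^MAC:' /tmp/classes.txt | grep -owE '[A-Fa-f0-9]{2}'
[A-Fa-f0-9] matches hex chars only (A-F, a-f, 0-9). Filtering to the MAC: lines skips hex-looking pairs elsewhere (such as "ad" in admin123), and -w keeps only whole two-character words, so the "AC" inside "MAC" is dropped.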
Challenge 3: Valid MAC Address
Goal: Match only valid MAC addresses (hex chars, not GG:HH:II…)
Answer
grep -E '([A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}' /tmp/classes.txt
This won’t match GG:HH:II:JJ:KK:LL because G, H, I aren’t hex
Challenge 4: Username Starting with Letter
Goal: Find usernames that start with a letter (valid)
Answer
grep -E 'User: [a-zA-Z]' /tmp/classes.txt
[a-zA-Z] matches any letter
Challenge 5: Username Starting with Digit (Invalid)
Goal: Find usernames that start with a digit (invalid)
Answer
grep -E 'User: [0-9]' /tmp/classes.txt
Output: User: 123invalid
Challenge 6: Match NOT a Digit
Goal: Find lines where the character after "User: " is NOT a digit
Answer
grep -E 'User: [^0-9]' /tmp/classes.txt
[^0-9] = negated class = NOT a digit
Challenge 7: Extract Port Numbers
Goal: Extract just the port numbers (443, 80, 99999)
Answer
grep -oE 'Port: [0-9]+' /tmp/classes.txt | grep -oE '[0-9]+'
Or with lookbehind (PCRE):
grep -oP '(?<=Port: )[0-9]+' /tmp/classes.txt
Challenge 8: Log Levels (Uppercase Words)
Goal: Extract log levels (INFO, ERROR, DEBUG)
Answer
grep -oE 'Level: [A-Z]+' /tmp/classes.txt
[A-Z]+ = one or more uppercase letters. The output keeps the "Level: " prefix; append | grep -oE '[A-Z]+$' to print just the level.
Challenge 9: Negative Numbers
Goal: Match temperatures including negative values (-5C, 25C)
Answer
grep -oE -e '-?[0-9]+C' /tmp/classes.txt
-? = optional minus sign. The -e is required: without it, grep reads a pattern that starts with - as an option.
Challenge 10: PCRE vs ERE
Goal: Match digits using PCRE shorthand, then using ERE equivalent
Answer
# PCRE shorthand (grep -P required)
grep -oP '\d+' /tmp/classes.txt
# ERE equivalent (works everywhere)
grep -oE '[0-9]+' /tmp/classes.txt
\d = [0-9] but only works with -P
Challenge 11: Hex Hash
Goal: Match the MD5 hash (hex characters only)
Answer
grep -oE '[a-f0-9]{32}' /tmp/classes.txt
An MD5 hash is 32 hex characters; [a-f0-9] covers the lowercase form used in the test data
Challenge 12: VLAN Range
Goal: Match VLAN IDs (1-4 digit numbers after "VLAN: ")
Answer
grep -oE 'VLAN: [0-9]{1,4}' /tmp/classes.txt
{1,4} limits the match to 1-4 digits. Valid VLAN IDs are 1-4094, so, like the port pattern, this checks shape, not range.
Common Mistakes
Mistake 1: Unescaped Hyphen
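# Wrong - hyphen creates range
[a-z-0-9]    # Undefined behavior in POSIX; other engines differ
# Correct - hyphen at end
[a-z0-9-]
# Correct in PCRE, Python, JavaScript - hyphen escaped (POSIX grep/sed treat \ as literal here)
[a-z\-0-9]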
Mistake 2: Shorthand in Wrong Engine
# Fails in basic grep/awk
grep '\d+' file.txt # \d not recognized
# Works in PCRE
grep -P '\d+' file.txt
# Works everywhere
grep -E '[0-9]+' file.txt
Mistake 3: Forgetting Double Brackets for POSIX
# Wrong
grep '[:digit:]' file.txt # A plain bracket expression: ':', 'd', 'i', 'g', 't' (GNU grep rejects it with an error)
# Correct
grep '[[:digit:]]' file.txt # Matches digits
Next Module
Quantifiers - Specifying repetition with *, +, ?, and {n,m}.