Drill 02: Character Classes
A character class matches any single character from a defined set. This is how you match "any digit" or "any letter" without listing every possibility.
Core Concepts
| Syntax | Meaning | Example |
|---|---|---|
| [abc] | Match a, b, OR c | |
| [a-z] | Range: a through z | |
| [0-9] | Range: any digit | |
| [^abc] | NOT a, b, or c | |
| [a-zA-Z] | Combined ranges | Any letter |
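To see these classes in action, here is a minimal sketch using Python's re module rather than grep (the sample string is invented for illustration). Each class consumes exactly one character, so findall returns one hit per matching character.

```python
import re

sample = "cab-42 Zed"  # hypothetical sample string

# Apply each class from the table to the same input.
results = {
    "[abc]": re.findall(r"[abc]", sample),        # only a, b, or c
    "[a-z]": re.findall(r"[a-z]", sample),        # any lowercase letter
    "[0-9]": re.findall(r"[0-9]", sample),        # any digit
    "[^abc]": re.findall(r"[^abc]", sample),      # anything but a, b, c
    "[a-zA-Z]": re.findall(r"[a-zA-Z]", sample),  # any letter, either case
}

for pattern, hits in results.items():
    print(f"{pattern:10} {hits}")
```

Note that [^abc] also picks up the dash, the digits, and the space: a negated class matches every character outside the set, not just letters.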
PCRE Shorthand (grep -P)
| Shorthand | Equivalent | Negated |
|---|---|---|
| \d | [0-9] | \D |
| \w | [a-zA-Z0-9_] | \W |
| \s | [ \t\n\r\f\v] | \S |
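The shorthands can be checked against their usual ASCII bracket equivalents. This sketch uses Python's re with re.ASCII as a stand-in for grep -P; that substitution and the sample text are assumptions, and the two engines are only expected to agree on plain ASCII input.

```python
import re

text = "user_1 id=42\tok"  # hypothetical sample input

# (shorthand, bracket equivalent) pairs
pairs = [
    (r"\d", r"[0-9]"),
    (r"\w", r"[a-zA-Z0-9_]"),
    (r"\s", r"[ \t\n\r\f\v]"),
]

# re.ASCII keeps the shorthands from also matching non-ASCII digits and letters
same = {
    shorthand: re.findall(shorthand, text, flags=re.ASCII) == re.findall(bracket, text)
    for shorthand, bracket in pairs
}
print(same)

# The uppercase forms negate: \D+ collects the runs between the digits
digits = re.findall(r"\d+", text, flags=re.ASCII)
non_digits = re.findall(r"\D+", text, flags=re.ASCII)
print(digits, non_digits)
```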
Interactive CLI Drill
bash ~/atelier/_bibliotheca/domus-captures/docs/modules/ROOT/examples/regex-drills/02-character-classes.sh
Exercise Set 1: Basic Classes
cat << 'EOF' > /tmp/ex-cc.txt
192.168.1.100
AA:BB:CC:DD:EE:FF
14:f6:d8:7b:31:80
Hello World
HELLO WORLD
hello world
user_name_123
port8080
2026-03-15
VLAN 100
EOF
Ex 1.1: Extract all digits
Solution
grep -Eo '[0-9]+' /tmp/ex-cc.txt
Output: 192, 168, 1, 100, 14, 6, 8, 7, 31, 80, …
Ex 1.2: Extract only lowercase words
Solution
grep -Eo '[a-z]+' /tmp/ex-cc.txt
Output: f, d, b, ello, orld, hello, world, user, name, port
Ex 1.3: Extract hex characters (for MACs)
Solution
grep -Eo '[A-Fa-f0-9]+' /tmp/ex-cc.txt
Ex 1.4: Extract uppercase words only
Solution
grep -Eo '[A-Z]+' /tmp/ex-cc.txt
Output: AA, BB, CC, DD, EE, FF, H, W, HELLO, WORLD, VLAN
Exercise Set 2: Negated Classes
Ex 2.1: Everything except digits
Solution
grep -Eo '[^0-9]+' /tmp/ex-cc.txt | head -10
This extracts non-numeric parts: dots, colons, letters, etc.
Ex 2.2: Consonants only (not vowels)
Solution
echo "Hello World" | grep -Eo '[^aeiouAEIOU ]+'
Output: H, ll, W, rld (each vowel or space ends the current match)
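A negated class still matches one character at a time, so + only groups adjacent non-vowels. The sketch below, in Python's re (assumed to behave like grep -Eo for this pattern), shows the runs that come back and one way to get whole words with the vowels removed instead.

```python
import re

phrase = "Hello World"

# Matching: every vowel or space breaks the run of allowed characters
runs = re.findall(r"[^aeiouAEIOU ]+", phrase)
print(runs)  # ['H', 'll', 'W', 'rld']

# Deleting: substitute the vowels away to keep each word in one piece
stripped = re.sub(r"[aeiouAEIOU]", "", phrase)
print(stripped)  # Hll Wrld
```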
Ex 2.3: Non-whitespace tokens
Solution
echo " word1 word2 word3 " | grep -Po '\S+'
Output: word1, word2, word3
Exercise Set 3: Combining Classes
Ex 3.1: MAC address octets
Solution
# Two hex chars followed by colon
grep -Eio '[0-9A-F]{2}:' /tmp/ex-cc.txt | head -10
Ex 3.2: Snake_case identifiers
Solution
grep -Eo '[a-z]+_[a-z]+_[0-9]+' /tmp/ex-cc.txt
Output: user_name_123
Ex 3.3: Port numbers attached to words
Solution
grep -Eo '[a-z]+[0-9]+' /tmp/ex-cc.txt
Output: f6, d8, port8080 (includes MAC parts too)
Ex 3.4: Just "port" followed by number
Solution
grep -Eo 'port[0-9]+' /tmp/ex-cc.txt
Output: port8080
Exercise Set 4: PCRE Shorthand
Ex 4.1: Extract all numbers with \d
Solution
grep -Po '\d+' /tmp/ex-cc.txt
Ex 4.2: Word characters with \w
Solution
grep -Po '\w+' /tmp/ex-cc.txt | head -10
Note: \w includes underscore, so user_name_123 is one match
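The underscore behaviour is easy to confirm. A small sketch in Python's re (re.ASCII is used so \w stays limited to ASCII letters, digits, and underscore):

```python
import re

identifier = "user_name_123"

# \w treats the underscore as a word character: one token
with_underscore = re.findall(r"\w+", identifier, flags=re.ASCII)

# An explicit class without _ splits at each underscore
letters_digits = re.findall(r"[a-zA-Z0-9]+", identifier)

print(with_underscore)  # ['user_name_123']
print(letters_digits)   # ['user', 'name', '123']
```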
Ex 4.3: Find lines with multiple spaces
Solution
echo "normal  double   triple" | grep -P '\s{2,}'
\s{2,} = two or more whitespace characters
Real-World Applications
Professional: Extract IPs from Logs
# Basic IP pattern
grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /var/log/syslog
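The basic pattern checks shape only, so it also accepts strings like 999.1.1.1. One possible follow-up, sketched in Python, is to match the same shape and then check each octet numerically. The function name extract_ips and the sample log line are hypothetical.

```python
import re

# Same shape as the grep pattern: four groups of 1-3 digits separated by dots
IP_CANDIDATE = re.compile(r"[0-9]{1,3}(?:\.[0-9]{1,3}){3}")

def extract_ips(text):
    """Return dotted quads whose four octets are all in 0..255."""
    valid = []
    for candidate in IP_CANDIDATE.findall(text):
        if all(int(octet) <= 255 for octet in candidate.split(".")):
            valid.append(candidate)
    return valid

log_line = "src=192.168.1.100 bad=999.1.1.1 dst=10.0.0.1"  # made-up sample
print(extract_ips(log_line))  # ['192.168.1.100', '10.0.0.1']
```

This sketch does not guard against a match starting in the middle of a longer digit run; real log parsing may need boundaries around the pattern.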
Professional: Extract MAC Addresses
# Full MAC address (colon format)
grep -Eio '[0-9A-F]{2}(:[0-9A-F]{2}){5}' /var/log/ise-psc.log
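The same pattern carries over to Python with one adjustment: re.findall returns only the group's text when the pattern contains a capturing group, so the sketch below uses a non-capturing (?:...) group. Lowercasing the results is an illustrative choice, not something the drill requires.

```python
import re

# Two hex chars, then five more ":xx" pairs; IGNORECASE mirrors grep -i
MAC = re.compile(r"[0-9A-F]{2}(?::[0-9A-F]{2}){5}", re.IGNORECASE)

text = "AA:BB:CC:DD:EE:FF\n14:f6:d8:7b:31:80\n192.168.1.100\nHello World"

# Normalize to lowercase so mixed-case sources compare equal
macs = [m.lower() for m in MAC.findall(text)]
print(macs)  # ['aa:bb:cc:dd:ee:ff', '14:f6:d8:7b:31:80']
```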
Professional: VLAN Numbers
# VLAN followed by number
grep -Poi 'vlan\s*\d+' /etc/network/interfaces
Personal: Extract Dates
# ISO dates from notes
grep -Eo '[0-9]{4}-[0-9]{2}-[0-9]{2}' ~/journal/*.md
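Like the IP pattern, this one matches the shape of a date, not its validity: 2026-13-45 passes. A hedged sketch of one way to filter, using Python's datetime.date.fromisoformat to reject impossible dates (the function name extract_dates and the sample text are hypothetical):

```python
import re
from datetime import date

ISO_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

def extract_dates(text):
    """Return real calendar dates found in text, skipping impossible ones."""
    found = []
    for candidate in ISO_SHAPE.findall(text):
        try:
            found.append(date.fromisoformat(candidate))
        except ValueError:
            # Shape matched, but e.g. month 13 or day 45 is not a date
            continue
    return found

notes = "Started 2026-03-15, typo 2026-13-45, ended 2026-04-01"  # made-up sample
dates = extract_dates(notes)
print([d.isoformat() for d in dates])  # ['2026-03-15', '2026-04-01']
```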
Personal: Find Phone Numbers
# US phone number patterns
grep -Eo '[0-9]{3}-[0-9]{3}-[0-9]{4}' ~/contacts.txt
Tool Variants
sed: Character Class Substitution
# Remove all digits
echo "port8080" | sed 's/[0-9]//g'
# Output: port
# Remove all non-letters
echo "user_123_name" | sed 's/[^a-zA-Z]//g'
# Output: username
awk: Pattern Matching
# Print lines containing hex characters
awk '/[A-Fa-f]/' /tmp/ex-cc.txt
# Extract field with digits only
echo "name:123:value" | awk -F: '$2 ~ /^[0-9]+$/ {print $2}'
vim: Character Class Search
" Find all hex sequences
/[0-9A-Fa-f]\+
" Find non-alphanumeric
/[^A-Za-z0-9]
" Delete trailing whitespace
:%s/[[:space:]]\+$//g
Python: Character Classes
import re
text = "Server IP: 192.168.1.100, MAC: AA:BB:CC:DD:EE:FF"
# Extract all numbers
numbers = re.findall(r'[0-9]+', text)
print(numbers) # ['192', '168', '1', '100']
# Extract hex octets
hex_octets = re.findall(r'[A-Fa-f0-9]{2}', text)
print(hex_octets) # ['19', '16', '10', 'AC', 'AA', 'BB', ...] ('AC' comes from "MAC")
Gotchas
Dash in Character Class
# RISKY: a dash between two characters is read as a range operator
[a-z-0-9] # Middle dash is ambiguous; handling varies by tool
# CORRECT: put dash first or last
[-a-z0-9] # Dash as literal
[a-z0-9-] # Dash at end (also works)
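The effect of dash placement can be checked directly. A minimal sketch in Python's re, with an invented sample string; the backslash-escaped form works in Python and PCRE, but in POSIX bracket expressions (grep -E) a backslash is itself a literal, so first or last position is the portable choice.

```python
import re

text = "a-m-z"  # hypothetical sample

as_range = re.findall(r"[a-z]", text)    # dash between chars: range a through z
dash_last = re.findall(r"[az-]", text)   # dash last: literal a, z, or dash
dash_first = re.findall(r"[-az]", text)  # dash first: same set
escaped = re.findall(r"[a\-z]", text)    # escaped dash: same set (Python/PCRE)

print(as_range)   # ['a', 'm', 'z']
print(dash_last)  # ['a', '-', '-', 'z']
```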
Caret Position
# ^ at START negates the class
[^abc] # NOT a, b, or c
# ^ elsewhere is literal
[a^bc] # Matches a, ^, b, or c
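A short Python sketch of the same two cases, plus a class that matches only a literal caret (the sample string is invented for illustration):

```python
import re

text = "a^bd"  # hypothetical sample

negated = re.findall(r"[^abc]", text)   # leading ^ negates: anything but a, b, c
literal = re.findall(r"[a^bc]", text)   # ^ elsewhere is just another member
only_caret = re.findall(r"[\^]", text)  # escaped ^ as the sole member

print(negated)     # ['^', 'd']
print(literal)     # ['a', '^', 'b']
print(only_caret)  # ['^']
```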
POSIX Double Brackets
# WRONG
grep -E '[:digit:]' file # Parsed as the set : d i g t, not digits (GNU grep rejects it with an error)
# CORRECT
grep -E '[[:digit:]]' file # Matches digits
Key Takeaways
| Pattern | Use Case |
|---|---|
| [0-9] | Single digit |
| [0-9]+ | One or more digits (a number) |
| [a-zA-Z] | Any letter |
| [0-9A-Fa-f] | Hex character |
| [^abc] | Anything except a, b, c |
| \d | Shorthand for [0-9] |
| \w | Word character (letters, digits, underscore) |
Self-Test
- What does [^0-9] match?
- What's the PCRE shorthand for [0-9]?
- How do you match a literal dash in a character class?
- What's wrong with [:alpha:] in grep?
- Does [A-z] include only letters?
Next Drill
Drill 03: Quantifiers - Master *, +, ?, {n,m}, and greedy vs lazy.