Drill 02: Character Classes

Character classes let you match any character from a defined set. This is how you match "any digit" or "any letter" without listing every possibility.

Core Concepts

Syntax	Meaning	Example
`[abc]`	Match a, b, OR c	`[aeiou]` = any vowel
`[a-z]`	Range: a through z	`[A-Z]` = uppercase
`[0-9]`	Range: any digit	`[0-9]+` = number
`[^abc]`	NOT a, b, or c	`[^0-9]` = non-digit
`[a-zA-Z]`	Combined ranges	Any letter
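As a quick check of the table above, the sketch below runs four of the patterns against one made-up string. The variable names and the sample text are illustrative and not part of the drill files. LC_ALL=C is set so that ranges such as [A-Z] follow plain ASCII order.

```shell
# Illustrative input; not one of the drill files.
sample='Vlan42 up'

# grep -o prints each match on its own line; paste joins them onto one line.
vowels=$(printf '%s\n' "$sample" | LC_ALL=C grep -Eo '[aeiou]' | paste -sd' ' -)
upper=$(printf '%s\n' "$sample" | LC_ALL=C grep -Eo '[A-Z]' | paste -sd' ' -)
number=$(printf '%s\n' "$sample" | LC_ALL=C grep -Eo '[0-9]+' | paste -sd' ' -)
nondigit=$(printf '%s\n' "$sample" | LC_ALL=C grep -Eo '[^0-9]+' | paste -sd'|' -)

printf 'vowels:   %s\n' "$vowels"     # a u
printf 'upper:    %s\n' "$upper"      # V
printf 'number:   %s\n' "$number"     # 42
printf 'nondigit: %s\n' "$nondigit"   # Vlan| up
```

The negated class splits the line around the digits, which is why the last result has two pieces.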

PCRE Shorthand (grep -P)

Shorthand	Equivalent	Negated
`\d`	`[0-9]`	`\D` = `[^0-9]`
`\w`	`[A-Za-z0-9_]`	`\W` = `[^A-Za-z0-9_]`
`\s`	`[ \t\n\r\f\v]`	`\S` = non-whitespace

POSIX Classes (grep -E)

Class	Matches
`[:alpha:]`	Letters (a-zA-Z)
`[:digit:]`	Digits (0-9)
`[:alnum:]`	Letters and digits
`[:xdigit:]`	Hex digits (0-9A-Fa-f)
`[:space:]`	Whitespace
`[:lower:]`	Lowercase letters
`[:upper:]`	Uppercase letters

POSIX classes go inside brackets: write [[:digit:]], not [:digit:].
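A minimal sketch of the double-bracket form, assuming any grep that supports -E and -o. The input strings and variable names are made up for illustration.

```shell
# Illustrative input strings; not part of the drill files.
upper=$(printf 'VLAN 100\n' | grep -Eo '[[:upper:]]+')
digits=$(printf 'VLAN 100\n' | grep -Eo '[[:digit:]]+')

# A POSIX class can share the outer brackets with other characters, here "_".
idents=$(printf 'id: user_name_123!\n' | grep -Eo '[[:alnum:]_]+' | paste -sd' ' -)

printf '%s\n' "$upper"    # VLAN
printf '%s\n' "$digits"   # 100
printf '%s\n' "$idents"   # id user_name_123
```

The outer brackets form the character class and the inner [:name:] is one member of it, which is why other characters can sit beside it.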

Interactive CLI Drill

bash ~/atelier/_bibliotheca/domus-captures/docs/modules/ROOT/examples/regex-drills/02-character-classes.sh

Exercise Set 1: Basic Classes

cat << 'EOF' > /tmp/ex-cc.txt
192.168.1.100
AA:BB:CC:DD:EE:FF
14:f6:d8:7b:31:80
Hello World
HELLO WORLD
hello world
user_name_123
port8080
2026-03-15
VLAN 100
EOF

Ex 1.1: Extract all digits

Solution

grep -Eo '[0-9]+' /tmp/ex-cc.txt

Output: 192, 168, 1, 100, 14, 6, 8, 7, 31, 80, …

Ex 1.2: Extract only lowercase words

Solution

grep -Eo '[a-z]+' /tmp/ex-cc.txt

Output: f, d, b, ello, orld, hello, world, user, name, port

Ex 1.3: Extract hex characters (for MACs)

Solution

grep -Eo '[A-Fa-f0-9]+' /tmp/ex-cc.txt

Ex 1.4: Extract uppercase words only

Solution

grep -Eo '[A-Z]+' /tmp/ex-cc.txt

Output: AA, BB, CC, DD, EE, FF, H, W, HELLO, WORLD, VLAN

Exercise Set 2: Negated Classes

Ex 2.1: Everything except digits

Solution

grep -Eo '[^0-9]+' /tmp/ex-cc.txt | head -10

This extracts non-numeric parts: dots, colons, letters, etc.

Ex 2.2: Consonants only (not vowels)

Solution

echo "Hello World" | grep -Eo '[^aeiouAEIOU ]+'

Output: H, ll, W, rld (each vowel ends the current match)
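A negated class matches everything outside the set, so it also picks up digits and punctuation. The sketch below compares it with an explicit consonant class on an illustrative string. The explicit class is one possible way to write "consonants only" and is not taken from the drill.

```shell
# Illustrative input; not one of the drill files.
text='Hello, World 42'

# Negated class: anything that is not a vowel or a space.
negated=$(printf '%s\n' "$text" | LC_ALL=C grep -Eo '[^aeiouAEIOU ]+' | paste -sd' ' -)

# Explicit class: ranges that skip a, e, i, o, u in both cases.
explicit=$(printf '%s\n' "$text" | LC_ALL=C grep -Eo '[b-df-hj-np-tv-zB-DF-HJ-NP-TV-Z]+' | paste -sd' ' -)

printf '%s\n' "$negated"    # H ll , W rld 42
printf '%s\n' "$explicit"   # H ll W rld
```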

Ex 2.3: Non-whitespace tokens

Solution

echo "  word1   word2   word3  " | grep -Po '\S+'

Output: word1, word2, word3

Exercise Set 3: Combining Classes

Ex 3.1: MAC address octets

Solution

# Two hex chars followed by colon
grep -Eio '[0-9A-F]{2}:' /tmp/ex-cc.txt | head -10

Ex 3.2: Snake_case identifiers

Solution

grep -Eo '[a-z]+_[a-z]+_[0-9]+' /tmp/ex-cc.txt

Output: user_name_123

Ex 3.3: Port numbers attached to words

Solution

grep -Eo '[a-z]+[0-9]+' /tmp/ex-cc.txt

Output: f6, d8, port8080 (includes MAC parts too)

Ex 3.4: Just "port" followed by number

Solution

grep -Eo 'port[0-9]+' /tmp/ex-cc.txt

Output: port8080

Exercise Set 4: PCRE Shorthand

Ex 4.1: Extract all numbers with \d

Solution

grep -Po '\d+' /tmp/ex-cc.txt

Ex 4.2: Word characters with \w

Solution

grep -Po '\w+' /tmp/ex-cc.txt | head -10

Note: \w includes underscore, so user_name_123 is one match

Ex 4.3: Find lines with multiple spaces

Solution

echo "normal  double   triple" | grep -P '\s{2,}'

\s{2,} = two or more whitespace characters

Real-World Applications

Professional: Extract IPs from Logs

# Basic IP pattern
grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /var/log/syslog
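The basic pattern accepts any one to three digits per octet, so it also matches strings such as 999.300.1.1. One possible refinement, sketched here with made-up log lines, is to extract candidates with grep and then keep only those whose octets are at most 255 using awk.

```shell
# Hypothetical log lines for illustration only.
log='ok 192.168.1.100
bad 999.300.1.1
ok 10.0.0.254'

# Step 1: extract dotted quads. Step 2: drop any with an octet above 255.
valid=$(printf '%s\n' "$log" \
  | grep -Eo '[0-9]{1,3}(\.[0-9]{1,3}){3}' \
  | awk -F. '$1<=255 && $2<=255 && $3<=255 && $4<=255' \
  | paste -sd' ' -)

printf '%s\n' "$valid"   # 192.168.1.100 10.0.0.254
```

Splitting the job this way keeps the regex readable. A pure-regex range check is possible but considerably longer.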

Professional: Extract MAC Addresses

# Full MAC address (colon format)
grep -Eio '[0-9A-F]{2}(:[0-9A-F]{2}){5}' /var/log/ise-psc.log

Professional: VLAN Numbers

# VLAN followed by number
grep -Poi 'vlan\s*\d+' /etc/network/interfaces

Personal: Extract Dates

# ISO dates from notes
grep -Eo '[0-9]{4}-[0-9]{2}-[0-9]{2}' ~/journal/*.md

Personal: Find Phone Numbers

# US phone number patterns
grep -Eo '[0-9]{3}-[0-9]{3}-[0-9]{4}' ~/contacts.txt

Tool Variants

sed: Character Class Substitution

# Remove all digits
echo "port8080" | sed 's/[0-9]//g'
# Output: port

# Remove all non-letters
echo "user_123_name" | sed 's/[^a-zA-Z]//g'
# Output: username

awk: Pattern Matching

# Print lines containing at least one hex letter (a-f or A-F)
awk '/[A-Fa-f]/' /tmp/ex-cc.txt

# Extract field with digits only
echo "name:123:value" | awk -F: '$2 ~ /^[0-9]+$/ {print $2}'

vim: Character Class Search

" Find all hex sequences
/[0-9A-Fa-f]\+

" Find non-alphanumeric
/[^A-Za-z0-9]

" Delete trailing whitespace
:%s/[[:space:]]\+$//g

Python: Character Classes

import re

text = "Server IP: 192.168.1.100, MAC: AA:BB:CC:DD:EE:FF"

# Extract all numbers
numbers = re.findall(r'[0-9]+', text)
print(numbers)  # ['192', '168', '1', '100']

# Extract hex octets
hex_octets = re.findall(r'[A-Fa-f0-9]{2}', text)
print(hex_octets)  # ['19', '16', '10', 'AC', 'AA', 'BB', ...] ('AC' comes from "MAC")

Gotchas

Dash in Character Class

# WRONG: a dash between two characters creates a range
[a-z.-_]  # .-_ is read as the range "." through "_", not three literals

# CORRECT: put dash first or last
[-a-z0-9]  # Dash as literal
[a-z0-9-]  # Dash at end (also works)
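To see the difference, the sketch below feeds one illustrative string to a class with the dash in the middle and to one with the dash last. LC_ALL=C is an assumption here so that the range follows ASCII order.

```shell
# Illustrative input; not one of the drill files.
text='A-b_c.9'

# Dash in the middle: "." through "_" is a range covering digits, uppercase
# letters and some punctuation, but not "-" itself.
as_range=$(printf '%s\n' "$text" | LC_ALL=C grep -Eo '[.-_]' | paste -sd' ' -)

# Dash last: matches only ".", "_" and a literal "-".
as_literal=$(printf '%s\n' "$text" | LC_ALL=C grep -Eo '[._-]' | paste -sd' ' -)

printf '%s\n' "$as_range"     # A _ . 9
printf '%s\n' "$as_literal"   # - _ .
```

The range version matches "A" and "9", which were never intended, and misses the dash entirely.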

Caret Position

# ^ at START negates the class
[^abc]  # NOT a, b, or c

# ^ elsewhere is literal
[a^bc]  # Matches a, ^, b, or c
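Both caret positions can be checked against the same input. This is a small sketch with an illustrative string and variable names.

```shell
# Illustrative input; not one of the drill files.
text='a^bcd'

# Caret first: negation, so only "^" and "d" are outside the set a, b, c.
negated=$(printf '%s\n' "$text" | grep -Eo '[^abc]' | paste -sd' ' -)

# Caret elsewhere: a literal "^" alongside a, b, c.
literal=$(printf '%s\n' "$text" | grep -Eo '[a^bc]' | paste -sd' ' -)

printf '%s\n' "$negated"   # ^ d
printf '%s\n' "$literal"   # a ^ b c
```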

POSIX Double Brackets

# WRONG
grep -E '[:digit:]' file  # Parsed as the set : d i g t, not digits (GNU grep reports a syntax error instead)

# CORRECT
grep -E '[[:digit:]]' file  # Matches digits

Key Takeaways

Pattern	Use Case
`[0-9]`	Single digit
`[0-9]+`	One or more digits (a number)
`[a-zA-Z]`	Any letter
`[A-Fa-f0-9]`	Hex character
`[^abc]`	Anything except a, b, c
`\d` (PCRE)	Shorthand for `[0-9]`
`\w` (PCRE)	Word character (letters, digits, underscore)

Self-Test

What does [^0-9] match?
What’s the PCRE shorthand for [0-9]?
How do you match a literal dash in a character class?
What’s wrong with [:alpha:] in grep?
Does [A-z] include only letters?

Answers

Any character that is NOT a digit
\d
Put it first or last: [-abc] or [abc-]
Missing outer brackets - should be [[:alpha:]]
No! It includes [ \ ] ^ _ ` too (characters between Z and a in ASCII)
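The last answer can be verified directly. The sketch below assumes the C locale, so the range follows ASCII order, and uses an illustrative string holding Z, the six characters between Z and a, and a.

```shell
# Illustrative input: Z, the six ASCII characters between Z and a, then a.
text='Z[\]^_`a'

# [A-z] spans the whole ASCII run from A to z, punctuation included.
loose=$(printf '%s\n' "$text" | LC_ALL=C grep -Eo '[A-z]' | paste -sd' ' -)

# [A-Za-z] uses two separate ranges and matches letters only.
strict=$(printf '%s\n' "$text" | LC_ALL=C grep -Eo '[A-Za-z]' | paste -sd' ' -)

printf '%s\n' "$loose"    # Z [ \ ] ^ _ ` a
printf '%s\n' "$strict"   # Z a
```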

Next Drill

Drill 03: Quantifiers - Master *, +, ?, {n,m}, and greedy vs lazy.