Drill 02: Character Classes

Character classes let you match any character from a defined set. This is how you match "any digit" or "any letter" without listing every possibility.

Core Concepts

Syntax      Meaning               Example
[abc]       Match a, b, OR c      [aeiou] = any vowel
[a-z]       Range: a through z    [A-Z] = uppercase
[0-9]       Range: any digit      [0-9]+ = number
[^abc]      NOT a, b, or c        [^0-9] = non-digit
[a-zA-Z]    Combined ranges       Any letter
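The syntax above can be tried outside grep as well. The following is a minimal sketch using Python's re module; the sample string is made up for illustration.

```python
import re

# Hypothetical sample string, chosen only to exercise each class
sample = "Route 66, Gate B7"

# [aeiou]: any one character from the listed set
vowels = re.findall(r"[aeiou]", sample)

# [0-9]+: one or more characters from the digit range
numbers = re.findall(r"[0-9]+", sample)

# [^a-zA-Z]: any one character that is NOT a letter
non_letters = re.findall(r"[^a-zA-Z]", sample)

print(vowels)       # ['o', 'u', 'e', 'a', 'e']
print(numbers)      # ['66', '7']
print(non_letters)  # [' ', '6', '6', ',', ' ', ' ', '7']
```

Note that a class without a quantifier matches exactly one character, which is why the vowels come back one at a time while the digits come back as runs.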

PCRE Shorthand (grep -P)

Shorthand    Equivalent        Negated
\d           [0-9]             \D = [^0-9]
\w           [A-Za-z0-9_]      \W = [^A-Za-z0-9_]
\s           [ \t\n\r\f]       \S = non-whitespace
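Python's re module accepts the same shorthands, so the equivalences in the table can be checked directly. This is a sketch with a made-up input line; re.ASCII is passed because Python's \d, \w and \s otherwise also match non-ASCII digits, letters and spaces, and Python's \s additionally covers the vertical tab.

```python
import re

# Hypothetical input: an identifier, a tab, then a word and a number
line = "user_name_123\tport 8080"

# Shorthand versus the explicit class from the table
digits = re.findall(r"\d+", line, flags=re.ASCII)
explicit_digits = re.findall(r"[0-9]+", line)

words = re.findall(r"\w+", line, flags=re.ASCII)
explicit_words = re.findall(r"[A-Za-z0-9_]+", line)

# \S+ splits on any whitespace, including the tab
non_space = re.findall(r"\S+", line, flags=re.ASCII)

print(digits)     # ['123', '8080']
print(words)      # ['user_name_123', 'port', '8080']
print(non_space)  # ['user_name_123', 'port', '8080']
```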

POSIX Classes (grep -E)

Class         Matches
[:alpha:]     Letters (a-zA-Z)
[:digit:]     Digits (0-9)
[:alnum:]     Letters and digits
[:xdigit:]    Hex digits (0-9A-Fa-f)
[:space:]     Whitespace
[:lower:]     Lowercase letters
[:upper:]     Uppercase letters

POSIX classes go inside brackets: write [[:digit:]], not [:digit:].
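Python's re module does not understand POSIX class names, so one way to see what they stand for is to expand them by hand. The sketch below assumes the ASCII meanings from the table; POSIX_ASCII and expand_posix are hypothetical names invented for this example, not part of any library.

```python
import re

# Hypothetical lookup table: ASCII equivalents of the POSIX classes above
POSIX_ASCII = {
    "alpha": "a-zA-Z",
    "digit": "0-9",
    "alnum": "a-zA-Z0-9",
    "xdigit": "0-9A-Fa-f",
    "space": r" \t\n\r\f\v",
    "lower": "a-z",
    "upper": "A-Z",
}

def expand_posix(pattern: str) -> str:
    """Replace each [:name:] token with its ASCII range text."""
    return re.sub(r"\[:([a-z]+):\]", lambda m: POSIX_ASCII[m.group(1)], pattern)

# The outer brackets survive the expansion, which is why they are required
pattern = expand_posix("[[:upper:]][[:lower:][:digit:]]+")
matches = re.findall(pattern, "Vlan100 and Port8080")

print(pattern)  # [A-Z][a-z0-9]+
print(matches)  # ['Vlan100', 'Port8080']
```

The expansion shows that a POSIX class name only contributes the contents of a bracket expression, not the brackets themselves.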

Interactive CLI Drill

bash ~/atelier/_bibliotheca/domus-captures/docs/modules/ROOT/examples/regex-drills/02-character-classes.sh

Exercise Set 1: Basic Classes

cat << 'EOF' > /tmp/ex-cc.txt
192.168.1.100
AA:BB:CC:DD:EE:FF
14:f6:d8:7b:31:80
Hello World
HELLO WORLD
hello world
user_name_123
port8080
2026-03-15
VLAN 100
EOF

Ex 1.1: Extract all digits

Solution
grep -Eo '[0-9]+' /tmp/ex-cc.txt

Output: 192, 168, 1, 100, 14, 6, 8, 7, 31, 80, ...

Ex 1.2: Extract only lowercase words

Solution
grep -Eo '[a-z]+' /tmp/ex-cc.txt

Output: f, d, b, ello, orld, hello, world, user, name, port

Ex 1.3: Extract hex characters (for MACs)

Solution
grep -Eio '[A-Fa-f0-9]+' /tmp/ex-cc.txt

Ex 1.4: Extract uppercase words only

Solution
grep -Eo '[A-Z]+' /tmp/ex-cc.txt

Output: AA, BB, CC, DD, EE, FF, H, W, HELLO, WORLD, VLAN (the H and W come from "Hello World")
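The outputs in this exercise set can be cross-checked in Python. This is a sketch rather than a replacement for the grep commands: grep -o scans line by line while re.findall scans the whole string, but none of these classes can match a newline, so the results should agree.

```python
import re

# Same ten lines as /tmp/ex-cc.txt
SAMPLE = """192.168.1.100
AA:BB:CC:DD:EE:FF
14:f6:d8:7b:31:80
Hello World
HELLO WORLD
hello world
user_name_123
port8080
2026-03-15
VLAN 100
"""

digit_runs = re.findall(r"[0-9]+", SAMPLE)  # Ex 1.1
lower_runs = re.findall(r"[a-z]+", SAMPLE)  # Ex 1.2
upper_runs = re.findall(r"[A-Z]+", SAMPLE)  # Ex 1.4

print(digit_runs[:10])  # ['192', '168', '1', '100', '14', '6', '8', '7', '31', '80']
print(lower_runs)       # ['f', 'd', 'b', 'ello', 'orld', 'hello', 'world', 'user', 'name', 'port']
print(upper_runs)       # ['AA', 'BB', 'CC', 'DD', 'EE', 'FF', 'H', 'W', 'HELLO', 'WORLD', 'VLAN']
```

The single-letter matches H and W show that a class matches runs of characters, not whole words; "Hello" contributes only its capital.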

Exercise Set 2: Negated Classes

Ex 2.1: Everything except digits

Solution
grep -Eo '[^0-9]+' /tmp/ex-cc.txt | head -10

This extracts non-numeric parts: dots, colons, letters, etc.

Ex 2.2: Consonants only (not vowels)

Solution
echo "Hello World" | grep -Eo '[^aeiouAEIOU ]+'

Output: H, ll, W, rld (each vowel or space ends the current match)

Ex 2.3: Non-whitespace tokens

Solution
echo "  word1   word2   word3  " | grep -Po '\S+'

Output: word1, word2, word3
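A negated class still matches runs of consecutive characters, so every excluded character splits the match. A short Python sketch of Ex 2.2 and Ex 2.3 makes the split points visible:

```python
import re

# Ex 2.2: vowels and spaces are excluded, so each one ends the current run
consonant_runs = re.findall(r"[^aeiouAEIOU ]+", "Hello World")

# Ex 2.3: \S+ keeps everything between stretches of whitespace
tokens = re.findall(r"\S+", "  word1   word2   word3  ")

print(consonant_runs)  # ['H', 'll', 'W', 'rld']
print(tokens)          # ['word1', 'word2', 'word3']
```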

Exercise Set 3: Combining Classes

Ex 3.1: MAC address octets

Solution
# Two hex chars followed by colon
grep -Eio '[0-9A-F]{2}:' /tmp/ex-cc.txt | head -10

Ex 3.2: Snake_case identifiers

Solution
grep -Eo '[a-z]+_[a-z]+_[0-9]+' /tmp/ex-cc.txt

Output: user_name_123

Ex 3.3: Port numbers attached to words

Solution
grep -Eo '[a-z]+[0-9]+' /tmp/ex-cc.txt

Output: f6, d8, port8080 (includes MAC parts too; the b in 7b is followed by a colon, so it does not match)

Ex 3.4: Just "port" followed by number

Solution
grep -Eo 'port[0-9]+' /tmp/ex-cc.txt

Output: port8080
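To see exactly which fragments the letters-then-digits pattern from Ex 3.3 picks up, it can be run against the MAC line on its own. A minimal Python sketch:

```python
import re

pattern = r"[a-z]+[0-9]+"

# In "7b:31" the letter b is followed by a colon, so it is not a match
mac_parts = re.findall(pattern, "14:f6:d8:7b:31:80")

# The intended target: a word with a number attached
port_match = re.findall(pattern, "port8080")

print(mac_parts)   # ['f6', 'd8']
print(port_match)  # ['port8080']
```

Anchoring on the literal word, as Ex 3.4 does with port[0-9]+, is what removes the MAC fragments.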

Exercise Set 4: PCRE Shorthand

Ex 4.1: Extract all numbers with \d

Solution
grep -Po '\d+' /tmp/ex-cc.txt

Ex 4.2: Word characters with \w

Solution
grep -Po '\w+' /tmp/ex-cc.txt | head -10

Note: \w includes underscore, so user_name_123 is one match

Ex 4.3: Find lines with multiple spaces

Solution
echo "normal  double   triple" | grep -P '\s{2,}'

\s{2,} = two or more whitespace characters

Real-World Applications

Professional: Extract IPs from Logs

# Basic IP pattern
grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' /var/log/syslog
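The basic pattern accepts any one to three digits per octet, so it also matches strings such as 999.300.1.1. One possible refinement, sketched here in Python, keeps the same character-class pattern and adds a range check afterwards; extract_ipv4 and the log line are hypothetical names and data for illustration.

```python
import re

# Same shape as the grep pattern: four dot-separated groups of 1-3 digits
IP_CANDIDATE = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")

def extract_ipv4(text):
    """Return candidates whose four octets all fall in 0-255."""
    found = []
    for candidate in IP_CANDIDATE.findall(text):
        if all(int(octet) <= 255 for octet in candidate.split(".")):
            found.append(candidate)
    return found

# Hypothetical log line with one valid and one out-of-range address
log_line = "accepted 192.168.1.100, rejected 999.300.1.1"
print(extract_ipv4(log_line))  # ['192.168.1.100']
```

Doing the range check in code keeps the regex readable; encoding 0-255 purely in character classes is possible but considerably longer.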

Professional: Extract MAC Addresses

# Full MAC address (colon format)
grep -Eio '[0-9A-F]{2}(:[0-9A-F]{2}){5}' /var/log/ise-psc.log
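The same MAC pattern carries over to Python with two adjustments, shown in this sketch: re.IGNORECASE plays the role of grep's -i, and the group is made non-capturing so that re.findall returns whole addresses rather than only the last group. The sample line is invented.

```python
import re

# (?:...) keeps the repetition but stops findall from returning just the group
MAC = re.compile(r"[0-9A-F]{2}(?::[0-9A-F]{2}){5}", re.IGNORECASE)

# Hypothetical log line containing one lowercase and one uppercase MAC
line = "client 14:f6:d8:7b:31:80 moved from AA:BB:CC:DD:EE:FF"
macs = MAC.findall(line)

print(macs)  # ['14:f6:d8:7b:31:80', 'AA:BB:CC:DD:EE:FF']
```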

Professional: VLAN Numbers

# VLAN followed by number
grep -Poi 'vlan\s*\d+' /etc/network/interfaces

Personal: Extract Dates

# ISO dates from notes
grep -Eo '[0-9]{4}-[0-9]{2}-[0-9]{2}' ~/journal/*.md

Personal: Find Phone Numbers

# US phone number patterns
grep -Eo '[0-9]{3}-[0-9]{3}-[0-9]{4}' ~/contacts.txt

Tool Variants

sed: Character Class Substitution

# Remove all digits
echo "port8080" | sed 's/[0-9]//g'
# Output: port

# Remove all non-letters
echo "user_123_name" | sed 's/[^a-zA-Z]//g'
# Output: username

awk: Pattern Matching

# Print lines containing hex characters
awk '/[A-Fa-f]/' /tmp/ex-cc.txt

# Extract field with digits only
echo "name:123:value" | awk -F: '$2 ~ /^[0-9]+$/ {print $2}'

vim: Search and Substitute
" Find all hex sequences
/[0-9A-Fa-f]\+

" Find non-alphanumeric
/[^A-Za-z0-9]

" Delete trailing whitespace
:%s/[[:space:]]\+$//g

Python: Character Classes

import re

text = "Server IP: 192.168.1.100, MAC: AA:BB:CC:DD:EE:FF"

# Extract all numbers
numbers = re.findall(r'[0-9]+', text)
print(numbers)  # ['192', '168', '1', '100']

# Extract hex octets
hex_octets = re.findall(r'[A-Fa-f0-9]{2}', text)
print(hex_octets)  # ['19', '16', '10', 'AC', 'AA', 'BB', ...] ('AC' comes from "MAC")

Gotchas

Dash in Character Class

# WRONG: a dash between other characters can be read as a range
[a-z-0-9]  # Ambiguous: POSIX leaves this undefined

# CORRECT: put dash first or last
[-a-z0-9]  # Dash first (literal)
[a-z0-9-]  # Dash last (also literal)
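The effect of dash placement is easy to demonstrate. In this Python sketch, [+-.] is read as the range from + to ., which in ASCII also contains the comma, while moving the dash to the end makes it literal. The input strings are made up.

```python
import re

# Dash in the middle: the range + through . covers '+', ',', '-' and '.'
accidental_range = re.findall(r"[+-.]", "1,000")

# Dash last: only '+', '.' and '-' are in the set, so the comma is ignored
literal_dash = re.findall(r"[+.-]", "1,000")

# A trailing literal dash lets hyphenated names match as one token
hostnames = re.findall(r"[a-z0-9-]+", "node-01 node_02")

print(accidental_range)  # [',']
print(literal_dash)      # []
print(hostnames)         # ['node-01', 'node', '02']
```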

Caret Position

# ^ at START negates the class
[^abc]  # NOT a, b, or c

# ^ elsewhere is literal
[a^bc]  # Matches a, ^, b, or c
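Both caret behaviours can be confirmed on a single string. A minimal Python sketch:

```python
import re

text = "a^bcd"

# Caret first: negation, so only characters outside a, b, c match
negated = re.findall(r"[^abc]", text)

# Caret elsewhere: an ordinary member of the set
literal_caret = re.findall(r"[a^bc]", text)

print(negated)        # ['^', 'd']
print(literal_caret)  # ['a', '^', 'b', 'c']
```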

POSIX Double Brackets

# WRONG
grep -E '[:digit:]' file  # Parsed as the set : d i g t (GNU grep reports a syntax error instead)

# CORRECT
grep -E '[[:digit:]]' file  # Matches digits

Key Takeaways

Pattern        Use Case
[0-9]          Single digit
[0-9]+         One or more digits (a number)
[a-zA-Z]       Any letter
[A-Fa-f0-9]    Hex character
[^abc]         Anything except a, b, c
\d (PCRE)      Shorthand for [0-9]
\w (PCRE)      Word character (letters, digits, underscore)

Self-Test

  1. What does [^0-9] match?

  2. What’s the PCRE shorthand for [0-9]?

  3. How do you match a literal dash in a character class?

  4. What’s wrong with [:alpha:] in grep?

  5. Does [A-z] include only letters?

Answers
  1. Any character that is NOT a digit

  2. \d

  3. Put it first or last: [-abc] or [abc-]

  4. Missing outer brackets - should be [[:alpha:]]

  5. No! It includes [ \ ] ^ _ ` too (characters between Z and a in ASCII)
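Answer 5 can be verified by walking the ASCII range from A to z and listing what is not a letter. A short Python sketch:

```python
import re

# Every character between 'A' and 'z' in ASCII that is not a letter
non_letters = [chr(c) for c in range(ord("A"), ord("z") + 1) if not chr(c).isalpha()]

# Because '_' sits inside the range, [A-z] accepts snake_case in full
loose = re.fullmatch(r"[A-z]+", "snake_case") is not None
strict = re.fullmatch(r"[A-Za-z]+", "snake_case") is not None

print(non_letters)  # ['[', '\\', ']', '^', '_', '`']
print(loose)        # True
print(strict)       # False
```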

Next Drill

Drill 03: Quantifiers - Master *, +, ?, {n,m}, and greedy vs lazy.