Regular Expressions

Regular expressions are pattern-matching sequences used to search, extract, validate, and transform text. This reference covers regex from fundamentals to advanced techniques, with emphasis on practical infrastructure applications.

Why Regex Matters

Regex appears in virtually every technical domain:

Domain                    Applications
Log Analysis              Extract timestamps, IPs, error codes, and usernames from logs
Configuration Management  Validate syntax, find misconfigurations, audit settings
Security                  IOC extraction, threat hunting, SIEM queries, compliance audits
Network Engineering       Parse configs, extract IPs/MACs/VLANs, validate addressing
Automation                Data extraction in scripts, API response parsing, file processing
Development               Input validation, search/replace, code refactoring
Data Processing           ETL pipelines, data cleaning, format conversion
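
As a small illustration of the log-analysis use case listed above, the following sketch pulls a timestamp, username, and IP address out of a single line with Python's `re` module. The log line, its format, and the names `LOG_LINE`, `LOG_PATTERN`, and `extract_fields` are all invented for this example; real logs will likely need a different pattern.

```python
import re

# Hypothetical log line, invented for illustration only.
LOG_LINE = (
    "2024-05-01T12:34:56Z sshd[1042]: "
    "Failed password for admin from 203.0.113.7 port 52214"
)

# Named groups label each captured field. The IPv4 part only checks the
# dotted-quad shape; it does not reject octets above 255.
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s+"
    r".*?for\s+(?P<user>\w+)\s+"
    r"from\s+(?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
)


def extract_fields(line):
    """Return a dict of captured fields, or None when the line does not match."""
    match = LOG_PATTERN.search(line)
    return match.groupdict() if match else None


fields = extract_fields(LOG_LINE)
print(fields)
# {'timestamp': '2024-05-01T12:34:56Z', 'user': 'admin', 'ip': '203.0.113.7'}
```

Returning `None` for non-matching lines lets a caller skip unrelated log entries without raising an exception.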

Curriculum

Module                         Description                                                            Status
1. Fundamentals                Literal characters, metacharacters, escaping, basic matching           Core
2. Character Classes           Sets, ranges, negation, shorthand classes (\d, \w, \s)                 Core
3. Quantifiers                 Repetition operators, greedy vs lazy matching, possessive quantifiers  Core
4. Anchors & Boundaries        Position matching, word boundaries, line/string anchors                Core
5. Groups & Capturing          Capturing groups, backreferences, non-capturing groups, named groups   Intermediate
6. Alternation & Conditionals  OR logic, conditional patterns, branch reset                           Intermediate
7. Lookahead & Lookbehind      Zero-width assertions, positive/negative lookaround                    Advanced
8. Regex Flavors               BRE, ERE, PCRE, JavaScript, Python, Vim differences                    Reference
9. Infrastructure Patterns     Production-ready patterns for IPs, MACs, logs, configs                 Reference
10. Tool Integration           grep, sed, awk, ripgrep, Python, JavaScript usage                      Reference

Quick Reference

Essential Metacharacters

Character  Meaning                       Example
.          Any character except newline  a.c matches "abc", "aXc"
\          Escape metacharacter          \. matches literal "."
^          Start of line                 ^Error matches "Error" at line start
$          End of line                   \.log$ matches lines ending in ".log"
|          Alternation (OR)              cat|dog matches "cat" or "dog"
()         Grouping                      (ab)+ matches "ab", "abab"
[]         Character class               [aeiou] matches any vowel
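
The metacharacters listed above can be checked quickly in Python. This is only a sketch using the standard `re` module; the sample strings and the names `CASES`, `results`, `default_hits`, and `multiline_hits` are made up for illustration. Note that in Python `^` and `$` anchor to the whole string by default and only become per-line anchors with `re.MULTILINE`, whereas line-oriented tools such as grep apply them to each line.

```python
import re

# (pattern, sample text, whether re.search is expected to find a match)
CASES = [
    (r"a.c", "aXc", True),              # . matches any single character
    (r"a.c", "a\nc", False),            # ...but not a newline by default
    (r"\.", "abc", False),              # escaped dot matches only a literal "."
    (r"^Error", "Error: disk full", True),
    (r"^Error", "Fatal Error", False),  # "Error" is not at the start
    (r"\.log$", "app.log", True),
    (r"\.log$", "app.log.gz", False),   # ".log" is not at the end
    (r"cat|dog", "hotdog", True),       # either alternative may match
    (r"^(ab)+$", "abab", True),         # the group repeats as a unit
    (r"^(ab)+$", "aba", False),
    (r"[aeiou]", "rhythm", False),      # no character from the class present
]

results = [bool(re.search(p, text)) == expected for p, text, expected in CASES]
print(all(results))  # True

# Anchors apply per line only when re.MULTILINE is set.
TEXT = "Error one\nError two"
default_hits = re.findall(r"^Error", TEXT)
multiline_hits = re.findall(r"^Error", TEXT, re.MULTILINE)
print(len(default_hits), len(multiline_hits))  # 1 2
```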

Quantifiers Summary

Greedy  Lazy    Meaning
*       *?      Zero or more
+       +?      One or more
?       ??      Zero or one
{n}     {n}?    Exactly n
{n,}    {n,}?   n or more
{n,m}   {n,m}?  Between n and m (inclusive)
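
The difference between greedy and lazy quantifiers is easiest to see side by side. The sketch below uses Python's `re` module; the sample strings and variable names are invented for illustration. A greedy quantifier takes as much text as it can while still allowing the overall match to succeed, and a lazy one takes as little as it can.

```python
import re

# Hypothetical sample text with two tagged spans.
TEXT = "<b>one</b> <b>two</b>"

# Greedy: .* runs to the LAST "</b>", swallowing both spans in one match.
greedy = re.findall(r"<b>.*</b>", TEXT)
print(greedy)  # ['<b>one</b> <b>two</b>']

# Lazy: .*? stops at the FIRST "</b>", yielding one match per span.
lazy = re.findall(r"<b>.*?</b>", TEXT)
print(lazy)    # ['<b>one</b>', '<b>two</b>']

# Counted repetition follows the same rule.
NUMBERS = "1 22 333 4444"
greedy_counts = re.findall(r"\d{2,3}", NUMBERS)   # prefers 3 digits
lazy_counts = re.findall(r"\d{2,3}?", NUMBERS)    # prefers 2 digits
print(greedy_counts)  # ['22', '333', '444']
print(lazy_counts)    # ['22', '33', '44', '44']
```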

Shorthand Classes

Shorthand  Equivalent       Meaning
\d         [0-9]            Digit
\D         [^0-9]           Non-digit
\w         [A-Za-z0-9_]     Word character
\W         [^A-Za-z0-9_]    Non-word character
\s         [ \t\n\r\f\v]    Whitespace
\S         [^ \t\n\r\f\v]   Non-whitespace
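
The shorthand classes above can be tried out in Python as follows. This is a sketch with an invented sample string and invented variable names. One caveat worth knowing: for `str` patterns in Python 3, `\d`, `\w`, and `\s` are Unicode-aware by default, so the ASCII equivalents in the table hold exactly only when the `re.ASCII` flag is set.

```python
import re

# Hypothetical sample: a space, a tab, and an "=" separate the tokens.
SAMPLE = "vlan 10\tup_time=42s"

digits = re.findall(r"\d+", SAMPLE)     # runs of digits
words = re.findall(r"\w+", SAMPLE)      # runs of letters, digits, underscore
tokens = re.split(r"\s+", SAMPLE)       # split on runs of whitespace
non_word = re.findall(r"\W", SAMPLE)    # everything \w does not match

print(digits)    # ['10', '42']
print(words)     # ['vlan', '10', 'up_time', '42s']
print(tokens)    # ['vlan', '10', 'up_time=42s']
print(non_word)  # [' ', '\t', '=']

# Unicode vs ASCII behavior of \w on a word with an accented letter.
ACCENTED = "caf\u00e9"
unicode_match = re.fullmatch(r"\w+", ACCENTED) is not None
ascii_match = re.fullmatch(r"\w+", ACCENTED, re.ASCII) is not None
print(unicode_match, ascii_match)  # True False
```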

Learning Path

Week 1: Foundations

  1. Complete Fundamentals

  2. Complete Character Classes

  3. Practice with grep on log files

Week 2: Core Patterns

  1. Master Quantifiers

  2. Learn Anchors

  3. Build first infrastructure patterns

Week 3: Intermediate

  1. Study Groups & Capturing

  2. Apply Alternation

  3. Practice extraction scripts

Week 4: Advanced

  1. Master Lookaround

  2. Study Flavor Differences

  3. Build pattern library