rg — Optimization

Performance and Optimization

rg is already fast by default — Rust regex engine, SIMD acceleration, parallel directory traversal, memory-mapped files. These entries cover the knobs you can turn when searching massive trees or when every millisecond matters in a pipeline.

--threads for controlling parallelism

time rg --threads 1 'include::' ~/atelier/_bibliotheca/domus-captures/docs/ > /dev/null 2>&1
time rg --threads 4 'include::' ~/atelier/_bibliotheca/domus-captures/docs/ > /dev/null 2>&1
time rg --threads 8 'include::' ~/atelier/_bibliotheca/domus-captures/docs/ > /dev/null 2>&1

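By default (--threads 0), rg picks the thread count heuristically from the number of available CPUs. --threads 1 forces a single-threaded search, which makes the output order repeatable. With several threads, rg still prints each file's matches as one contiguous block, but the order of those blocks can change between runs. --sort path also gives a stable order, at the cost of running single-threaded. Raising the thread count can help on high-core machines searching network mounts, where I/O latency rather than CPU dominates.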
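A minimal sketch for checking the ordering behaviour on a throwaway tree, assuming bash and rg on PATH. The directory, the 50 file names, and their contents are hypothetical. Sorting both outputs shows that the threaded and single-threaded runs find the same matches, and --sort path yields a stable order without post-processing.

```shell
#!/usr/bin/env bash
# Sketch: compare rg output across thread settings on a temporary tree.
# Assumes bash and rg on PATH; file names and counts are hypothetical.
set -euo pipefail

tree=$(mktemp -d)
trap 'rm -rf "$tree"' EXIT

# 50 small files, each with exactly one matching line.
for i in $(seq 1 50); do
  printf 'include::part-%02d.adoc[]\n' "$i" > "$tree/file-$i.adoc"
done

# Default thread count: file order can differ between runs, so sort before comparing.
multi=$(rg --no-heading -N 'include::' "$tree" | sort)
# Single thread: same matches, repeatable order.
single=$(rg --threads 1 --no-heading -N 'include::' "$tree" | sort)
# --sort path: stable order straight from rg (runs single-threaded).
by_path=$(rg --sort path --no-heading -N 'include::' "$tree")

if [ "$multi" = "$single" ]; then
  echo "same matches with and without threads"
fi
# sed reads all of its input, so no early pipe close under pipefail.
printf '%s\n' "$by_path" | sed -n '1,3p'
```

For a real timing comparison, replace the temporary tree with a large directory and prefix each rg call with time.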

--mmap for memory-mapped file I/O

cat <<'EOF' > /tmp/rg-mmap-test.txt
Line 1: sample data for mmap testing
Line 2: more sample data for mmap testing
Line 3: final line of sample data
EOF
rg --mmap 'sample' /tmp/rg-mmap-test.txt

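--mmap asks rg to read files through memory maps instead of read syscalls where possible. This tends to pay off for a few large files that are already in the OS page cache, and to cost more than it saves across many small files, because each map carries setup overhead. By default rg makes this choice itself (roughly: memory maps when searching a handful of explicitly named files, plain reads during directory traversal), so use --mmap or --no-mmap only when benchmarking or when you know the access pattern.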
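A small benchmarking sketch, assuming bash, awk, and rg on PATH. It generates one synthetic 200,000-line file (the contents are made up for the test), warms the page cache, and times both I/O strategies. Whatever the timings show, both runs must report the same count.

```shell
#!/usr/bin/env bash
# Sketch: time rg with and without memory maps on one generated file.
# Assumes bash, awk, and rg on PATH; the file contents are synthetic.
set -euo pipefail

big=$(mktemp)
trap 'rm -f "$big"' EXIT

# 200,000 lines; every 1,000th line contains the word "needle".
awk 'BEGIN {
  for (i = 1; i <= 200000; i++)
    print (i % 1000 == 0 ? "needle " i : "filler line " i)
}' > "$big"

# Warm the page cache so both runs read from memory.
cat "$big" > /dev/null

time rg --mmap -c 'needle' "$big"
time rg --no-mmap -c 'needle' "$big"

# Both strategies must agree on the count.
with_mmap=$(rg --mmap -c 'needle' "$big")
without_mmap=$(rg --no-mmap -c 'needle' "$big")
echo "mmap: $with_mmap, no-mmap: $without_mmap"
```

A single 200,000-line file is small; the difference between the two strategies usually becomes visible only with files in the hundreds of megabytes.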

Benchmark rg vs grep — measure the actual difference

time rg -c 'include::' ~/atelier/_bibliotheca/domus-captures/docs/ > /dev/null 2>&1
time grep -rc 'include::' ~/atelier/_bibliotheca/domus-captures/docs/ > /dev/null 2>&1

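time reports wall-clock (real), user CPU, and system CPU time. The two commands do not search the same input. By default rg skips paths covered by ignore files such as .gitignore, skips hidden files, and gives up on binary files as soon as it detects them, while grep -r reads everything (and grep -rc also prints a :0 line for every file without a match). Add -uuu to rg for a like-for-like comparison. The speedup varies with hardware, tree size, cache state, and pattern, so measure it on your own tree rather than quoting a fixed ratio. Backtracking is not the explanation: GNU grep, like rg, runs ordinary patterns such as .*foo.*bar through a finite automaton. rg's advantage comes mainly from parallel directory traversal, SIMD-accelerated literal search, and the files it never has to read.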
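A sketch that makes the file-set difference visible, assuming bash, rg, and grep on PATH. The tree (one visible match, one match inside a hidden .cache directory, one non-matching file) is hypothetical. Default rg sees one matching file, while rg -uuu and grep -r both see two; the last two lines time the like-for-like pair.

```shell
#!/usr/bin/env bash
# Sketch: show why default rg and grep -r search different file sets.
# Assumes bash, rg, and grep on PATH; the tree below is hypothetical.
set -euo pipefail

tree=$(mktemp -d)
trap 'rm -rf "$tree"' EXIT

printf 'include::a.adoc[]\n' > "$tree/visible.adoc"
mkdir "$tree/.cache"
printf 'include::b.adoc[]\n' > "$tree/.cache/hidden.adoc"
printf 'no directive here\n' > "$tree/other.adoc"

# Count files with at least one match under each configuration.
rg_default=$(rg -l 'include::' "$tree" | wc -l | tr -d ' ')
rg_everything=$(rg -uuu -l 'include::' "$tree" | wc -l | tr -d ' ')
grep_recursive=$(grep -rl 'include::' "$tree" | wc -l | tr -d ' ')

echo "rg default: $rg_default, rg -uuu: $rg_everything, grep -r: $grep_recursive"

# Like-for-like timing: rg with ignore/hidden/binary filtering turned off.
time rg -uuu -c 'include::' "$tree" > /dev/null
time grep -rc 'include::' "$tree" > /dev/null
```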

--no-unicode for ASCII-only speedup

rg --no-unicode '\w+' ~/atelier/_bibliotheca/domus-captures/docs/modules/ROOT/partials/codex/rg/basics.adoc > /dev/null 2>&1

[a-z] is a plain code-point range and matches only ASCII a to z in either mode. What Unicode mode (the default) changes is the meaning of classes such as \w, \d, \s, and \b, and of case-insensitive matching: in Unicode mode \w also matches letters like é. --no-unicode makes these ASCII-only, which can produce smaller automata and faster matching. Use it only when you know the content is ASCII: on AsciiDoc text containing non-ASCII characters, \w+ will split words at é and similar letters instead of matching them.
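A minimal sketch of the difference on a one-word file, assuming bash and rg on PATH and that the script itself is saved as UTF-8 (so printf writes é as UTF-8 bytes). The word café is an arbitrary example.

```shell
#!/usr/bin/env bash
# Sketch: what --no-unicode changes, on a one-word UTF-8 file.
# Assumes bash and rg on PATH, and that this script is saved as UTF-8.
set -euo pipefail

f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf 'café\n' > "$f"

unicode_word=$(rg -o '\w+' "$f")               # Unicode \w includes letters like é
ascii_word=$(rg --no-unicode -o '\w+' "$f")    # ASCII \w is [0-9A-Za-z_] only
range_match=$(rg -o '[a-z]+' "$f")             # a plain range never includes é

echo "unicode: $unicode_word | no-unicode: $ascii_word | [a-z]: $range_match"
```

The [a-z] result is the same with or without --no-unicode; only the \w result changes.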

--max-count (-m) to stop after N matches per file

rg -m 1 'include::' ~/atelier/_bibliotheca/domus-captures/docs/modules/ROOT/pages/codex/rg/

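-m 1 returns at most one match per file, and rg stops reading each file once it has that match, which on large files avoids scanning megabytes of trailing content. For existence checks you rarely need -m at all: -l (files-with-matches) already stops at the first match in each file, and -q prints nothing, stops searching at the first match, and leaves the answer in the exit status.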
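A short sketch comparing the three options on one file with three matching lines, assuming bash and rg on PATH; the file contents are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: -m 1, -l, and -q on one file with three matching lines.
# Assumes bash and rg on PATH; the file contents are hypothetical.
set -euo pipefail

f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf 'include::a[]\ninclude::b[]\ninclude::c[]\n' > "$f"

total=$(rg -c 'include::' "$f")         # counts every matching line
first=$(rg -m 1 'include::' "$f")       # stops after the first match
listed=$(rg -l 'include::' "$f")        # prints only the path

# -q prints nothing; the exit status (0 = found) is the answer.
if rg -q 'include::' "$f"; then
  found=yes
else
  found=no
fi

echo "total=$total first=$first found=$found"
```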

--max-filesize to skip huge files

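rg --max-filesize 100K 'pattern' ~/atelier/_bibliotheca/domus-captures/docs/ 2>/dev/null | head -5
rg -q --max-filesize 100K 'pattern' ~/atelier/_bibliotheca/domus-captures/docs/ || echo "No matches in files under 100K"

Files larger than the limit are silently skipped. Useful when a repo contains large generated files (SQL dumps, compiled assets, vendor bundles) that slow down searches and produce noisy results. The size accepts a K, M, or G suffix. The fallback message runs as a separate command because a pipeline's exit status is that of its last command: in rg ... | head -5 || echo ..., head succeeds even when rg finds nothing, so the echo would never run.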
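A sketch of the skip behaviour and of the pipeline pitfall, assuming bash, rg, and a head that supports -c (GNU and BSD both do). The file names and the 1K limit are hypothetical. The only match lives in a ~5 KB file, so with the limit rg finds nothing; the last line shows that a fallback after | head never fires.

```shell
#!/usr/bin/env bash
# Sketch: --max-filesize skipping a large file, plus a working "no matches" fallback.
# Assumes bash, rg, and head -c; file names and sizes are hypothetical.
set -eu   # no pipefail: the last part demonstrates default pipeline status

tree=$(mktemp -d)
trap 'rm -rf "$tree"' EXIT

# big.txt: ~5 KB of filler followed by the only "needle" in the tree.
head -c 5000 /dev/zero | tr '\0' 'x' > "$tree/big.txt"
printf '\nneedle\n' >> "$tree/big.txt"
printf 'nothing to see\n' > "$tree/small.txt"

# With a 1K limit, big.txt is skipped silently and rg exits 1 (no match).
limited=$(rg --max-filesize 1K -l 'needle' "$tree" || true)
unlimited=$(rg -l 'needle' "$tree")

# Check existence separately so the exit status reflects rg, not head.
if ! rg -q --max-filesize 1K 'needle' "$tree"; then
  echo "No matches in files under 1K"
fi

# A pipeline reports the status of its last command, so this fallback never runs.
piped=$(false | head -5 || echo "fallback ran")
echo "limited='$limited' unlimited='$unlimited' piped='$piped'"
```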

Search only tracked git files — rg with git ls-files

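(cd ~/atelier/_bibliotheca/domus-captures && git ls-files -z '*.adoc' | xargs -0 -r rg -l 'include::partial' 2>/dev/null) | head -10

rg respects .gitignore by default, but git ls-files is stricter: it lists only files actually tracked by git, so untracked files (new, not yet added) are excluded even when no ignore rule covers them. git ls-files prints paths relative to the directory it runs in, so the pipeline runs from inside the repository; with git -C alone, rg would resolve those paths against the current directory and miss them. -z and -0 use NUL delimiters so paths with spaces survive, and -r keeps xargs from running rg with no arguments (which would search the current directory) when nothing matches the pathspec. Because the file list comes from the git index rather than from whatever happens to be in the working tree, the result is reproducible, which suits CI pipelines.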
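A sketch in a throwaway repository, assuming bash, git, rg, and an xargs that accepts -0 and -r (GNU, or BSD for compatibility). The file names are hypothetical. One file is staged and one is left untracked; nothing is committed, because git ls-files reads the index.

```shell
#!/usr/bin/env bash
# Sketch: tracked-only search versus plain rg in a throwaway git repository.
# Assumes bash, git, rg, and an xargs that supports -0 and -r.
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git -C "$repo" init -q

printf 'include::partial$a.adoc[]\n' > "$repo/tracked.adoc"
printf 'include::partial$b.adoc[]\n' > "$repo/untracked.adoc"
git -C "$repo" add tracked.adoc   # staged in the index, so ls-files lists it

# ls-files paths are relative to where git runs, so run rg from the same place.
tracked_hits=$(cd "$repo" && git ls-files -z '*.adoc' | xargs -0 -r rg -l 'include::partial')
# Plain rg also searches untracked.adoc: no ignore rule excludes it.
all_hits=$(cd "$repo" && rg -l 'include::partial' | sort)

echo "tracked only: $tracked_hits"
echo "all files:    $(printf '%s' "$all_hits" | tr '\n' ' ')"
```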

--stats for profiling — timing and match counts

rg --stats 'include::' -g '*.adoc' ~/atelier/_bibliotheca/domus-captures/docs/ 2>&1 | tail -8

Expected output includes:

N matches
N matched lines
N files contained matches
N files searched
N bytes printed
N bytes searched
M.NNN seconds spent searching

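--stats prints its summary to stdout after the matches, so tail -8 trims the output to the summary block (the 2>&1 only folds any error messages into the pipe). Adjust the line count if your rg version prints more or fewer summary lines. Use this to compare patterns, glob filters, or thread counts. Treat "seconds spent searching" as a relative figure for the search itself; for end-to-end wall-clock time including startup, wrap the command in time.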
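A sketch that captures the summary from stdout and keeps only the count lines, assuming bash, rg, and grep on PATH. The two files and their contents are hypothetical. The grep filter assumes the summary wording shown in the expected output above.

```shell
#!/usr/bin/env bash
# Sketch: capture the --stats summary from stdout and keep only the count lines.
# Assumes bash, rg, and grep on PATH; file names and contents are hypothetical.
set -euo pipefail

tree=$(mktemp -d)
trap 'rm -rf "$tree"' EXIT
printf 'include::a[]\ninclude::b[]\n' > "$tree/one.adoc"
printf 'include::c[]\n' > "$tree/two.adoc"

# Discard stderr so only rg's stdout (matches, then the summary) is captured.
summary=$(rg --stats 'include::' -g '*.adoc' "$tree" 2>/dev/null \
  | grep -E '^[0-9]+ (matches|matched lines|files contained matches|files searched)$')

printf '%s\n' "$summary"
```

Filtering by wording instead of by line count keeps the script working if the summary gains or loses lines between rg versions.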