RCA-2026-04-11-001: Analysis

Root Cause

5 Whys Analysis

1. Why did the Antora build hang?
   Because: ~5,000 scripture pages were added to a single Antora component. The node process exhausted memory processing all pages before writing output.

2. Why were ~5,000 pages added to domus-captures?
   Because: The generation script targeted domus-captures directly instead of a dedicated spoke repo. The decision to create a spoke was made during planning but not executed before generating files.

3. Why wasn’t the build tested before committing ~5,000 files?
   Because: Build verification was skipped. The generation script ran successfully and files were committed and pushed without running make first.

4. Why was build output being filtered instead of read?
   Because: The pattern make 2>&1 | grep "Site built" was used repeatedly, which hides all errors and warnings. When the build hung, the grep waited forever with no indication of failure.

5. Why were pre-existing errors tolerated for hours?
   Because: ~100 broken case-study xrefs were visible from the first build but dismissed as "pre-existing" and filtered out. This normalized error-blindness: when new errors appeared, they were filtered too.
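
One way to avoid the filtered-output problem from Why 4 is to keep the full build log, bound the run time, and report the build's real exit status. The following is a minimal sketch, not the project's actual tooling: run_build, BUILD_LOG and BUILD_TIMEOUT are hypothetical names, the 20m default simply mirrors the Cloudflare limit cited in this report, and GNU coreutils timeout is assumed to be available.

```shell
#!/bin/sh
# Sketch only: run_build, BUILD_LOG and BUILD_TIMEOUT are hypothetical names.
# Assumes GNU coreutils `timeout` is installed.

run_build() {
    build_log="${BUILD_LOG:-build.log}"
    build_limit="${BUILD_TIMEOUT:-20m}"

    # Stream every line to the terminal and to the log. The pipeline's own
    # status would be tee's, so the build's status goes to a side file.
    { timeout "$build_limit" "$@" 2>&1; echo "$?" > "$build_log.status"; } | tee "$build_log"
    build_status=$(cat "$build_log.status")
    rm -f "$build_log.status"

    if [ "$build_status" -eq 124 ]; then
        # GNU timeout exits with 124 when the time limit is hit.
        echo "BUILD TIMED OUT after $build_limit (log: $build_log)" >&2
    elif [ "$build_status" -ne 0 ]; then
        echo "BUILD FAILED with status $build_status (log: $build_log)" >&2
    fi
    return "$build_status"
}
```

Calling run_build make would then show warnings and errors as they occur, stop a hung build instead of waiting indefinitely, and leave the complete output in build.log for later review.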

Root Cause Statement

In this setup, Antora could not process ~5,000 pages in a single component: the local node process ran out of memory, and the build could not complete within Cloudflare’s 20-minute build limit. The files should have been generated into a dedicated spoke repo (domus-literature) from the start, and the build should have been tested before committing.
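
A generation script could refuse to write into a component that would exceed a page budget. The sketch below is illustrative only: count_pages and check_page_budget are hypothetical helpers, the directory argument stands in for wherever the component's pages live, and the budget is left as a parameter because this report does not state a specific threshold.

```shell
#!/bin/sh
# Sketch only: count_pages and check_page_budget are hypothetical helpers.

count_pages() {
    # Count AsciiDoc files anywhere under the given directory.
    find "$1" -type f -name '*.adoc' | wc -l | tr -d ' '
}

check_page_budget() {
    pages_dir="$1"
    max_pages="$2"
    page_count=$(count_pages "$pages_dir")

    if [ "$page_count" -gt "$max_pages" ]; then
        echo "REFUSING: $page_count pages under $pages_dir exceed the budget of $max_pages" >&2
        return 1
    fi
    echo "OK: $page_count pages under $pages_dir (budget $max_pages)"
}
```

Run before generation or as a pre-commit step, a check like this turns an undocumented limit into an explicit failure with a clear message.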

Contributing Factors

No build test before commit
   Description: 6,795 files committed and pushed without running make
   Preventable? Yes

Grepped build output
   Description: make 2>&1 | grep "Site built" hid all errors/warnings
   Preventable? Yes

Amend after push
   Description: git commit --amend on a pushed commit rewrote history, causing divergence
   Preventable? Yes

Duplicate agent work
   Description: Background agent and main conversation both fixed the same files, creating duplicate commits
   Preventable? Yes

Normalized error-blindness
   Description: 100+ pre-existing xref errors were visible in every build but ignored for hours
   Preventable? Yes

No Antora page limit awareness
   Description: No documented threshold for how many pages a single Antora component can handle
   Preventable? Partially (now documented)

Cloudflare free tier limit
   Description: 20-minute build timeout is a hard constraint not previously tested against
   Preventable? No (external constraint)
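
The "Amend after push" factor can be guarded against by checking whether HEAD already exists on a remote-tracking branch before amending. This is a minimal sketch under stated assumptions: head_is_pushed and safe_amend are hypothetical helper names, and the check relies on the local remote-tracking refs being current, which they normally are right after a push or fetch.

```shell
#!/bin/sh
# Sketch only: head_is_pushed and safe_amend are hypothetical helpers.

head_is_pushed() {
    # Succeeds if at least one remote-tracking branch contains HEAD.
    [ -n "$(git branch -r --contains HEAD 2>/dev/null)" ]
}

safe_amend() {
    if head_is_pushed; then
        echo "HEAD is already on a remote branch; create a new commit instead of amending." >&2
        return 1
    fi
    # Not pushed yet, so rewriting this commit stays local.
    git commit --amend "$@"
}
```

Using safe_amend --no-edit in place of git commit --amend --no-edit would have stopped the history rewrite that caused the divergence, at the cost of one extra follow-up commit.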

Impact

Severity: P2 (build broken, deployment blocked)
Duration: ~4 hours
Systems Affected: domus-captures build (local + Cloudflare), git branch integrity
Commits Requiring Recovery: 3 (1 amend, 1 duplicate, 1 rebase)
Files Touched in Recovery: 110+ (xref fixes) + 4,899 (scripture removal) + 13 (attribute escapes)
Data Loss: None; scripture content preserved in git history and Principia source