Don Quijote — CLI Literary Analysis
CLI commands for analyzing Cervantes' Don Quijote de la Mancha directly from the terminal. The full corpus (126 chapters across Primera and Segunda Parte) serves as both literary study material and CLI training data.
Every drill hits two targets: CLI mastery and Spanish comprehension.
QJ="docs/modules/ROOT/pages/education/literature/quijote"
Drill Sets
| Category | Focus |
|---|---|
| Literary Analysis from the Terminal | Vocabulary density per chapter, hapax legomena, archaic forms, character dominance, dialogue density, content word frequency |
| Intertextuality Drills | Biblical, classical, chivalric, pastoral, and metafictional references: heat maps and tradition tracking |
| Corpus Analysis & Creative Cross-Reference | Vocabulary hunting, context windows, rhetorical pattern matching, comparative analysis, sonnet cross-referencing |
CLI × Don Quijote — Literary Analysis from the Terminal
The entire Primera Parte (52 chapters) and Segunda Parte are under:
QJ="docs/modules/ROOT/pages/education/literature/quijote"
Set this variable before running any drill.
Vocabulary Density — Which Chapters Are Richest?
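Vocabulary density = unique words / total words (the type-token ratio). A higher ratio means more varied language, but the ratio tends to fall as a text gets longer, so shorter chapters have an advantage in this ranking.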
for f in "$QJ"/primera-parte/texto/texto-*.adoc; do
ch=$(basename "$f" .adoc | sed 's/texto-//')
total=$(awk '{for(i=1;i<=NF;i++) n++} END{print n}' "$f")
unique=$(awk '{for(i=1;i<=NF;i++) w[tolower($i)]++} END{print length(w)}' "$f")
ratio=$(awk "BEGIN{printf \"%.2f\", $unique/$total * 100}")
printf "Cap %s: %4d unique / %5d total = %s%% density\n" "$ch" "$unique" "$total" "$ratio"
done | sort -t= -k2 -rn | head -10
Cap 011: 976 unique / 2175 total = 44.87% density ← Golden Age speech
Cap 050: 1084 unique / 2482 total = 43.67% density
Cap 038: 716 unique / 1640 total = 43.66% density ← Arms and Letters
Cap 001: 823 unique / 1917 total = 42.93% density ← Opening chapter
Chapter 11 (the Golden Age speech) has the highest vocabulary density in the entire Primera Parte. Cervantes pulled out his full lexicon for that passage.
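The loop above splits on whitespace only, so "Mancha," and "Mancha" count as two different words and the unique count is somewhat inflated. A minimal sketch of a punctuation-aware variant follows. The helper name `density`, the punctuation set, and the sample text are assumptions made for illustration, not part of the original drill.

```shell
#!/usr/bin/env bash
# density FILE: print "unique total ratio%" after lowercasing and
# stripping punctuation. Hypothetical helper; extend the punctuation
# set to suit your text.
density() {
  awk '{
    for (i = 1; i <= NF; i++) {
      w = tolower($i)
      # ASCII punctuation as a class; Spanish marks as literal
      # alternatives so the match stays safe in byte-oriented awks.
      gsub(/[.,;:!?()"]|¡|¿|«|»/, "", w)
      if (w != "") { total++; seen[w]++ }
    }
  } END {
    for (k in seen) unique++
    if (total > 0) printf "%d %d %.2f%%\n", unique, total, unique / total * 100
  }' "$1"
}

# Tiny made-up sample so the sketch runs without the corpus.
sample=$(mktemp)
printf '%s\n' 'En la Mancha, en la Mancha.' 'La Mancha es la Mancha' > "$sample"
density "$sample"   # prints: 4 11 36.36%
rm -f "$sample"
```

To compare against the loop's figures, call `density "$f"` inside the same `for f in ...` loop and expect lower percentages.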
Hapax Legomena — Words Used Exactly Once
A hapax legomenon is a word that appears only once in a corpus. The command below scans the Primera Parte, so it lists the word forms Cervantes used exactly once there.
awk '{for(i=1;i<=NF;i++) {
w=tolower($i); gsub(/[.,;:!?¡¿«»""()—]/, "", w)
if(w!="") {freq[w]++; file[w]=FILENAME}
}} END {for(w in freq) if(freq[w]==1) print w, file[w]}' \
"$QJ"/primera-parte/texto/texto-*.adoc \
| awk '{split($2,a,"/"); split(a[length(a)],b,"."); print b[1], $1}' \
| sort | head -20
The Crucible — Metallurgical Vocabulary
Cervantes uses goldsmith language as metaphor: quilatar (to assay), crisolar/acrisolar (to purify in a crucible), quilates (carats). Trace this imagery across the entire work.
grep -rn -P 'quilat|crisol|acrisol' "$QJ"/ | \
awk -F: '{
split($1,a,"/")
parte = (a[length(a)-2] == "primera-parte") ? "I" : "II"
ch = a[length(a)]
gsub(/texto-0*/, "Cap ", ch)
gsub(/\.adoc/, "", ch)
text = substr($0, index($0,$3))
gsub(/^[ \t]+/, "", text)
printf "Parte %-2s %-8s L%-4s %s\n", parte, ch, $2, text
}'
Parte I Cap 33 L121 verdad, si no es probándola de manera que la prueba manifieste los quilates
Parte I Cap 33 L134 deseo que Camila, mi esposa, pase por estas dificultades y se acrisole y
Parte I Cap 33 L135 quilate en el fuego de verse requerida y solicitada
Parte I Cap 43 L87 que la que se quilata por su gusto;
Parte II Cap 10 L294 yo vi su fealdad, sino su hermosura, a la cual subía de punto y quilates
Three of the five hits fall in the Novela del curioso impertinente (Cap 33), where Anselmo tests Camila's virtue like gold in fire. The other two are isolated uses in Cap 43 and in Segunda Parte Cap 10.
Archaic vs Modern — Cervantes' Time Capsule
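Forms that read as archaic today. Some (fermosa, fecho, maguer, yantar) were already archaic in Cervantes' time and are used deliberately for effect; others (agora, mesmo, ansí, vuestra merced) were ordinary usage when the Primera Parte appeared in 1605.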
grep -rohn -P '\b(fermosa|vuestra merced|facienda|agora|mesmo|ansí|aqueste|aquesa|desaguisado|follón|malandrín|fecho|maguer|vegada|yantar|vos)\b' \
"$QJ"/primera-parte/texto/ \
| awk -F: '{freq[$2]++}
END {for(w in freq) printf "%4d %s\n", freq[w], w}' \
| sort -rn
294 vuestra merced
147 mesmo
89 vos
49 ansí
47 agora
23 fecho
19 fermosa ← deliberately archaic for "hermosa"
8 desaguisado
4 follón
2 malandrín
Character Dominance — Who Owns Each Chapter?
for f in "$QJ"/primera-parte/texto/texto-{001..020}.adoc; do
ch=$(basename "$f" .adoc | sed 's/texto-/Cap /')
dq=$(grep -oi 'don quijote\|quijote\|caballero' "$f" | wc -l)
sp=$(grep -oi 'sancho\|panza\|escudero' "$f" | wc -l)
dl=$(grep -oi 'dulcinea\|toboso\|señora' "$f" | wc -l)
printf "%-7s Quijote: %-3d Sancho: %-3d Dulcinea: %-3d " "$ch" "$dq" "$sp" "$dl"
if [ "$dq" -gt "$sp" ] && [ "$dq" -gt "$dl" ]; then echo "→ Don Quijote"
elif [ "$sp" -gt "$dq" ] && [ "$sp" -gt "$dl" ]; then echo "→ Sancho"
elif [ "$dl" -gt "$dq" ] && [ "$dl" -gt "$sp" ]; then echo "→ Dulcinea"
else echo "→ balanced"; fi
done
Cap 001 Quijote: 20 Sancho: 0 Dulcinea: 7 → Don Quijote
Cap 007 Quijote: 31 Sancho: 22 Dulcinea: 0 → Don Quijote
Cap 015 Quijote: 43 Sancho: 44 Dulcinea: 3 → Sancho ← first Sancho chapter
Dialogue Density — Visual Bar Chart
for f in "$QJ"/primera-parte/texto/texto-*.adoc; do
ch=$(basename "$f" .adoc | sed 's/texto-0*//')
total=$(grep -c '.' "$f")
dialogue=$(grep -c '^—' "$f")
pct=$(awk "BEGIN{printf \"%.0f\", ($dialogue/$total)*100}")
bar=$(printf '%*s' "$((pct/2))" '' | tr ' ' '█')
printf "Ch %-3s %3d%% %s\n" "$ch" "$pct" "$bar"
done
Chapters 33-34 (Novela del curioso impertinente) show 0% dialogue — they’re pure narrative prose. Chapter 6 (the book burning) is the most dialogue-heavy.
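The bar in the loop above is built in two steps: `printf '%*s'` pads an empty string to N spaces, then each space is turned into a block. GNU `tr` works on bytes rather than characters, so on Linux the multibyte █ may come out garbled. Below is a minimal sketch of the same idea using `sed`, which substitutes the whole byte sequence. The helper name `bar` is an assumption for illustration.

```shell
#!/usr/bin/env bash
# bar N: print N block characters (hypothetical helper).
# printf pads an empty string to N spaces; sed swaps each space for a block.
bar() {
  printf '%*s' "$1" '' | sed 's/ /█/g'
}

# Made-up percentages; one block per two percentage points, as in the drill.
for pct in 10 40 4; do
  printf '%3d%% %s\n' "$pct" "$(bar "$((pct / 2))")"
done
```

If your `tr` handles multibyte characters (BSD and macOS do), the original pipeline works unchanged; the `sed` form is simply the more portable choice.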
Cervantes' Breath — Longest Paragraphs
Reconstruct paragraphs from line-wrapped text, then measure.
awk '
/^[=:\/\[]/ {next}
/^$/ {
if (length(para) > 0) {
split(FILENAME, a, "/")
printf "%4d chars | %-15s L%-4d | %.85s...\n", length(para), a[length(a)], start, para
para = ""
}
next
}
{ if (length(para) == 0) start = FNR; para = para " " $0 }   # FNR restarts per file; NR would keep counting across chapters
' "$QJ"/primera-parte/texto/texto-*.adoc | sort -rn | head -5
6814 chars | texto-038.adoc ← Arms and Letters speech, Cervantes' longest single breath
6652 chars | texto-048.adoc ← The Canon's literary criticism
6616 chars | texto-050.adoc ← Don Quijote's defense of chivalric romances
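awk also has a built-in paragraph mode: when `RS` is the empty string, each block of text separated by blank lines arrives as one record. A minimal sketch of the same measurement under that assumption follows. The helper name `longest_paragraphs` and the sample text are hypothetical. Unlike the script above, this version does not skip AsciiDoc markup lines and does not report a start line.

```shell
#!/usr/bin/env bash
# longest_paragraphs FILE...: print "chars file first-40-chars" for every
# paragraph, longest first. RS="" switches awk to paragraph mode.
longest_paragraphs() {
  awk -v RS='' '{
    gsub(/\n/, " ")                      # re-join wrapped lines with spaces
    n = split(FILENAME, parts, "/")      # keep only the file name
    printf "%d %s %.40s\n", length($0), parts[n], $0
  }' "$@" | sort -rn
}

# Made-up sample: three paragraphs, the middle one wrapped over two lines.
sample=$(mktemp)
printf '%s\n' 'short one' '' 'a much longer' 'wrapped paragraph' '' 'mid size here' > "$sample"
longest_paragraphs "$sample" | head -1
rm -f "$sample"
```

The first field of the output is the paragraph length (31 for the wrapped sample paragraph), so `sort -rn | head -5` ranks paragraphs exactly as in the drill.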
Words Unique to the Golden Age Speech (Process Substitution)
comm -23 finds words in file A that are NOT in file B. Process substitution lets us compare without temp files.
comm -23 \
<(awk '{for(i=1;i<=NF;i++){w=tolower($i); gsub(/[.,;:!?¡¿«»""()—\[\]]/,"",w); if(length(w)>3) print w}}' \
"$QJ"/primera-parte/texto/texto-011.adoc | sort -u) \
<(awk '{for(i=1;i<=NF;i++){w=tolower($i); gsub(/[.,;:!?¡¿«»""()—\[\]]/,"",w); if(length(w)>3) print w}}' \
"$QJ"/primera-parte/texto/texto-0{01..10}.adoc "$QJ"/primera-parte/texto/texto-0{12..52}.adoc 2>/dev/null | sort -u) \
| head -20
abejas ← bees (the "solícitas y discretas abejas")
alcornoques ← cork oaks
barraganía ← concubinage (archaic legal term)
bellotas ← acorns
zagalejas ← shepherdesses (diminutive)
This is pastoral vocabulary: Cervantes shifted register entirely for this speech.
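The set-difference step is easier to see on two tiny lists. This is a minimal sketch assuming made-up word lists in place of the chapter files; the helper name `only_in_first` is hypothetical. `comm` expects both inputs sorted under the same collation, so the sketch pins `LC_ALL=C` for `sort` and `comm` alike.

```shell
#!/usr/bin/env bash
# only_in_first "WORDS A" "WORDS B": words in list A that are absent from B.
# $1 and $2 are left unquoted on purpose so the shell splits them into words.
only_in_first() {
  LC_ALL=C comm -23 <(printf '%s\n' $1 | LC_ALL=C sort -u) \
                    <(printf '%s\n' $2 | LC_ALL=C sort -u)
}

# Stand-ins for "chapter 11" and "every other chapter".
only_in_first 'bellotas abejas caballero escudero' 'escudero caballero venta'
# prints:
# abejas
# bellotas
```

`-2` hides lines unique to the second input and `-3` hides lines common to both, which leaves only column 1: the words unique to the first list.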
Cervantes' Favorite Content Words
Filter out articles, prepositions, and common verbs to find what Cervantes actually talks about.
awk '{for(i=1;i<=NF;i++){
w=tolower($i)
gsub(/[.,;:!?¡¿«»""()—\[\]]/,"",w)
if(length(w)>4 && w!~/^(sobre|donde|entre|porque|también|aunque|antes|después|desde|cuando|todos|todas|tiene|tenía|hacer|había|hasta|siendo|tanto|mucho|puede|decir|dijo|aquella|aquel|estas|estos|mismo|otros|otras|podía|quien|algunos|como)$/)
freq[w]++
}} END {for(w in freq) printf "%5d %s\n", freq[w], w}' \
"$QJ"/primera-parte/texto/texto-*.adoc | sort -rn | head -15
925 quijote
652 sancho
432 respondió ← "replied": dialogue drives the novel
428 vuestra ← formal address
396 señor ← lord/sir
336 caballero ← knight
228 señora ← lady
206 manera ← manner/way
195 verdad ← truth
CLI Skills Practiced
| Tool | Skill |
|---|---|
| | Format specifiers: |
| grep -P | PCRE for Spanish characters, alternation, word boundaries |
| grep -rohn | Recursive, only-matching, file:match, line numbers |
| awk | Frequency arrays, field processing, |
| | Arithmetic in |
| sed | Pattern extraction, substitution, line addressing |
| comm -23 | Set difference between two sorted lists |
| <(...) | Process substitution: treat command output as file argument |
| sort -t -k | Field-specific sorting with custom delimiters |
| printf | Formatted output with alignment, padding |
| tr | Character translation for visual bar charts |
| | Per-item command execution from pipeline |
Intertextuality Drills
QJ="docs/modules/ROOT/pages/education/literature/quijote"
Drill 1 — Heat Map: Which chapter is the most intertextual?
for f in "$QJ"/primera-parte/texto/texto-*.adoc; do
ch=$(basename "$f" .adoc | sed 's/texto-0*//')
bible=$(grep -ciP 'salomón|david|sansón|pecado|paraíso|lázaro|moisés' "$f")
classical=$(grep -ciP 'aristóteles|platón|homero|virgilio|ovidio|horacio|cicerón|hércules|alejandro|apolo|febo|venus|marte|troya|ulises|eneas' "$f")
chivalric=$(grep -ciP 'amadís|palmerín|orlando|roldán|lanzarote|tristán|galaor|esplandián|belianís|ariosto' "$f")
pastoral=$(grep -ciP 'diana|montemayor|arcadia|égloga|pastor[ae]?s?\b|zagal' "$f")
meta=$(grep -ciP 'cide hamete|benengeli|autor.*historia|pluma|arábigo|traductor' "$f")
total=$((bible + classical + chivalric + pastoral + meta))
[ "$total" -gt 0 ] && printf "Ch %-3s Bib:%-2d Clas:%-2d Chiv:%-2d Past:%-2d Meta:%-2d Total:%-3d\n" \
"$ch" "$bible" "$classical" "$chivalric" "$pastoral" "$meta" "$total"
done | sort -t: -k7 -rn | head -15
Drill 2 — Which tradition dominates each chapter?
for f in "$QJ"/primera-parte/texto/texto-*.adoc; do
ch=$(basename "$f" .adoc | sed 's/texto-0*//')
bible=$(grep -ciP 'salomón|david|sansón|pecado|paraíso|lázaro|moisés' "$f")
classical=$(grep -ciP 'aristóteles|platón|homero|virgilio|hércules|alejandro|apolo|febo' "$f")
chivalric=$(grep -ciP 'amadís|palmerín|orlando|roldán|lanzarote|belianís|ariosto' "$f")
pastoral=$(grep -ciP 'diana|montemayor|arcadia|égloga|pastor|zagal' "$f")
meta=$(grep -ciP 'cide hamete|benengeli|autor.*historia|pluma|arábigo' "$f")
max=$bible; dom="Biblical"
[ "$classical" -gt "$max" ] && max=$classical && dom="Classical"
[ "$chivalric" -gt "$max" ] && max=$chivalric && dom="Chivalric"
[ "$pastoral" -gt "$max" ] && max=$pastoral && dom="Pastoral"
[ "$meta" -gt "$max" ] && max=$meta && dom="Metaliterary"
[ "$max" -gt 0 ] && printf "Ch %-3s → %-12s (%d)\n" "$ch" "$dom" "$max"
done
Drill 3 — Trace a specific author across the text
# Replace AUTHOR with: amadís, ariosto, homero, virgilio, etc.
AUTHOR="amadís"
grep -rn -i "$AUTHOR" "$QJ"/primera-parte/texto/ \
| awk -F: '{split($1,a,"/"); printf "%-15s L%-4s %s\n", a[length(a)], $2, substr($0,index($0,$3))}'
Drill 4 — Cross-tradition convergence points
for f in "$QJ"/primera-parte/texto/texto-*.adoc; do
ch=$(basename "$f" .adoc | sed 's/texto-0*//')
t=0
[ $(grep -ciP 'salomón|david|sansón|pecado|paraíso|lázaro|moisés' "$f") -gt 0 ] && ((t++))
[ $(grep -ciP 'aristóteles|platón|homero|virgilio|hércules|alejandro|apolo|febo' "$f") -gt 0 ] && ((t++))
[ $(grep -ciP 'amadís|palmerín|orlando|roldán|lanzarote|belianís|ariosto' "$f") -gt 0 ] && ((t++))
[ $(grep -ciP 'diana|montemayor|arcadia|égloga|pastor|zagal' "$f") -gt 0 ] && ((t++))
[ $(grep -ciP 'cide hamete|benengeli|autor.*historia|pluma|arábigo' "$f") -gt 0 ] && ((t++))
[ "$t" -ge 3 ] && printf "Ch %-3s %d/5 traditions\n" "$ch" "$t"
done | sort -k3,3rn   # sort on the "N/5" field; splitting on "/" would compare only the constant "5 traditions"
Drill 5 — Compare Primera vs Segunda Parte intertextuality
for parte in primera-parte segunda-parte; do
echo "=== ${parte} ==="
bible=$(grep -rciP 'salomón|david|sansón|pecado|paraíso|lázaro|moisés' "$QJ"/"$parte"/texto/ | awk -F: '{s+=$2} END{print s}')
classical=$(grep -rciP 'aristóteles|platón|homero|virgilio|hércules|alejandro|apolo' "$QJ"/"$parte"/texto/ | awk -F: '{s+=$2} END{print s}')
chivalric=$(grep -rciP 'amadís|palmerín|orlando|roldán|lanzarote|belianís|ariosto' "$QJ"/"$parte"/texto/ | awk -F: '{s+=$2} END{print s}')
pastoral=$(grep -rciP 'diana|montemayor|arcadia|égloga|pastor|zagal' "$QJ"/"$parte"/texto/ | awk -F: '{s+=$2} END{print s}')
meta=$(grep -rciP 'cide hamete|benengeli|autor.*historia|pluma|arábigo' "$QJ"/"$parte"/texto/ | awk -F: '{s+=$2} END{print s}')
printf " Biblical: %3d Classical: %3d Chivalric: %3d Pastoral: %3d Meta: %3d\n" \
"$bible" "$classical" "$chivalric" "$pastoral" "$meta"
done
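Drills 1, 2, 4 and 5 each repeat the same five alternations, which makes it easy for the lists to drift apart. Below is a minimal sketch that keeps the patterns in one place, assuming the Drill 2 lists are the reference set. The function names `pattern_for` and `count_lines` and the sample text are hypothetical.

```shell
#!/usr/bin/env bash
# pattern_for TRADITION: print the alternation for that tradition.
# Patterns copied from Drill 2.
pattern_for() {
  case "$1" in
    bible)     echo 'salomón|david|sansón|pecado|paraíso|lázaro|moisés' ;;
    classical) echo 'aristóteles|platón|homero|virgilio|hércules|alejandro|apolo|febo' ;;
    chivalric) echo 'amadís|palmerín|orlando|roldán|lanzarote|belianís|ariosto' ;;
    pastoral)  echo 'diana|montemayor|arcadia|égloga|pastor|zagal' ;;
    meta)      echo 'cide hamete|benengeli|autor.*historia|pluma|arábigo' ;;
    *)         return 1 ;;
  esac
}

# count_lines TRADITION FILE: number of lines matching that tradition.
# -E is enough here because these patterns use no PCRE-only syntax.
count_lines() {
  grep -ciE "$(pattern_for "$1")" "$2"
}

# Made-up sample lines standing in for a chapter file.
sample=$(mktemp)
printf '%s\n' 'leyó a amadís y a orlando' 'el pastor y la diana' 'nada aquí' > "$sample"
for t in bible classical chivalric pastoral meta; do
  printf '%-10s %d\n' "$t" "$(count_lines "$t" "$sample")"
done
rm -f "$sample"
```

Inside any of the drills, `bible=$(count_lines bible "$f")` would then replace the inline `grep -ciP` call.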
CLI Skills Practiced
| Skill | Application |
|---|---|
| grep -ciP | Case-insensitive PCRE counting per file |
| | Multi-term search across corpus |
| awk -F: | Parse grep output (file:line:text) |
| Shell arithmetic | Sum category scores |
| Conditional | Count boolean presence per tradition |
| sort -t -k | Sort by specific field with custom delimiter |
| printf | Formatted tabular output |
| Process substitution | Compare vocabulary sets between partes |
Corpus Analysis & Creative Cross-Reference
QJ="docs/modules/ROOT/pages/education/literature/quijote"
Set this variable before running any drill.
Vocabulary Hunting — Find Your Words in Cervantes
Check whether words from your own writing appear in the Quijote corpus.
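# Every matching line, with file name and line number
grep -rn 'lumbre' "$QJ"/
# Number of files that contain the word
grep -rl 'lumbre' "$QJ"/ | wc -l
# Per-chapter counts, Primera Parte
grep -rc 'lumbre' "$QJ"/primera-parte/texto/ | awk -F: '$2>0'
# Per-chapter counts, Segunda Parte
grep -rc 'lumbre' "$QJ"/segunda-parte/texto/ | awk -F: '$2>0'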
-rc combines recursive search with a per-file count of matching lines (a line with two hits still counts once). awk -F: '$2>0' keeps only the files with at least one match.
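Because `-c` counts lines rather than occurrences, the two numbers can differ. Here is a minimal sketch of the difference on made-up sample text; the variable names are illustrative only.

```shell
#!/usr/bin/env bash
sample=$(mktemp)
# Made-up text: "lumbre" appears three times on two lines.
printf '%s\n' 'la lumbre y más lumbre' 'sin fuego' 'otra lumbre' > "$sample"

lines=$(grep -c 'lumbre' "$sample")           # lines that match
hits=$(grep -o 'lumbre' "$sample" | wc -l)    # individual matches
hits=$((hits))                                # drop padding some wc builds add

echo "lines=$lines hits=$hits"                # prints: lines=2 hits=3
rm -f "$sample"
```

Use the `-o | wc -l` form when you need true occurrence counts, for example when comparing word frequencies across chapters.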
Context Windows — See the Word in Its Sentence
grep -rn -B2 -A2 'sepulcro' "$QJ"/
grep -rn -C3 'querella' "$QJ"/primera-parte/texto/
-B = before, -A = after, -C = both. These flags turn grep from a line finder into a passage finder.
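With `-n`, grep marks matching lines as `N:` and context lines as `N-`, and it separates non-adjacent groups with a `--` line. That matters if the output is later split on `:` with awk. A minimal sketch on made-up sample text:

```shell
#!/usr/bin/env bash
sample=$(mktemp)
# Five made-up lines; only the third contains the search word.
printf '%s\n' uno dos 'el sepulcro frío' cuatro cinco > "$sample"

# One line of context on each side of the match.
out=$(grep -n -C1 'sepulcro' "$sample")
echo "$out"
# prints:
# 2-dos
# 3:el sepulcro frío
# 4-cuatro
rm -f "$sample"
```

Only the line with the colon after the number is an actual match; the hyphenated lines are context.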
Rhetorical Pattern Matching
# Tricolon: three items joined as "X, Y y Z"
grep -rn -P '\w+, \w+ y \w+' "$QJ"/primera-parte/texto/texto-033.adoc
# Exclamations
grep -rn '¡' "$QJ"/primera-parte/texto/texto-034.adoc
# Imperfect subjunctive forms
grep -rn -P 'fuese|fuera|hubiese|hubiera|tuviese|tuviera' "$QJ"/primera-parte/texto/texto-033.adoc
Comparative Chapter Analysis
for f in "$QJ"/primera-parte/texto/texto-*.adoc; do
count=$(grep -c '—' "$f")
printf "%-20s %s\n" "$(basename "$f")" "$count"
done | sort -k2 -rn | head -10
texto-023.adoc 72
texto-031.adoc 68
texto-029.adoc 62
Chapters 33–34 (Novela del curioso impertinente) show near-zero dialogue — pure narrative prose. The dialogue-heavy chapters are the adventure and conversation episodes.
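# Chapters with the most lines mentioning honor or honra
grep -ric 'honor\|honra' "$QJ"/primera-parte/texto/ | sort -t: -k2 -rn | head -10
# Lines mentioning each character, summed across the Primera Parte
for name in Sancho Dulcinea Rocinante Lotario Camila Anselmo; do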
count=$(grep -ric "$name" "$QJ"/primera-parte/texto/ | awk -F: '{s+=$2} END {print s}')
printf "%-12s %s mentions\n" "$name" "$count"
done
Sancho 652 mentions
Dulcinea 108 mentions
Rocinante 118 mentions
Lotario 92 mentions ← confined to chapters 33-35
Camila 85 mentions ← confined to chapters 33-35
Anselmo 74 mentions ← confined to chapters 33-35
Lotario, Camila, and Anselmo exist only inside the interpolated novel. They never touch the main narrative.
awk — Text Analysis Beyond grep
# Unique (case-folded) words in chapter 33
awk '{for(i=1;i<=NF;i++) w[tolower($i)]++} END {print length(w) " unique words"}' \
"$QJ"/primera-parte/texto/texto-033.adoc
# Lines longer than 200 characters
awk 'length > 200 {printf "%s:%d (%d chars)\n", FILENAME, NR, length}' \
"$QJ"/primera-parte/texto/texto-033.adoc
# Average words per line in chapter 34
awk '{total+=NF; lines++} END {printf "%.1f words/line\n", total/lines}' \
"$QJ"/primera-parte/texto/texto-034.adoc
# Top 20 words across chapters 33-35
awk '{for(i=1;i<=NF;i++) w[tolower($i)]++} END {for(k in w) print w[k], k}' \
"$QJ"/primera-parte/texto/texto-03{3,4,5}.adoc | sort -rn | head -20
Brace expansion {3,4,5} selects exactly those three files — no loop needed.
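Brace expansion is done by the shell before the command runs, so `echo` is a quick way to preview which arguments a command would receive. A minimal sketch follows; the zero-padded range assumes bash 4 or later.

```shell
#!/usr/bin/env bash
# A comma list expands to exactly the listed items.
list=$(echo texto-03{3,4,5}.adoc)
echo "$list"    # texto-033.adoc texto-034.adoc texto-035.adoc

# A zero-padded range keeps its padding (bash 4 or later).
range=$(echo texto-0{08..11}.adoc)
echo "$range"   # texto-008.adoc texto-009.adoc texto-010.adoc texto-011.adoc
```

Unlike a glob, brace expansion does not check that the files exist, so a range may produce names with no file behind them; the earlier `comm` drill silences the resulting errors with `2>/dev/null`.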
Sonnet Cross-Reference
Check every significant word from a sonnet against the Cervantes corpus.
for word in desdenes sepulcro lumbre querella suspiros sombras ingrata; do
count=$(grep -ric "$word" "$QJ"/ | awk -F: '{s+=$2} END {print s}')
printf "%-12s %s occurrences in Quijote\n" "$word" "$count"
done
The count tells you whether you’re in Cervantes' register or outside it. High count = common Cervantine vocabulary. Zero = your own invention or a word from a different tradition.
Archaic Forms — Golden Age Register
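# Words ending in -lle: catches assimilated infinitive + pronoun forms (decille for decirle), along with ordinary words such as calle
grep -rn -P '\w+lle\b' "$QJ"/primera-parte/texto/
# Frequency census of selected archaic forms
grep -rohn -P '\b(fermosa|vuestra merced|facienda|agora|mesmo|ansí)\b' \
"$QJ"/primera-parte/texto/ | awk -F: '{freq[$2]++} END {for(w in freq) printf "%4d %s\n", freq[w], w}' | sort -rn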
-o outputs only the matched text, not the whole line. Combined with awk frequency counting, this gives you a vocabulary census.
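The same census can be taken with `sort | uniq -c` in place of the awk array. This is a minimal sketch on made-up sample lines; `-E` is assumed to be sufficient because this alternation uses no PCRE-only syntax.

```shell
#!/usr/bin/env bash
sample=$(mktemp)
# Made-up lines standing in for the corpus.
printf '%s\n' 'agora digo que agora' 'él mesmo lo dijo' 'agora no' > "$sample"

# -o prints one match per line; sort groups identical matches;
# uniq -c counts each group; the final awk only trims uniq's padding.
census=$(grep -oE 'agora|mesmo|ansí' "$sample" | sort | uniq -c | sort -rn | awk '{print $1, $2}')
echo "$census"
# prints:
# 3 agora
# 1 mesmo
rm -f "$sample"
```

For multi-word matches such as vuestra merced, drop the final awk step or print the whole line, since `$2` would keep only the first word.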
Hapax Legomena — Words Used Exactly Once
Cervantes' rarest vocabulary choices. Potential material for your own writing.
awk '{for(i=1;i<=NF;i++) {
w=tolower($i); gsub(/[.,;:!?¡¿«»""()—]/, "", w)
if(w!="") freq[w]++
}} END {for(w in freq) if(freq[w]==1) print w}' \
"$QJ"/primera-parte/texto/texto-*.adoc | sort | head -40
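These are word forms Cervantes used once in the Primera Parte and never again there. Because the script splits on whitespace and counts each inflected form separately, some entries are rare inflections of common words rather than rare vocabulary.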
CLI Skills Practiced
| Tool | Skill |
|---|---|
| grep -B -A -C | Context windows around matches |
| grep -ric | Recursive case-insensitive count per file |
| grep -P | PCRE regex for Spanish characters and alternation |
| awk -F: | Field splitting on colon (grep output format) |
| | Formatted iteration over files and word lists |
| brace expansion | |
| sort -t: -k2 -rn | Field-specific numeric reverse sort |
| wc -l | Count pipeline results |