Don Quijote — CLI Literary Analysis

CLI commands for analyzing Cervantes' Don Quijote de la Mancha directly from the terminal. The full corpus (126 chapters across Primera and Segunda Parte) serves as both literary study material and CLI training data.

Every drill hits two targets: CLI mastery and Spanish comprehension.

QJ="docs/modules/ROOT/pages/education/literature/quijote"

Drill Sets

Category	Focus
Vocabulary Drills	Vocabulary density per chapter, hapax legomena, archaic forms, character dominance, dialogue density, content word frequency
Intertextuality Drills	Biblical, classical, chivalric, pastoral, and metafictional references — heat maps and tradition tracking
Corpus Analysis & Creative Cross-Reference	Vocabulary hunting, context windows, rhetorical pattern matching, comparative analysis, sonnet cross-referencing

Vocabulary Drills

CLI × Don Quijote — Literary Analysis from the Terminal

The entire Primera Parte (52 chapters) and Segunda Parte are under:

QJ="docs/modules/ROOT/pages/education/literature/quijote"

Set this variable before running any drill.

Vocabulary Density — Which Chapters Are Richest?

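Vocabulary density = unique words / total words (the type/token ratio). A higher ratio means more varied language, although shorter chapters tend to score higher simply because they have less room to repeat words.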

for f in "$QJ"/primera-parte/texto/texto-*.adoc; do
  ch=$(basename "$f" .adoc | sed 's/texto-//')
  total=$(awk '{for(i=1;i<=NF;i++) n++} END{print n}' "$f")
  unique=$(awk '{for(i=1;i<=NF;i++) w[tolower($i)]++} END{print length(w)}' "$f")
  ratio=$(awk "BEGIN{printf \"%.2f\", $unique/$total * 100}")
  printf "Cap %s: %4d unique / %5d total = %s%% density\n" "$ch" "$unique" "$total" "$ratio"
done | sort -t= -k2 -rn | head -10

Tested output

Cap 011:  976 unique /  2175 total = 44.87% density  ← Golden Age speech
Cap 050: 1084 unique /  2482 total = 43.67% density
Cap 038:  716 unique /  1640 total = 43.66% density  ← Arms and Letters
Cap 001:  823 unique /  1917 total = 42.93% density  ← Opening chapter

Chapter 11 (the Golden Age speech) has the highest vocabulary density in the entire Primera Parte. Cervantes pulled out his full lexicon for that passage.
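The loop above reads each chapter twice and spawns a third awk for the ratio. The same ratio can be computed in one pass. This is a minimal sketch, not the drill's own method: vocab_density is a hypothetical helper name, and the input is an inline sample rather than a chapter file.

```shell
# Sketch: type/token ratio in a single awk pass.
# vocab_density is a hypothetical helper; it reads text on stdin.
vocab_density() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        total++
        w = tolower($i)
        if (!(w in seen)) { seen[w] = 1; unique++ }
      }
    }
    END {
      # Guard against empty input so we never divide by zero.
      if (total == 0) { print "0 unique / 0 total"; exit }
      printf "%d unique / %d total = %.2f%%\n", unique, total, unique / total * 100
    }
  '
}

# Inline sample (not from the corpus): 8 tokens, 4 distinct words.
printf 'la lanza y la adarga\ny la lanza\n' | vocab_density
# prints: 4 unique / 8 total = 50.00%
```

To use it on a chapter, redirect the file into the function: vocab_density < "$QJ"/primera-parte/texto/texto-011.adoc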

Hapax Legomena — Words Used Exactly Once

A hapax legomenon is a word that appears only once in a corpus. The command below scans the Primera Parte only, so "once" means once in those 52 chapters. These are Cervantes' rarest choices.

awk '{for(i=1;i<=NF;i++) {
  w=tolower($i); gsub(/[.,;:!?¡¿«»""()—]/, "", w)
  if(w!="") {freq[w]++; file[w]=FILENAME}
}} END {for(w in freq) if(freq[w]==1) print w, file[w]}' \
  "$QJ"/primera-parte/texto/texto-*.adoc \
  | awk '{split($2,a,"/"); split(a[length(a)],b,"."); print b[1], $1}' \
  | sort | head -20
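To see why the punctuation stripping matters, here is a minimal sketch on two lines of sample text. hapax is a hypothetical helper name, and this version strips ASCII punctuation only, which is an assumption made to keep the example portable.

```shell
# hapax is a hypothetical helper: prints words that occur exactly once on stdin.
hapax() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        w = tolower($i)
        gsub(/[.,;:!?()]/, "", w)   # ASCII punctuation only in this sketch
        if (w != "") freq[w]++
      }
    }
    END { for (w in freq) if (freq[w] == 1) print w }
  ' | sort
}

# "Mancha," and "Mancha." collapse to the same word, so it is not a hapax.
printf 'En un lugar de la Mancha,\nde cuyo nombre no quiero acordarme, en la Mancha.\n' | hapax
```

Without the gsub call, "Mancha," and "Mancha." would count as two different words and both would be reported.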

The Crucible — Metallurgical Vocabulary

Cervantes uses goldsmith language as metaphor: quilatar (to assay), crisolar/acrisolar (to purify in a crucible), quilates (carats). Trace this imagery across the entire work.

grep -rn -P 'quilat|crisol|acrisol' "$QJ"/ | \
  awk -F: '{
    split($1,a,"/")
    parte = (a[length(a)-2] == "primera-parte") ? "I" : "II"
    ch = a[length(a)]
    gsub(/texto-0?/, "Cap ", ch)
    gsub(/\.adoc/, "", ch)
    text = substr($0, index($0,$3))
    gsub(/^[ \t]+/, "", text)
    printf "Parte %-2s %-8s L%-4s %s\n", parte, ch, $2, text
  }'

Tested output

Parte I  Cap 33   L121  verdad, si no es probándola de manera que la prueba manifieste los quilates
Parte I  Cap 33   L134  deseo que Camila, mi esposa, pase por estas dificultades y se acrisole y
Parte I  Cap 33   L135  quilate en el fuego de verse requerida y solicitada
Parte I  Cap 43   L87   que la que se quilata por su gusto;
Parte II Cap 10   L294  yo vi su fealdad, sino su hermosura, a la cual subía de punto y quilates

Three of the five hits fall in the Novela del curioso impertinente (Cap 33), where Anselmo tests Camila's virtue like gold in fire.

Archaic vs Modern — Cervantes' Time Capsule

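Forms that read as archaic today. Some (fermosa, fecho) were already archaic in Cervantes' time and are used deliberately for effect; others (agora, mesmo, ansí, vuestra merced) were ordinary usage of the period.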

grep -rohn -P '\b(fermosa|vuestra merced|vos|facienda|agora|mesmo|ansí|aqueste|aquesa|desaguisado|follón|malandrín|fecho|maguer|vegada|yantar)\b' \
  "$QJ"/primera-parte/texto/ \
  | awk -F: '{freq[$2]++}
  END {for(w in freq) printf "%4d  %s\n", freq[w], w}' \
  | sort -rn

Tested output

 294  vuestra merced
 147  mesmo
  89  vos
  49  ansí
  47  agora
  23  fecho
  19  fermosa    ← deliberately archaic for "hermosa"
   8  desaguisado
   4  follón
   2  malandrín

Character Dominance — Who Owns Each Chapter?

for f in "$QJ"/primera-parte/texto/texto-{001..020}.adoc; do
  ch=$(basename "$f" .adoc | sed 's/texto-/Cap /')
  dq=$(grep -oi 'don quijote\|quijote\|caballero' "$f" | wc -l)
  sp=$(grep -oi 'sancho\|panza\|escudero' "$f" | wc -l)
  dl=$(grep -oi 'dulcinea\|toboso\|señora' "$f" | wc -l)
  printf "%-7s  Quijote: %-3d  Sancho: %-3d  Dulcinea: %-3d  " "$ch" "$dq" "$sp" "$dl"
  if [ "$dq" -gt "$sp" ] && [ "$dq" -gt "$dl" ]; then echo "→ Don Quijote"
  elif [ "$sp" -gt "$dq" ] && [ "$sp" -ge "$dl" ]; then echo "→ Sancho"
  elif [ "$dl" -gt "$dq" ]; then echo "→ Dulcinea"
  else echo "→ balanced"; fi
done

Tested output — Sancho doesn’t appear until Chapter 7, doesn’t dominate until Chapter 15

Cap 001  Quijote: 20   Sancho: 0    Dulcinea: 7    → Don Quijote
Cap 007  Quijote: 31   Sancho: 22   Dulcinea: 0    → Don Quijote
Cap 015  Quijote: 43   Sancho: 44   Dulcinea: 3    → Sancho  ← first Sancho chapter

Dialogue Density — Visual Bar Chart

for f in "$QJ"/primera-parte/texto/texto-*.adoc; do
  ch=$(basename "$f" .adoc | sed 's/texto-0*//')
  total=$(grep -c '.' "$f")
  dialogue=$(grep -c '^—' "$f")
  pct=$(awk "BEGIN{printf \"%.0f\", ($dialogue/$total)*100}")
  bar=$(printf '%*s' "$((pct/2))" '' | sed 's/ /█/g')  # sed, not tr: GNU tr is byte-oriented and mangles the multibyte █
  printf "Ch %-3s %3d%% %s\n" "$ch" "$pct" "$bar"
done

Chapters 33-34 (Novela del curioso impertinente) show 0% dialogue — they’re pure narrative prose. Chapter 6 (the book burning) is the most dialogue-heavy.
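The bar itself is just padding turned into glyphs. A minimal sketch of that step in isolation: bar is a hypothetical helper name, the percentages are made up for illustration, and sed does the substitution because it copes with any fill glyph, single-byte or not.

```shell
# bar is a hypothetical helper.
# $1 = percentage (0-100), $2 = fill glyph (defaults to #); one glyph per 2%.
bar() {
  # printf pads an empty string to the requested width with spaces,
  # then sed turns every space into the fill glyph.
  printf '%*s' "$(( $1 / 2 ))" '' | sed "s/ /${2:-#}/g"
  printf '\n'
}

# Made-up percentages, not corpus results.
for pct in 0 18 64; do
  printf 'Ch %-4s %3d%% %s\n' demo "$pct" "$(bar "$pct")"
done

bar 6    # prints: ###
```

The fill glyph is interpolated into the sed expression, so a glyph containing / or & would need escaping.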

Cervantes' Breath — Longest Paragraphs

Reconstruct paragraphs from line-wrapped text, then measure.

awk '
  /^[=:\/\[]/ {next}
  /^$/ {
    if (length(para) > 0) {
      split(FILENAME, a, "/")
      printf "%4d chars | %-15s L%-4d | %.85s...\n", length(para), a[length(a)], start, para
      para = ""
    }
    next
  }
  { if (length(para) == 0) start = FNR; para = para " " $0 }  # FNR = line number within the current file
' "$QJ"/primera-parte/texto/texto-*.adoc | sort -rn | head -5

Tested output

6814 chars | texto-038.adoc   ← Arms and Letters speech — Cervantes' longest single breath
6652 chars | texto-048.adoc   ← The Canon's literary criticism
6616 chars | texto-050.adoc   ← Don Quijote's defense of chivalric romances
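Two edge cases are worth handling when paragraphs are rebuilt across many files: a paragraph that ends at end-of-file with no trailing blank line, and line numbers that should restart per file. A minimal sketch under those assumptions follows. longest_paragraphs and emit are hypothetical names, and the sample files are created on the fly.

```shell
# longest_paragraphs is a hypothetical helper; it takes file names as arguments.
longest_paragraphs() {
  awk '
    function emit(   n, p) {
      if (para != "") {
        n = split(file, p, "/")                 # keep only the base name
        printf "%d chars | %s L%d\n", length(para), p[n], start
      }
      para = ""
    }
    FNR == 1 { emit() }              # new file: close any paragraph still open
    /^[=:\/\[]/ { next }             # skip headings, attributes, comments, block markers
    /^$/ { emit(); next }            # a blank line ends a paragraph
    {
      if (para == "") { start = FNR; file = FILENAME; para = $0 }
      else para = para " " $0
    }
    END { emit() }                   # last paragraph may lack a trailing blank line
  ' "$@" | sort -rn
}

# Sample files, not corpus text.
tmp=$(mktemp -d)
printf '= Title\n\nuno dos\ntres\n\ncuatro\n' > "$tmp/a.adoc"
printf 'cinco seis siete ocho\n' > "$tmp/b.adoc"
longest_paragraphs "$tmp/a.adoc" "$tmp/b.adoc"
# prints:
# 21 chars | b.adoc L1
# 12 chars | a.adoc L3
# 6 chars | a.adoc L6
rm -rf "$tmp"
```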

Words Unique to the Golden Age Speech (Process Substitution)

comm -23 prints the lines that appear in file A but NOT in file B; both inputs must be sorted. Process substitution lets us compare without temp files.

comm -23 \
  <(awk '{for(i=1;i<=NF;i++){w=tolower($i); gsub(/[.,;:!?¡¿«»""()—\[\]]/,"",w); if(length(w)>3) print w}}' \
    "$QJ"/primera-parte/texto/texto-011.adoc | sort -u) \
  <(awk '{for(i=1;i<=NF;i++){w=tolower($i); gsub(/[.,;:!?¡¿«»""()—\[\]]/,"",w); if(length(w)>3) print w}}' \
    "$QJ"/primera-parte/texto/texto-0{01..10}.adoc "$QJ"/primera-parte/texto/texto-0{12..52}.adoc 2>/dev/null | sort -u) \
  | head -20

Tested output — words Cervantes used ONLY in the Golden Age chapter

abejas          ← bees (the "solícitas y discretas abejas")
alcornoques     ← cork oaks
barraganía      ← concubinage (archaic legal term)
bellotas        ← acorns
zagalejas       ← shepherdesses (diminutive)

These are pastoral vocabulary — Cervantes shifted register entirely for this speech.
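The set-difference idiom is easier to see on two tiny word lists. A minimal sketch, assuming bash (process substitution is not POSIX sh); only_in_first is a hypothetical helper name and the word lists are illustrative, not corpus output.

```shell
#!/usr/bin/env bash
# only_in_first is a hypothetical helper: words in list $1 that are absent from list $2.
only_in_first() {
  # Each <(...) behaves like a file containing one sorted, de-duplicated word per line.
  comm -23 \
    <(printf '%s\n' "$1" | tr ' ' '\n' | sort -u) \
    <(printf '%s\n' "$2" | tr ' ' '\n' | sort -u)
}

only_in_first "abejas bellotas lanza escudo" "lanza escudo caballero"
# prints:
# abejas
# bellotas
```

comm -13 would give the opposite difference (words only in the second list), and comm -12 the intersection.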

Cervantes' Favorite Content Words

Filter out articles, prepositions, and common verbs to find what Cervantes actually talks about.

awk '{for(i=1;i<=NF;i++){
  w=tolower($i)
  gsub(/[.,;:!?¡¿«»""()—\[\]]/,"",w)
  if(length(w)>4 && w!~/^(sobre|donde|entre|porque|también|aunque|antes|después|desde|cuando|todos|todas|tiene|tenía|hacer|había|hasta|siendo|tanto|mucho|puede|decir|dijo|aquella|aquel|estas|estos|mismo|otros|otras|podía|quien|algunos|como)$/)
    freq[w]++
}} END {for(w in freq) printf "%5d  %s\n", freq[w], w}' \
  "$QJ"/primera-parte/texto/texto-*.adoc | sort -rn | head -15

Tested output

  925  quijote
  652  sancho
  432  respondió     ← "replied" — dialogue drives the novel
  428  vuestra       ← formal address
  396  señor         ← lord/sir
  336  caballero     ← knight
  228  señora        ← lady
  206  manera        ← manner/way
  195  verdad        ← truth

CLI Skills Practiced

Tool	Skill
`find -printf`	Format specifiers: `%f`, `%h`, `%p`, `%P`, `%s`
`grep -P`	PCRE for Spanish characters, alternation, word boundaries
`grep -rohn`	Recursive, only-matching, no file names (`-h`), line numbers
`awk`	Frequency arrays, field processing, `END` blocks, multi-file `FILENAME`
`awk BEGIN`	Arithmetic in a `BEGIN` block, formatted with `printf`
`sed`	Pattern extraction, substitution, line addressing
`comm -23`	Set difference between two sorted lists
`<(cmd)`	Process substitution — treat command output as file argument
`sort -t -k`	Field-specific sorting with custom delimiters
`printf`	Formatted output with alignment, padding
`tr`	Character translation for visual bar charts (GNU `tr` is single-byte only; a multibyte glyph such as █ needs `sed` or `${bar// /█}`)
`xargs -I{}`	Per-item command execution from pipeline

Intertextuality Drills

QJ="docs/modules/ROOT/pages/education/literature/quijote"

Drill 1 — Heat Map: Which chapter is the most intertextual?

for f in "$QJ"/primera-parte/texto/texto-*.adoc; do
  ch=$(basename "$f" .adoc | sed 's/texto-0*//')
  bible=$(grep -ciP 'salomón|david|sansón|pecado|paraíso|lázaro|moisés' "$f")
  classical=$(grep -ciP 'aristóteles|platón|homero|virgilio|ovidio|horacio|cicerón|hércules|alejandro|apolo|febo|venus|marte|troya|ulises|eneas' "$f")
  chivalric=$(grep -ciP 'amadís|palmerín|orlando|roldán|lanzarote|tristán|galaor|esplandián|belianís|ariosto' "$f")
  pastoral=$(grep -ciP 'diana|montemayor|arcadia|égloga|pastor[ae]?s?\b|zagal' "$f")
  meta=$(grep -ciP 'cide hamete|benengeli|autor.*historia|pluma|arábigo|traductor' "$f")
  total=$((bible + classical + chivalric + pastoral + meta))
  [ "$total" -gt 0 ] && printf "Ch %-3s  Bib:%-2d  Clas:%-2d  Chiv:%-2d  Past:%-2d  Meta:%-2d  Total:%-3d\n" \
    "$ch" "$bible" "$classical" "$chivalric" "$pastoral" "$meta" "$total"
done | sort -t: -k7 -rn | head -15

Drill 2 — Which tradition dominates each chapter?

for f in "$QJ"/primera-parte/texto/texto-*.adoc; do
  ch=$(basename "$f" .adoc | sed 's/texto-0*//')
  bible=$(grep -ciP 'salomón|david|sansón|pecado|paraíso|lázaro|moisés' "$f")
  classical=$(grep -ciP 'aristóteles|platón|homero|virgilio|hércules|alejandro|apolo|febo' "$f")
  chivalric=$(grep -ciP 'amadís|palmerín|orlando|roldán|lanzarote|belianís|ariosto' "$f")
  pastoral=$(grep -ciP 'diana|montemayor|arcadia|égloga|pastor|zagal' "$f")
  meta=$(grep -ciP 'cide hamete|benengeli|autor.*historia|pluma|arábigo' "$f")
  max=$bible; dom="Biblical"
  [ "$classical" -gt "$max" ] && max=$classical && dom="Classical"
  [ "$chivalric" -gt "$max" ] && max=$chivalric && dom="Chivalric"
  [ "$pastoral" -gt "$max" ] && max=$pastoral && dom="Pastoral"
  [ "$meta" -gt "$max" ] && max=$meta && dom="Metaliterary"
  [ "$max" -gt 0 ] && printf "Ch %-3s → %-12s (%d)\n" "$ch" "$dom" "$max"
done

Drill 3 — Trace a specific author across the text

# Replace AUTHOR with: amadís, ariosto, homero, virgilio, etc.
AUTHOR="amadís"
grep -rn -i "$AUTHOR" "$QJ"/primera-parte/texto/ \
  | awk -F: '{split($1,a,"/"); printf "%-15s L%-4s %s\n", a[length(a)], $2, substr($0,index($0,$3))}'

Drill 4 — Cross-tradition convergence points

for f in "$QJ"/primera-parte/texto/texto-*.adoc; do
  ch=$(basename "$f" .adoc | sed 's/texto-0*//')
  t=0
  [ $(grep -ciP 'salomón|david|sansón|pecado|paraíso|lázaro|moisés' "$f") -gt 0 ] && ((t++))
  [ $(grep -ciP 'aristóteles|platón|homero|virgilio|hércules|alejandro|apolo|febo' "$f") -gt 0 ] && ((t++))
  [ $(grep -ciP 'amadís|palmerín|orlando|roldán|lanzarote|belianís|ariosto' "$f") -gt 0 ] && ((t++))
  [ $(grep -ciP 'diana|montemayor|arcadia|égloga|pastor|zagal' "$f") -gt 0 ] && ((t++))
  [ $(grep -ciP 'cide hamete|benengeli|autor.*historia|pluma|arábigo' "$f") -gt 0 ] && ((t++))
  [ "$t" -ge 3 ] && printf "Ch %-3s  %d/5 traditions\n" "$ch" "$t"
done | sort -k3,3nr

Drill 5 — Compare Primera vs Segunda Parte intertextuality

for parte in primera-parte segunda-parte; do
  echo "=== ${parte} ==="
  bible=$(grep -rciP 'salomón|david|sansón|pecado|paraíso|lázaro|moisés' "$QJ"/"$parte"/texto/ | awk -F: '{s+=$2} END{print s}')
  classical=$(grep -rciP 'aristóteles|platón|homero|virgilio|hércules|alejandro|apolo|febo' "$QJ"/"$parte"/texto/ | awk -F: '{s+=$2} END{print s}')
  chivalric=$(grep -rciP 'amadís|palmerín|orlando|roldán|lanzarote|belianís|ariosto' "$QJ"/"$parte"/texto/ | awk -F: '{s+=$2} END{print s}')
  pastoral=$(grep -rciP 'diana|montemayor|arcadia|égloga|pastor|zagal' "$QJ"/"$parte"/texto/ | awk -F: '{s+=$2} END{print s}')
  meta=$(grep -rciP 'cide hamete|benengeli|autor.*historia|pluma|arábigo' "$QJ"/"$parte"/texto/ | awk -F: '{s+=$2} END{print s}')
  printf "  Biblical: %3d  Classical: %3d  Chivalric: %3d  Pastoral: %3d  Meta: %3d\n" \
    "$bible" "$classical" "$chivalric" "$pastoral" "$meta"
done
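Each total above comes from adding up the file:count lines that grep -rc emits. A minimal sketch of that summing step on its own: sum_counts is a hypothetical helper name and the input lines are made up. It differs from the drill in two small ways, both assumptions of this sketch: it reads the last field so a colon in a path cannot shift the count, and it prints 0 instead of an empty string when there is no input.

```shell
# sum_counts is a hypothetical helper: adds the counts from "file:count" lines on stdin.
sum_counts() {
  # $NF is the last colon-separated field; s + 0 forces a numeric 0 on empty input.
  awk -F: '{ s += $NF } END { print s + 0 }'
}

# Made-up counts in the format grep -rc produces.
printf 'texto-001.adoc:3\ntexto-002.adoc:0\ntexto-003.adoc:4\n' | sum_counts   # prints: 7
printf '' | sum_counts                                                          # prints: 0
```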

CLI Skills Practiced

Skill	Application
`grep -ciP`	Case-insensitive PCRE counting per file
`grep -rn` alternation	Multi-term search across corpus
`awk -F:` field splitting	Parse grep output (file:line:text)
Shell arithmetic `$((…))`	Sum category scores
Conditional `[ ] && t++`	Count boolean presence per tradition
`sort -t: -k`	Sort by specific field with custom delimiter
`printf` alignment	Formatted tabular output
Process substitution	Compare vocabulary sets between partes

Corpus Analysis & Creative Cross-Reference

QJ="docs/modules/ROOT/pages/education/literature/quijote"

Set this variable before running any drill.

Vocabulary Hunting — Find Your Words in Cervantes

Check whether words from your own writing appear in the Quijote corpus.

Single word search with file name and line number

grep -rn 'lumbre' "$QJ"/

How many chapters contain the word?

grep -rl 'lumbre' "$QJ"/ | wc -l

Compare Primera vs Segunda Parte usage

grep -rc 'lumbre' "$QJ"/primera-parte/texto/ | awk -F: '$2>0'
grep -rc 'lumbre' "$QJ"/segunda-parte/texto/ | awk -F: '$2>0'

-rc combines recursive search with a per-file count of matching lines (not total occurrences). awk -F: '$2>0' filters to files with at least one match.
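grep -c counts lines that match, so a line containing the word twice still counts once. A minimal sketch of the difference, using a throwaway sample file; count_lines and count_hits are hypothetical helper names.

```shell
# Hypothetical helpers: $1 = pattern, $2 = file.
count_lines() { grep -c -- "$1" "$2"; }                       # lines containing the pattern
count_hits()  { grep -o -- "$1" "$2" | wc -l | tr -d ' '; }   # every single occurrence

# Sample text, not from the corpus: the first line holds the word twice.
sample=$(mktemp)
printf 'la lumbre y la lumbre\nsin fuego\notra lumbre\n' > "$sample"
printf 'matching lines: %s, occurrences: %s\n' \
  "$(count_lines lumbre "$sample")" "$(count_hits lumbre "$sample")"
# prints: matching lines: 2, occurrences: 3
rm -f "$sample"
```

For word-frequency questions the -o form is usually the one you want; for "which chapters mention it" the line count is enough.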

Context Windows — See the Word in Its Sentence

2 lines before and after each match

grep -rn -B2 -A2 'sepulcro' "$QJ"/

Restrict to Primera Parte texto files, 3 lines context

grep -rn -C3 'querella' "$QJ"/primera-parte/texto/

-B = before, -A = after, -C = both. These flags turn grep from a line finder into a passage finder.
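With -n, grep marks the two kinds of line differently: a colon after the line number for a match, a hyphen for context. A minimal sketch on a five-line sample file; context is a hypothetical helper name.

```shell
# context is a hypothetical helper: $1 = lines of context, $2 = pattern, $3 = file.
context() {
  grep -n -C"$1" -- "$2" "$3"
}

# Sample file, not corpus text.
sample=$(mktemp)
printf 'uno\ndos\nsepulcro\ncuatro\ncinco\n' > "$sample"
context 1 sepulcro "$sample"
# prints:
# 2-dos
# 3:sepulcro
# 4-cuatro
rm -f "$sample"
```

When matches are far apart, grep separates the groups with a line containing only --.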

Rhetorical Pattern Matching

Tricolons — three comma-separated items

grep -rn -P '\w+, \w+ y \w+' "$QJ"/primera-parte/texto/texto-033.adoc

Exclamations — Cervantes' pathos markers

grep -rn '¡' "$QJ"/primera-parte/texto/texto-034.adoc

Subjunctive markers — the mood of doubt and desire

grep -rn -P 'fuese|fuera|hubiese|hubiera|tuviese|tuviera' "$QJ"/primera-parte/texto/texto-033.adoc

Comparative Chapter Analysis

Dialogue density — which chapters are conversation-heavy?

for f in "$QJ"/primera-parte/texto/texto-*.adoc; do
  count=$(grep -c '—' "$f")
  printf "%-20s %s\n" "$(basename "$f")" "$count"
done | sort -k2 -rn | head -10

Tested output

texto-023.adoc       72
texto-031.adoc       68
texto-029.adoc       62

Chapters 33–34 (Novela del curioso impertinente) show near-zero dialogue — pure narrative prose. The dialogue-heavy chapters are the adventure and conversation episodes.

Honor frequency — the Curioso’s obsession

grep -ric 'honor\|honra' "$QJ"/primera-parte/texto/ | sort -t: -k2 -rn | head -10

Character frequency across Primera Parte

for name in Sancho Dulcinea Rocinante Lotario Camila Anselmo; do
  count=$(grep -ric "$name" "$QJ"/primera-parte/texto/ | awk -F: '{s+=$2} END {print s}')
  printf "%-12s %s mentions\n" "$name" "$count"
done

Tested output

Sancho       652 mentions
Dulcinea     108 mentions
Rocinante    118 mentions
Lotario       92 mentions   ← confined to chapters 33-35
Camila        85 mentions   ← confined to chapters 33-35
Anselmo       74 mentions   ← confined to chapters 33-35

Lotario, Camila, and Anselmo exist only inside the interpolated novel. They never touch the main narrative.

awk — Text Analysis Beyond grep

Unique words in a chapter (vocabulary breadth)

awk '{for(i=1;i<=NF;i++) w[tolower($i)]++} END {print length(w) " unique words"}' \
  "$QJ"/primera-parte/texto/texto-033.adoc

Longest lines — Cervantes' marathon sentences

awk 'length > 200 {printf "%s:%d (%d chars)\n", FILENAME, NR, length}' \
  "$QJ"/primera-parte/texto/texto-033.adoc

Average words per line (sentence density)

awk '{total+=NF; lines++} END {printf "%.1f words/line\n", total/lines}' \
  "$QJ"/primera-parte/texto/texto-034.adoc

Top 20 most frequent words in the Curioso (chapters 33–35)

awk '{for(i=1;i<=NF;i++) w[tolower($i)]++} END {for(k in w) print w[k], k}' \
  "$QJ"/primera-parte/texto/texto-03{3,4,5}.adoc | sort -rn | head -20

Brace expansion {3,4,5} selects exactly those three files — no loop needed.

Sonnet Cross-Reference

Check every significant word from a sonnet against the Cervantes corpus.

Batch word search with frequency count

for word in desdenes sepulcro lumbre querella suspiros sombras ingrata; do
  count=$(grep -ric "$word" "$QJ"/ | awk -F: '{s+=$2} END {print s}')
  printf "%-12s %s occurrences in Quijote\n" "$word" "$count"
done

The count tells you whether you’re in Cervantes' register or outside it. High count = common Cervantine vocabulary. Zero = your own invention or a word from a different tradition.

Archaic Forms — Golden Age Register

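Track archaic -lle endings (honralle, festejalle, persuadille); the pattern also catches ordinary words such as calle and valle, so read the hits before counting them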

grep -rn -P '\w+lle\b' "$QJ"/primera-parte/texto/

Archaic vocabulary frequency across the Primera Parte

grep -rohn -P '\b(fermosa|vuestra merced|facienda|agora|mesmo|ansí)\b' \
  "$QJ"/primera-parte/texto/ | awk -F: '{freq[$2]++} END {for(w in freq) printf "%4d  %s\n", freq[w], w}' | sort -rn

-o outputs only the matched text, not the whole line. Combined with awk frequency counting, this gives you a vocabulary census.
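The census pipeline can be tried on a few lines before pointing it at the corpus. A minimal sketch: census is a hypothetical helper name, the sample sentence is made up, and it uses -E with ASCII-only words so it does not depend on PCRE support.

```shell
# census is a hypothetical helper: $1 = extended regex, $2 = file.
census() {
  # -o prints each match on its own line, -h drops file names, -n prefixes "line:".
  # awk then tallies field 2, the matched text.
  grep -o -h -n -E -- "$1" "$2" \
    | awk -F: '{ freq[$2]++ } END { for (w in freq) printf "%d %s\n", freq[w], w }' \
    | sort -rn
}

# Sample text, not from the corpus.
sample=$(mktemp)
printf 'agora digo que agora\nel mesmo dia\nagora no\n' > "$sample"
census 'agora|mesmo' "$sample"
# prints:
# 3 agora
# 1 mesmo
rm -f "$sample"
```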

Hapax Legomena — Words Used Exactly Once

Cervantes' rarest vocabulary choices. Potential material for your own writing.

awk '{for(i=1;i<=NF;i++) {
  w=tolower($i); gsub(/[.,;:!?¡¿«»""()—]/, "", w)
  if(w!="") freq[w]++
}} END {for(w in freq) if(freq[w]==1) print w}' \
  "$QJ"/primera-parte/texto/texto-*.adoc | sort | head -40

These are words Cervantes reached for once and never used again. Each one was a deliberate choice for a specific moment.

CLI Skills Practiced

Tool	Skill
`grep -rn -B -A -C`	Context windows around matches
`grep -ric`	Recursive case-insensitive count per file
`grep -P`	PCRE regex for Spanish characters and alternation
`awk -F:`	Field splitting on colon (grep output format)
`awk` frequency arrays	`w[tolower($i)]++` pattern for word counting
`for` loop + `printf`	Formatted iteration over files and word lists
brace expansion	`{3,4,5}` and `{001..020}` for file selection
`sort -t: -k2 -rn`	Field-specific numeric reverse sort
`wc -l`	Count pipeline results
