Don Quijote — CLI Literary Analysis

CLI commands for analyzing Cervantes' Don Quijote de la Mancha directly from the terminal. The full corpus (126 chapters across Primera and Segunda Parte) serves as both literary study material and CLI training data.

Every drill hits two targets: CLI mastery and Spanish comprehension.

QJ="docs/modules/ROOT/pages/education/literature/quijote"

Drill Sets

Category | Focus
Vocabulary Drills | Vocabulary density per chapter, hapax legomena, archaic forms, character dominance, dialogue density, content word frequency
Intertextuality Drills | Biblical, classical, chivalric, pastoral, and metafictional references: heat maps and tradition tracking
Corpus Analysis & Creative Cross-Reference | Vocabulary hunting, context windows, rhetorical pattern matching, comparative analysis, sonnet cross-referencing

CLI × Don Quijote — Literary Analysis from the Terminal

The entire Primera Parte (52 chapters) and Segunda Parte are under:

QJ="docs/modules/ROOT/pages/education/literature/quijote"

Set this variable before running any drill.

Vocabulary Density — Which Chapters Are Richest?

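Unique words / total words = vocabulary density (the type-token ratio). Higher means more varied language, but the ratio falls as a text gets longer, so compare chapters of very different lengths with care.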

for f in "$QJ"/primera-parte/texto/texto-*.adoc; do
  ch=$(basename "$f" .adoc | sed 's/texto-//')
  total=$(awk '{n += NF} END{print n+0}' "$f")                 # whitespace-separated tokens
  unique=$(awk '{for(i=1;i<=NF;i++) w[tolower($i)]++} END{print length(w)}' "$f")
  [ "$total" -gt 0 ] || continue                               # skip empty files: no division by zero
  ratio=$(awk "BEGIN{printf \"%.2f\", $unique/$total * 100}")
  printf "Cap %s: %4d unique / %5d total = %s%% density\n" "$ch" "$unique" "$total" "$ratio"
done | sort -t= -k2 -rn | head -10
Tested output
Cap 011:  976 unique /  2175 total = 44.87% density  ← Golden Age speech
Cap 050: 1084 unique /  2482 total = 43.67% density
Cap 038:  716 unique /  1640 total = 43.66% density  ← Arms and Letters
Cap 001:  823 unique /  1917 total = 42.93% density  ← Opening chapter

By this measure, Chapter 11 (the Golden Age speech) has the highest vocabulary density in the Primera Parte: nearly 45% of its tokens are distinct forms.
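
The loop above counts raw whitespace-separated tokens, so `Mancha,` and `Mancha` are treated as different words and the density comes out somewhat high. The following is a minimal sketch of a punctuation-aware variant in a single awk pass. The helper name `ttr` and the sample text are hypothetical, and `[[:punct:]]` only covers Spanish marks such as ¡ ¿ « » when awk is multibyte-aware (for example gawk in a UTF-8 locale); elsewhere it strips ASCII punctuation only.

```shell
# ttr FILE: print "unique total percent" for one file (hypothetical helper).
ttr() {
  awk '{
    for (i = 1; i <= NF; i++) {
      w = tolower($i)
      gsub(/[[:punct:]]/, "", w)      # drop punctuation glued to the token
      if (w != "") { total++; seen[w]++ }
    }
  } END {
    for (w in seen) unique++
    if (total > 0) printf "%d %d %.2f\n", unique, total, unique / total * 100
  }' "$1"
}

# Usage on a throwaway sample instead of the real corpus:
sample=$(mktemp)
printf 'En un lugar de la Mancha, de cuyo nombre\nno quiero acordarme. Mancha, lugar.\n' > "$sample"
ttr "$sample"
rm -f "$sample"
```

Swapping `ttr "$f"` into the chapter loop would replace the two separate awk calls, at the cost of slightly different numbers from the tested output above.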

Hapax Legomena — Words Used Exactly Once

A hapax legomenon is a word that appears only once in a corpus. The drill below scans the Primera Parte, so these are the words Cervantes used exactly once across its 52 chapters.

awk '{for(i=1;i<=NF;i++) {
  w=tolower($i); gsub(/[.,;:!?¡¿«»""()—]/, "", w)
  if(w!="") {freq[w]++; file[w]=FILENAME}
}} END {for(w in freq) if(freq[w]==1) print w, file[w]}' \
  "$QJ"/primera-parte/texto/texto-*.adoc \
  | awk '{split($2,a,"/"); split(a[length(a)],b,"."); print b[1], $1}' \
  | sort | head -20

The Crucible — Metallurgical Vocabulary

Cervantes uses goldsmith language as metaphor: quilatar (to assay), crisolar/acrisolar (to purify in a crucible), quilates (carats). Trace this imagery across the entire work.

grep -rn -E 'quilat|crisol' "$QJ"/primera-parte/texto/ "$QJ"/segunda-parte/texto/ | \
  awk -F: '{
    split($1,a,"/")
    parte = (a[length(a)-2] == "primera-parte") ? "I" : "II"
    ch = a[length(a)]
    gsub(/texto-0?/, "Cap ", ch)
    gsub(/\.adoc/, "", ch)
    text = substr($0, index($0,$3))
    gsub(/^[ \t]+/, "", text)
    printf "Parte %-2s %-8s L%-4s %s\n", parte, ch, $2, text
  }'
Tested output
Parte I  Cap 33   L121  verdad, si no es probándola de manera que la prueba manifieste los quilates
Parte I  Cap 33   L134  deseo que Camila, mi esposa, pase por estas dificultades y se acrisole y
Parte I  Cap 33   L135  quilate en el fuego de verse requerida y solicitada
Parte I  Cap 43   L87   que la que se quilata por su gusto;
Parte II Cap 10   L294  yo vi su fealdad, sino su hermosura, a la cual subía de punto y quilates

In this output, three of the five hits fall in Cap 33 (the Novela del curioso impertinente), where Anselmo tests Camila's virtue like gold in fire. The image recurs once in Cap 43 and once in Parte II, Cap 10.

Archaic vs Modern — Cervantes' Time Capsule

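Forms that read as archaic today. Some (fermosa, maguer, yantar) were already archaic in Cervantes' time and are used deliberately for chivalric effect; others (agora, mesmo, ansí, vuestra merced) were ordinary usage then.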

grep -rohn -P '\b(fermosa|vuestra merced|vos|facienda|agora|mesmo|ansí|aqueste|aquesa|desaguisado|follón|malandrín|fecho|maguer|vegada|yantar)\b' \
  "$QJ"/primera-parte/texto/ \
  | awk -F: '{freq[$2]++}
  END {for(w in freq) printf "%4d  %s\n", freq[w], w}' \
  | sort -rn
Tested output
 294  vuestra merced
 147  mesmo
  89  vos
  49  ansí
  47  agora
  23  fecho
  19  fermosa    ← deliberately archaic for "hermosa"
   8  desaguisado
   4  follón
   2  malandrín

Character Dominance — Who Owns Each Chapter?

for f in "$QJ"/primera-parte/texto/texto-{001..020}.adoc; do
  ch=$(basename "$f" .adoc | sed 's/texto-/Cap /')
  dq=$(grep -oi 'don quijote\|quijote\|caballero' "$f" | wc -l)
  sp=$(grep -oi 'sancho\|panza\|escudero' "$f" | wc -l)
  dl=$(grep -oi 'dulcinea\|toboso\|señora' "$f" | wc -l)
  printf "%-7s  Quijote: %-3d  Sancho: %-3d  Dulcinea: %-3d  " "$ch" "$dq" "$sp" "$dl"
  if [ "$dq" -gt "$sp" ] && [ "$dq" -gt "$dl" ]; then echo "→ Don Quijote"
  elif [ "$sp" -gt "$dq" ] && [ "$sp" -gt "$dl" ]; then echo "→ Sancho"
  elif [ "$dl" -gt "$dq" ] && [ "$dl" -gt "$sp" ]; then echo "→ Dulcinea"
  else echo "→ balanced"; fi
done
Tested output — Sancho doesn’t appear until Chapter 7, doesn’t dominate until Chapter 15
Cap 001  Quijote: 20   Sancho: 0    Dulcinea: 7    → Don Quijote
Cap 007  Quijote: 31   Sancho: 22   Dulcinea: 0    → Don Quijote
Cap 015  Quijote: 43   Sancho: 44   Dulcinea: 3    → Sancho  ← first Sancho chapter

Dialogue Density — Visual Bar Chart

for f in "$QJ"/primera-parte/texto/texto-*.adoc; do
  ch=$(basename "$f" .adoc | sed 's/texto-0*//')
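  total=$(grep -c '.' "$f")                       # non-empty lines
  dialogue=$(grep -c '^—' "$f")                   # lines that open with a dialogue dash
  [ "$total" -gt 0 ] || continue                  # skip empty files: no division by zero
  pct=$(awk "BEGIN{printf \"%.0f\", ($dialogue/$total)*100}")
  # GNU tr works on bytes and cannot emit a multibyte glyph such as █, so draw the bar with '#'
  bar=$(printf '%*s' "$((pct/2))" '' | tr ' ' '#')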
  printf "Ch %-3s %3d%% %s\n" "$ch" "$pct" "$bar"
done

Chapters 33-34 (Novela del curioso impertinente) show 0% by this measure, which only counts lines that open with a dialogue dash: the interpolated novel reads as continuous narrative prose. Chapter 6 (the book burning) is the most dialogue-heavy.
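
The bar itself is just a glyph repeated once per two percentage points. As a sketch, the hypothetical `bar` helper below builds it by string repetition in a loop, which also works for multibyte glyphs such as █ that byte-oriented `tr` implementations cannot produce. The loop values are made-up percentages, not corpus results.

```shell
# bar PCT [GLYPH]: print one glyph per 2 percentage points (hypothetical helper).
bar() {
  pct=$1; glyph=${2:-█}; out=""
  n=$((pct / 2))
  while [ "$n" -gt 0 ]; do
    out="$out$glyph"
    n=$((n - 1))
  done
  printf '%s' "$out"
}

# Usage with made-up percentages:
for pct in 0 10 37; do
  printf 'Ch %-4s %3d%% %s\n' demo "$pct" "$(bar "$pct")"
done
```

Passing a second argument, for example `bar 40 '#'`, keeps the output plain ASCII for terminals without Unicode fonts.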

Cervantes' Breath — Longest Paragraphs

Reconstruct paragraphs from line-wrapped text, then measure.

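awk '
  # Print the paragraph collected so far, then reset.
  function flush(   a, n) {
    if (para != "") {
      n = split(fname, a, "/")
      printf "%4d chars | %-15s L%-4d | %.85s...\n", length(para), a[n], start, para
    }
    para = ""
  }
  FNR == 1 { flush() }              # new file: never join paragraphs across chapters
  /^[=:\/\[]/ { next }              # skip AsciiDoc headings, attributes, comments, block markers
  /^$/ { flush(); next }            # a blank line ends the paragraph
  { if (para == "") { start = FNR; fname = FILENAME }; para = para " " $0 }
  END { flush() }                   # the last paragraph may lack a trailing blank line
' "$QJ"/primera-parte/texto/texto-*.adoc | sort -rn | head -5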
Tested output
6814 chars | texto-038.adoc   ← Arms and Letters speech — Cervantes' longest single breath
6652 chars | texto-048.adoc   ← The Canon's literary criticism
6616 chars | texto-050.adoc   ← Don Quijote's defense of chivalric romances
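
awk also has a built-in paragraph mode: with `RS` set to the empty string, each blank-line-separated block arrives as one record. The sketch below uses it for a shorter variant of the same measurement. The helper name `longest_paragraphs` is hypothetical, and unlike the drill above it does not skip AsciiDoc markup lines or report line numbers.

```shell
# longest_paragraphs FILE [N]: print "chars preview" for the N longest paragraphs.
longest_paragraphs() {
  awk -v RS='' '{
    gsub(/\n/, " ")                       # rejoin the wrapped lines of this paragraph
    printf "%d %.40s\n", length($0), $0
  }' "$1" | sort -rn | head -n "${2:-5}"
}

# Usage on a throwaway sample instead of the real corpus:
sample=$(mktemp)
printf 'uno dos\ntres\n\ncuatro\n\ncinco seis siete ocho\nnueve\n' > "$sample"
longest_paragraphs "$sample" 2
rm -f "$sample"
```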

Words Unique to the Golden Age Speech (Process Substitution)

comm -23 finds words in file A that are NOT in file B. Process substitution lets us compare without temp files.

comm -23 \
  <(awk '{for(i=1;i<=NF;i++){w=tolower($i); gsub(/[.,;:!?¡¿«»""()—\[\]]/,"",w); if(length(w)>3) print w}}' \
    "$QJ"/primera-parte/texto/texto-011.adoc | sort -u) \
  <(awk '{for(i=1;i<=NF;i++){w=tolower($i); gsub(/[.,;:!?¡¿«»""()—\[\]]/,"",w); if(length(w)>3) print w}}' \
    "$QJ"/primera-parte/texto/texto-0{01..10}.adoc "$QJ"/primera-parte/texto/texto-0{12..52}.adoc 2>/dev/null | sort -u) \
  | head -20
Tested output — words Cervantes used ONLY in the Golden Age chapter
abejas          ← bees (the "solícitas y discretas abejas")
alcornoques     ← cork oaks
barraganía      ← concubinage (archaic legal term)
bellotas        ← acorns
zagalejas       ← shepherdesses (diminutive)

This is pastoral vocabulary: Cervantes shifts into a different register for this chapter.
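
Process substitution is a bash, zsh, and ksh feature. Under a plain POSIX shell the same set difference needs explicit temp files, which also makes each step easier to inspect. The following is a sketch with a hypothetical helper, `only_in_first`, run on two made-up word lists rather than the corpus.

```shell
# only_in_first A B: words that occur in file A but never in file B (hypothetical helper).
only_in_first() {
  tmp_a=$(mktemp); tmp_b=$(mktemp)
  tr -s ' ' '\n' < "$1" | sort -u > "$tmp_a"    # one word per line, sorted, deduplicated
  tr -s ' ' '\n' < "$2" | sort -u > "$tmp_b"
  comm -23 "$tmp_a" "$tmp_b"                    # -2 and -3 hide "only in B" and "in both"
  rm -f "$tmp_a" "$tmp_b"
}

# Usage with two made-up word lists:
a=$(mktemp); b=$(mktemp)
printf 'abejas bellotas cabras pastor\n' > "$a"
printf 'pastor cabras venta\n' > "$b"
only_in_first "$a" "$b"
rm -f "$a" "$b"
```

Both inputs to `comm` must be sorted under the same locale, which is why each list goes through `sort -u` inside the helper.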

Cervantes' Favorite Content Words

Filter out articles, prepositions, and common verbs to find what Cervantes actually talks about.

awk '{for(i=1;i<=NF;i++){
  w=tolower($i)
  gsub(/[.,;:!?¡¿«»""()—\[\]]/,"",w)
  if(length(w)>4 && w!~/^(sobre|donde|entre|porque|también|aunque|antes|después|desde|cuando|todos|todas|tiene|tenía|hacer|había|hasta|siendo|tanto|mucho|puede|decir|dijo|aquella|aquel|estas|estos|mismo|otros|otras|podía|quien|algunos|como)$/)
    freq[w]++
}} END {for(w in freq) printf "%5d  %s\n", freq[w], w}' \
  "$QJ"/primera-parte/texto/texto-*.adoc | sort -rn | head -15
Tested output
  925  quijote
  652  sancho
  432  respondió     ← "replied" — dialogue drives the novel
  428  vuestra       ← formal address
  396  señor         ← lord/sir
  336  caballero     ← knight
  228  señora        ← lady
  206  manera        ← manner/way
  195  verdad        ← truth
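
The inline stopword regex gets unwieldy as the list grows. One alternative, sketched here under the assumption that the stopwords live in a separate file with one word per line, is to load them into an awk array with the `NR == FNR` idiom. The helper name `content_words` and the sample files are hypothetical, and the idiom misfires if the stopword file is empty.

```shell
# content_words STOPFILE TEXT...: frequency of words longer than 4 characters not in STOPFILE.
content_words() {
  stop=$1; shift
  awk 'NR == FNR { skip[$1] = 1; next }      # first file only: remember each stopword
       { for (i = 1; i <= NF; i++) {
           w = tolower($i); gsub(/[[:punct:]]/, "", w)
           if (length(w) > 4 && !(w in skip)) freq[w]++
       } }
       END { for (w in freq) printf "%5d  %s\n", freq[w], w }' "$stop" "$@" | sort -rn
}

# Usage on throwaway samples instead of the real corpus:
stopfile=$(mktemp); text=$(mktemp)
printf 'sobre\ndonde\n' > "$stopfile"
printf 'sobre verdad donde verdad, manera dijo\n' > "$text"
content_words "$stopfile" "$text"
rm -f "$stopfile" "$text"
```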

CLI Skills Practiced

Tool | Skill
find -printf | Format specifiers: %f, %h, %p, %P, %s
grep -P | PCRE for Spanish characters, alternation, word boundaries
grep -rohn | Recursive, only-matching, filenames suppressed, line numbers
awk | Frequency arrays, field processing, END blocks, multi-file FILENAME
awk BEGIN | Standalone arithmetic with printf, no input file needed
sed | Pattern extraction, substitution, line addressing
comm -23 | Set difference between two sorted lists
<(cmd) | Process substitution: treat command output as a file argument
sort -t -k | Field-specific sorting with custom delimiters
printf | Formatted output with alignment, padding
tr | Character translation for visual bar charts
xargs -I{} | Per-item command execution from pipeline

Intertextuality Drills

QJ="docs/modules/ROOT/pages/education/literature/quijote"

Drill 1 — Heat Map: Which chapter is the most intertextual?

for f in "$QJ"/primera-parte/texto/texto-*.adoc; do
  ch=$(basename "$f" .adoc | sed 's/texto-0*//')
  bible=$(grep -ciP 'salomón|david|sansón|pecado|paraíso|lázaro|moisés' "$f")
  classical=$(grep -ciP 'aristóteles|platón|homero|virgilio|ovidio|horacio|cicerón|hércules|alejandro|apolo|febo|venus|marte|troya|ulises|eneas' "$f")
  chivalric=$(grep -ciP 'amadís|palmerín|orlando|roldán|lanzarote|tristán|galaor|esplandián|belianís|ariosto' "$f")
  pastoral=$(grep -ciP 'diana|montemayor|arcadia|égloga|pastor[ae]?s?\b|zagal' "$f")
  meta=$(grep -ciP 'cide hamete|benengeli|autor.*historia|pluma|arábigo|traductor' "$f")
  total=$((bible + classical + chivalric + pastoral + meta))
  [ "$total" -gt 0 ] && printf "Ch %-3s  Bib:%-2d  Clas:%-2d  Chiv:%-2d  Past:%-2d  Meta:%-2d  Total:%-3d\n" \
    "$ch" "$bible" "$classical" "$chivalric" "$pastoral" "$meta" "$total"
done | sort -t: -k7 -rn | head -15

Drill 2 — Which tradition dominates each chapter?

for f in "$QJ"/primera-parte/texto/texto-*.adoc; do
  ch=$(basename "$f" .adoc | sed 's/texto-0*//')
  bible=$(grep -ciP 'salomón|david|sansón|pecado|paraíso|lázaro|moisés' "$f")
  classical=$(grep -ciP 'aristóteles|platón|homero|virgilio|hércules|alejandro|apolo|febo' "$f")
  chivalric=$(grep -ciP 'amadís|palmerín|orlando|roldán|lanzarote|belianís|ariosto' "$f")
  pastoral=$(grep -ciP 'diana|montemayor|arcadia|égloga|pastor|zagal' "$f")
  meta=$(grep -ciP 'cide hamete|benengeli|autor.*historia|pluma|arábigo' "$f")
  max=$bible; dom="Biblical"
  [ "$classical" -gt "$max" ] && max=$classical && dom="Classical"
  [ "$chivalric" -gt "$max" ] && max=$chivalric && dom="Chivalric"
  [ "$pastoral" -gt "$max" ] && max=$pastoral && dom="Pastoral"
  [ "$meta" -gt "$max" ] && max=$meta && dom="Metaliterary"
  [ "$max" -gt 0 ] && printf "Ch %-3s → %-12s (%d)\n" "$ch" "$dom" "$max"
done
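
The drills in this set retype the five alternation lists each time, and the copies have drifted (Drill 1's classical list is longer than Drill 2's). As a sketch, the lists could live in one function so every drill reads the same patterns. The helper names `tradition_pattern` and `count_tradition` are hypothetical, the lists are copied from Drill 2, and `grep -E` stands in for `-P` because these patterns need no PCRE features.

```shell
# tradition_pattern NAME: print the regex for one tradition (lists copied from Drill 2).
tradition_pattern() {
  case $1 in
    bible)     echo 'salomón|david|sansón|pecado|paraíso|lázaro|moisés' ;;
    classical) echo 'aristóteles|platón|homero|virgilio|hércules|alejandro|apolo|febo' ;;
    chivalric) echo 'amadís|palmerín|orlando|roldán|lanzarote|belianís|ariosto' ;;
    pastoral)  echo 'diana|montemayor|arcadia|égloga|pastor|zagal' ;;
    meta)      echo 'cide hamete|benengeli|autor.*historia|pluma|arábigo' ;;
    *)         return 1 ;;
  esac
}

# count_tradition NAME FILE: number of lines in FILE that match that tradition.
count_tradition() {
  pat=$(tradition_pattern "$1") || return 1
  # grep exits 1 when the count is 0; treat that as success, keep real errors
  grep -ciE "$pat" "$2" || [ $? -eq 1 ]
}

# Usage on a throwaway sample instead of a real chapter:
sample=$(mktemp)
printf 'Orlando y Homero\nel pastor\nOrlando furioso\n' > "$sample"
for t in bible classical chivalric pastoral meta; do
  printf '%-10s %d\n' "$t" "$(count_tradition "$t" "$sample")"
done
rm -f "$sample"
```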

Drill 3 — Trace a specific author across the text

# Replace AUTHOR with: amadís, ariosto, homero, virgilio, etc.
AUTHOR="amadís"
grep -rn -i "$AUTHOR" "$QJ"/primera-parte/texto/ \
  | awk -F: '{split($1,a,"/"); printf "%-15s L%-4s %s\n", a[length(a)], $2, substr($0,index($0,$3))}'

Drill 4 — Cross-tradition convergence points

for f in "$QJ"/primera-parte/texto/texto-*.adoc; do
  ch=$(basename "$f" .adoc | sed 's/texto-0*//')
  t=0
  [ $(grep -ciP 'salomón|david|sansón|pecado|paraíso|lázaro|moisés' "$f") -gt 0 ] && ((t++))
  [ $(grep -ciP 'aristóteles|platón|homero|virgilio|hércules|alejandro|apolo|febo' "$f") -gt 0 ] && ((t++))
  [ $(grep -ciP 'amadís|palmerín|orlando|roldán|lanzarote|belianís|ariosto' "$f") -gt 0 ] && ((t++))
  [ $(grep -ciP 'diana|montemayor|arcadia|égloga|pastor|zagal' "$f") -gt 0 ] && ((t++))
  [ $(grep -ciP 'cide hamete|benengeli|autor.*historia|pluma|arábigo' "$f") -gt 0 ] && ((t++))
  [ "$t" -ge 3 ] && printf "Ch %-3s  %d/5 traditions\n" "$ch" "$t"
done | sort -k3,3 -rn   # field 3 is "N/5"; -n reads the leading N

Drill 5 — Compare Primera vs Segunda Parte intertextuality

for parte in primera-parte segunda-parte; do
  echo "=== ${parte} ==="
  bible=$(grep -rciP 'salomón|david|sansón|pecado|paraíso|lázaro|moisés' "$QJ"/"$parte"/texto/ | awk -F: '{s+=$2} END{print s}')
  classical=$(grep -rciP 'aristóteles|platón|homero|virgilio|hércules|alejandro|apolo' "$QJ"/"$parte"/texto/ | awk -F: '{s+=$2} END{print s}')
  chivalric=$(grep -rciP 'amadís|palmerín|orlando|roldán|lanzarote|belianís|ariosto' "$QJ"/"$parte"/texto/ | awk -F: '{s+=$2} END{print s}')
  pastoral=$(grep -rciP 'diana|montemayor|arcadia|égloga|pastor|zagal' "$QJ"/"$parte"/texto/ | awk -F: '{s+=$2} END{print s}')
  meta=$(grep -rciP 'cide hamete|benengeli|autor.*historia|pluma|arábigo' "$QJ"/"$parte"/texto/ | awk -F: '{s+=$2} END{print s}')
  printf "  Biblical: %3d  Classical: %3d  Chivalric: %3d  Pastoral: %3d  Meta: %3d\n" \
    "$bible" "$classical" "$chivalric" "$pastoral" "$meta"
done
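
Raw totals favour the longer part: the corpus has 126 chapters and the Primera Parte holds 52 of them, so the Segunda Parte has more text in which to match. The sketch below reports matching lines per chapter file instead. The helper name `per_chapter` and the sample directory are hypothetical, and it assumes one `.adoc` file per chapter.

```shell
# per_chapter DIR PATTERN: matching lines per file, averaged over all *.adoc files in DIR.
per_chapter() {
  # /dev/null forces the "file:count" prefix even when the glob matches a single file
  grep -ciE "$2" "$1"/*.adoc /dev/null |
    awk -F: '$1 != "/dev/null" { hits += $NF; files++ }
             END { if (files) printf "%.2f\n", hits / files; else print "0.00" }'
}

# Usage on a throwaway directory instead of the real corpus:
dir=$(mktemp -d)
printf 'Orlando\nOrlando furioso\nnada\n' > "$dir/a.adoc"
printf 'nada\n' > "$dir/b.adoc"
per_chapter "$dir" 'orlando'
rm -rf "$dir"
```

Called as `per_chapter "$QJ/$parte/texto" 'amadís|palmerín'` inside the loop above, it would give figures that can be compared across the two partes.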

CLI Skills Practiced

Skill | Application
grep -ciP | Case-insensitive PCRE counting per file
grep -rn alternation | Multi-term search across corpus
awk -F: field splitting | Parse grep output (file:line:text)
Shell arithmetic $((...)) | Sum category scores
Conditional [ ] && ((t++)) | Count boolean presence per tradition
sort -t: -k | Sort by specific field with custom delimiter
printf alignment | Formatted tabular output
Process substitution | Compare vocabulary sets between partes

Corpus Analysis & Creative Cross-Reference

QJ="docs/modules/ROOT/pages/education/literature/quijote"

Set this variable before running any drill.

Vocabulary Hunting — Find Your Words in Cervantes

Check whether words from your own writing appear in the Quijote corpus.

Single word search with file names and line numbers
grep -rn 'lumbre' "$QJ"/
How many chapters contain the word?
grep -rl 'lumbre' "$QJ"/ | wc -l
Compare Primera vs Segunda Parte usage
grep -rc 'lumbre' "$QJ"/primera-parte/texto/ | awk -F: '$2>0'
grep -rc 'lumbre' "$QJ"/segunda-parte/texto/ | awk -F: '$2>0'

-rc combines recursive search with a count per file. Note that -c counts matching lines, not individual occurrences. awk -F: '$2>0' filters to files with at least one match.
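
The difference between lines and occurrences matters when a word repeats within a line. A small sketch on a made-up three-line sample shows the two counts side by side:

```shell
sample=$(mktemp)
printf 'la lumbre y la lumbre\nsin luz\notra lumbre\n' > "$sample"

lines=$(grep -c 'lumbre' "$sample")            # lines with at least one match
hits=$(grep -o 'lumbre' "$sample" | wc -l)     # every individual match, one per output line
printf 'matching lines: %d, occurrences: %d\n' "$lines" "$hits"
rm -f "$sample"
```

Here the sample has two matching lines but three occurrences, because the first line contains the word twice.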

Context Windows — See the Word in Its Sentence

2 lines before and after each match
grep -rn -B2 -A2 'sepulcro' "$QJ"/
Restrict to Primera Parte texto files, 3 lines context
grep -rn -C3 'querella' "$QJ"/primera-parte/texto/

-B = before, -A = after, -C = both. These flags turn grep from a line finder into a passage finder.

Rhetorical Pattern Matching

Tricolons — three comma-separated items
grep -rn -P '\w+, \w+ y \w+' "$QJ"/primera-parte/texto/texto-033.adoc
Exclamations — Cervantes' pathos markers
grep -rn '¡' "$QJ"/primera-parte/texto/texto-034.adoc
Subjunctive markers — the mood of doubt and desire
grep -rn -P 'fuese|fuera|hubiese|hubiera|tuviese|tuviera' "$QJ"/primera-parte/texto/texto-033.adoc

Comparative Chapter Analysis

Dialogue density — which chapters are conversation-heavy?
for f in "$QJ"/primera-parte/texto/texto-*.adoc; do
  count=$(grep -c '—' "$f")
  printf "%-20s %s\n" "$(basename "$f")" "$count"
done | sort -k2 -rn | head -10
Tested output
texto-023.adoc       72
texto-031.adoc       68
texto-029.adoc       62

Chapters 33–34 (Novela del curioso impertinente) show near-zero dialogue — pure narrative prose. The dialogue-heavy chapters are the adventure and conversation episodes.

Honor frequency — the Curioso’s obsession
grep -ric 'honor\|honra' "$QJ"/primera-parte/texto/ | sort -t: -k2 -rn | head -10
Character frequency across Primera Parte
for name in Sancho Dulcinea Rocinante Lotario Camila Anselmo; do
  count=$(grep -ric "$name" "$QJ"/primera-parte/texto/ | awk -F: '{s+=$2} END {print s}')
  printf "%-12s %s mentions\n" "$name" "$count"
done
Tested output
Sancho       652 mentions
Dulcinea     108 mentions
Rocinante    118 mentions
Lotario       92 mentions   ← confined to chapters 33-35
Camila        85 mentions   ← confined to chapters 33-35
Anselmo       74 mentions   ← confined to chapters 33-35

Lotario, Camila, and Anselmo exist only inside the interpolated novel. They never touch the main narrative.
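
A claim like "confined to chapters 33-35" is easy to check by listing the files in which a name occurs at all. The helper below is a sketch with a hypothetical name, `chapters_with`, demonstrated on a throwaway directory rather than the corpus.

```shell
# chapters_with NAME DIR: list files under DIR with at least one line matching NAME.
chapters_with() {
  grep -rli -- "$1" "$2" | sed 's|.*/||' | sort    # -l prints file names only
}

# Usage on a throwaway directory instead of the real corpus:
dir=$(mktemp -d)
printf 'Camila dijo\n' > "$dir/texto-033.adoc"
printf 'Sancho dijo\n' > "$dir/texto-007.adoc"
chapters_with camila "$dir"
rm -rf "$dir"
```

Running `chapters_with Anselmo "$QJ"/primera-parte/texto/` would show whether the name turns up in any chapter outside the interpolated novel.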

awk — Text Analysis Beyond grep

Unique words in a chapter (vocabulary breadth)
awk '{for(i=1;i<=NF;i++) w[tolower($i)]++} END {print length(w) " unique words"}' \
  "$QJ"/primera-parte/texto/texto-033.adoc
Longest lines — Cervantes' marathon sentences
awk 'length > 200 {printf "%s:%d (%d chars)\n", FILENAME, NR, length}' \
  "$QJ"/primera-parte/texto/texto-033.adoc
Average words per line (sentence density)
awk '{total+=NF; lines++} END {printf "%.1f words/line\n", total/lines}' \
  "$QJ"/primera-parte/texto/texto-034.adoc
Top 20 most frequent words in the Curioso (chapters 33–35)
awk '{for(i=1;i<=NF;i++) w[tolower($i)]++} END {for(k in w) print w[k], k}' \
  "$QJ"/primera-parte/texto/texto-03{3,4,5}.adoc | sort -rn | head -20

Brace expansion {3,4,5} selects exactly those three files — no loop needed.
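
Because brace expansion happens before the command runs, `echo` gives a safe preview of the file list. The sketch below assumes bash 4 or later (zero-padded ranges are not available in older versions or in plain POSIX sh, which print the braces literally); the file names are illustrative only.

```shell
# Preview what the shell will pass to awk or grep (bash 4+ for zero-padded ranges).
echo texto-03{3,4,5}.adoc
echo texto-{009..011}.adoc

# Unlike a glob, brace expansion produces names even when no such file exists,
# so a missing chapter surfaces as a "No such file" error instead of being skipped.
ls texto-03{3,4,5}.adoc 2>/dev/null || echo "expanded names do not exist here"
```

This is also why the `comm -23` drill redirects stderr: a range such as `{12..52}` names every chapter whether or not the file is present.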

Sonnet Cross-Reference

Check every significant word from a sonnet against the Cervantes corpus.

Batch word search with frequency count
for word in desdenes sepulcro lumbre querella suspiros sombras ingrata; do
  # -o prints each match on its own line, so wc -l counts occurrences rather than matching lines
  count=$(grep -rhoi -- "$word" "$QJ"/ | wc -l)
  printf "%-12s %s occurrences in Quijote\n" "$word" "$count"
done

The count tells you whether you’re in Cervantes' register or outside it. High count = common Cervantine vocabulary. Zero = your own invention or a word from a different tradition.

Archaic Forms — Golden Age Register

Track archaic -lle endings (honralle, festejalle, persuadille); the pattern also catches ordinary nouns such as calle and valle
grep -rn -P '\w+lle\b' "$QJ"/primera-parte/texto/
Archaic vocabulary frequency across the Primera Parte
grep -rohn -P '\b(fermosa|vuestra merced|facienda|agora|mesmo|ansí)\b' \
  "$QJ"/primera-parte/texto/ | awk -F: '{freq[$2]++} END {for(w in freq) printf "%4d  %s\n", freq[w], w}' | sort -rn

-o outputs only the matched text, not the whole line. Combined with awk frequency counting, this gives you a vocabulary census.

Hapax Legomena — Words Used Exactly Once

Cervantes' rarest vocabulary choices. Potential material for your own writing.

awk '{for(i=1;i<=NF;i++) {
  w=tolower($i); gsub(/[.,;:!?¡¿«»""()—]/, "", w)
  if(w!="") freq[w]++
}} END {for(w in freq) if(freq[w]==1) print w}' \
  "$QJ"/primera-parte/texto/texto-*.adoc | sort | head -40

These are words Cervantes reached for once and never used again in the Primera Parte. Many mark a deliberate choice for a specific moment; others are simply rare inflections or proper names, so read the list critically.

CLI Skills Practiced

Tool | Skill
grep -rn -B -A -C | Context windows around matches
grep -ric | Recursive case-insensitive count per file
grep -P | PCRE regex for Spanish characters and alternation
awk -F: | Field splitting on colon (grep output format)
awk frequency arrays | w[tolower($i)]++ pattern for word counting
for loop + printf | Formatted iteration over files and word lists
brace expansion | {3,4,5} and {001..020} for file selection
sort -t: -k2 -rn | Field-specific numeric reverse sort
wc -l | Count pipeline results