Roadmap
1. Phase 1: Environment Setup (2-3 hours)
- Install CUDA toolkit and verify GPU access
- Install Unsloth and dependencies
- Verify GPU memory available for training
- Run Unsloth’s hello-world example to confirm the pipeline works
```shell
# Verify CUDA and GPU access
nvidia-smi
python -c "import torch; print(torch.cuda.is_available())"

# Check free GPU memory before training
nvidia-smi --query-gpu=memory.total,memory.free --format=csv

# Install Unsloth
pip install unsloth
```
2. Phase 2: Data Curation (4-6 hours)
This is the hardest and most important phase: the quality of the training data determines the quality of the fine-tuned model.
3. Data Sources
| Source | Content | Est. Examples |
|---|---|---|
| Document templates (16 files) | Input: "Create a [type] document" → Output: template content | 16 |
| Math percentage curriculum | Input: "Create problem about X" → Output: proper STEM syntax | 30-50 |
| Existing worklogs | Input: "Create worklog for date" → Output: worklog structure | 20-30 |
| Case studies | Input: "Document incident/change" → Output: structured case study | 10-15 |
| Convention corrections | Input: bad AsciiDoc → Output: fixed AsciiDoc | 20-30 |
4. Data Format (JSONL)
```jsonl
{"instruction": "Create a math concept page about logarithms using AsciiDoc STEM syntax", "output": "= Logarithms\n:stem: latexmath\n:icons: font\n\n== Foundations\n\n=== What Is a Logarithm?\n\nA logarithm is the inverse of exponentiation. If stem:[b^y = x], then stem:[\\log_b(x) = y].\n\n[stem]\n++++\n\\log_b(x) = y \\iff b^y = x\n++++\n..."}
{"instruction": "Fix the STEM syntax in this AsciiDoc:\n\n[latexmath]\n$$E = mc^2$$\n\nThe formula \\( F = ma \\) describes force.", "output": "[stem]\n++++\nE = mc^2\n++++\n\nThe formula stem:[F = ma] describes force."}
```
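Since each line must be a self-contained JSON object with both keys, a quick validation pass over the file catches malformed escapes before training. A minimal sketch using only the standard library (the function name and return shape are illustrative):

```python
import json

def validate_jsonl(path):
    """Return a list of (line_number, problem) for lines that fail to parse
    or are missing the required 'instruction'/'output' keys."""
    errors = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as e:
                errors.append((lineno, str(e)))
                continue
            missing = {"instruction", "output"} - record.keys()
            if missing:
                errors.append((lineno, f"missing keys: {sorted(missing)}"))
    return errors
```

Run it over the curated file and fix every reported line; a single bad escape sequence can otherwise abort the training run at load time.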
5. Data Curation Script
- Write a Python script to extract instruction/output pairs from existing .adoc files
- Manually review and clean up the generated pairs
- Split into train (80%) / validation (20%) sets
- Target: 100-200 high-quality examples minimum
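The split step above can be sketched as follows (a hypothetical helper; it assumes the pairs are already loaded as a list, and seeds the shuffle so the split is reproducible):

```python
import random

def train_val_split(examples, val_fraction=0.2, seed=42):
    """Deterministically shuffle, then split into (train, validation) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```

With 100-200 examples the validation set is only 20-40 items, so keep it fixed across experiments rather than re-splitting each run.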
7. QLoRA Configuration
| Parameter | Value |
|---|---|
| Base model | `unsloth/Qwen2.5-Coder-14B-Instruct` |
| Quantization | 4-bit (QLoRA) |
| LoRA rank (r) | 16 (start here, increase to 32 if needed) |
| LoRA alpha | 32 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Learning rate | 2e-4 |
| Epochs | 3-5 |
| Batch size | 2 (with gradient accumulation 4) |
| Max sequence length | 4096 |
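The table can be captured as a plain config dict (key names here are illustrative, not tied to a specific trainer API); note the effective batch size is 2 × 4 = 8:

```python
qlora_config = {
    "base_model": "unsloth/Qwen2.5-Coder-14B-Instruct",
    "load_in_4bit": True,           # QLoRA: 4-bit quantized base weights
    "lora_r": 16,                   # raise to 32 if the model underfits
    "lora_alpha": 32,
    "learning_rate": 2e-4,
    "num_epochs": 3,                # 3-5 range
    "per_device_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "max_seq_length": 4096,
}

# Gradient accumulation multiplies the effective batch size
effective_batch = (qlora_config["per_device_batch_size"]
                   * qlora_config["gradient_accumulation_steps"])
```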
8. Training Script Skeleton
```python
from unsloth import FastLanguageModel
import torch

# Load base model with 4-bit quantization
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-Coder-14B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
)

# Load training data
# ... (from curated JSONL)

# Train
# ... (Hugging Face Trainer with SFTTrainer)

# Save adapter
model.save_pretrained("domus-coder-adapter")

# Export to GGUF for Ollama
model.save_pretrained_gguf("domus-coder", tokenizer, quantization_method="q4_k_m")
```
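The "Load training data" placeholder could render the curated JSONL into single training strings, for example with an Alpaca-style template (a sketch; the template wording is an assumption, not a fixed Unsloth requirement):

```python
import json

# Hypothetical prompt template; match it to whatever format the trainer expects
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{output}"

def load_training_texts(path):
    """Read curated JSONL and render each pair into one training string."""
    texts = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                record = json.loads(line)
                texts.append(PROMPT_TEMPLATE.format(**record))
    return texts
```

Whatever template is chosen for training must be used verbatim at inference time, otherwise the model sees prompts it was never trained on.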
10. Test Suite
| Test | Prompt | Expected |
|---|---|---|
| STEM inline | "Write a sentence using the quadratic formula" | Uses `stem:[...]` inline macro |
| STEM display block | "Show the compound interest formula" | Uses a `[stem]` block with `++++` delimiters |
| Collapsible | "Create a practice problem" | Uses a `[%collapsible]` block |
| Template adherence | "Create a worklog for today" | Includes all partial includes, correct header |
| No hardcoding | "Document an ISE policy change" | Uses generic placeholders, no hardcoded values |
| Convention awareness | "Create a meeting notes document" | Uses MTG prefix, has decisions table, action items |
11. Evaluation Process
- Run each test prompt through the base model AND the fine-tuned model
- Score both outputs on the same rubric
- Calculate improvement percentage
- If improvement < 10%, revisit training data quality
- If catastrophic forgetting occurs (general code gets worse), reduce epochs or training examples
12. Phase 5: Deployment (1 hour)
- Export GGUF model
- Import into Ollama: `ollama create domus-coder -f Modelfile`
- Update `.aider.conf.yml` to use the fine-tuned model
- Update `.aider.model.settings.yml` with the new model entry
- Test in Aider with real tasks
- Document results in worklog