Local Model Fine-Tuning with Unsloth

1. Executive Summary

Target: Fine-tune qwen2.5-coder:14b on personal AsciiDoc conventions, STEM syntax, and documentation patterns
Hardware: RTX 5090 Mobile (24GB VRAM) + 64GB RAM
Method: QLoRA (4-bit quantized LoRA) via Unsloth
Investment: ~10-15 hours (data curation + training + evaluation)

Foundation Assets:

  • Working Aider + Ollama pipeline (tested, A- quality with prompting)

  • 15+ domus-* repos with thousands of AsciiDoc files as training source

  • 16 document templates with established patterns

  • 106-problem math curriculum demonstrating target STEM syntax
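
The 24GB VRAM budget can be sanity-checked with back-of-the-envelope arithmetic. A minimal sketch — the layer count and hidden size match the published Qwen2.5-14B config, but the module count, byte costs, and overhead figure are rough assumptions, not measured values:

```python
def qlora_vram_gb(params_b=14.0, rank=16, layers=48, hidden=5120,
                  modules_per_layer=7, overhead_gb=4.0):
    """Back-of-the-envelope VRAM estimate (GB) for QLoRA fine-tuning.

    All constants are illustrative assumptions: ~0.5 bytes/param for the
    4-bit base weights, ~7 LoRA-targeted projections per layer, and a flat
    overhead for activations, KV cache, and CUDA context.
    """
    base = params_b * 0.5  # 4-bit quantized base weights
    # Each LoRA module adds an A (hidden x rank) and B (rank x out) matrix;
    # approximated here as 2 * hidden * rank parameters per module.
    lora_params = layers * modules_per_layer * 2 * hidden * rank
    # Adapter weights (fp16) + gradients (fp16) + two fp32 Adam moments.
    adapters = lora_params * (2 + 2 + 8) / 1e9
    return base + adapters + overhead_gb

print(round(qlora_vram_gb(), 1))
```

Under these assumptions the estimate lands well under 24GB, which is why a 14B model is plausible for QLoRA on this GPU while full fine-tuning (16-bit weights plus optimizer state for all 14B parameters) is not.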

2. Strategic Alignment

Fine-tuning is the next step after prompting hits a ceiling. Current state:

| Approach | Quality | Limitation |
| --- | --- | --- |
| Raw model (no config) | C+ — wrong syntax, ignored templates | No awareness of conventions |
| Prompting + CONVENTIONS.md | A- — correct syntax, follows templates | Still needs explicit instructions per session |
| Fine-tuned model | Target: A — natively knows patterns | Requires data curation upfront |

Fine-tuning makes sense when:

  • The model makes the same mistake repeatedly despite prompting

  • You have a repetitive workflow (daily worklogs, math pages, case studies)

  • You want to learn ML fundamentals (career investment)

4. Improvement Proposals

Proposals from the ecosystem audit of 2026-04-04, for team review and prioritization.

| Priority | Proposal | Rationale | Effort |
| --- | --- | --- | --- |
| P2 | Model comparison table (benchmarks, sizes, use cases) | Document tested models with: parameter count, VRAM requirement, inference speed, quality score per task. Prevents re-evaluating the same models. | M |
| P2 | Hardware requirements reference | Map model sizes to GPU requirements: 7B models on 8GB VRAM, 13B on 16GB, 70B on 2x24GB, etc. Include CPU fallback performance. | S |
| P3 | Training data preparation guide | Document the data pipeline: collection, cleaning, formatting (Alpaca, ShareGPT, ChatML), tokenization, and validation steps. | M |
| P3 | Evaluation metrics documentation | Define how to measure fine-tuned model quality: perplexity, BLEU, task-specific benchmarks, human evaluation rubrics. | M |
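
The data-preparation proposal mentions Alpaca formatting. A minimal sketch of converting (instruction, input, output) triples into Alpaca-style JSONL — the field names follow the standard Alpaca schema, and the sample worklog content is invented for illustration:

```python
import json

def to_alpaca(triples):
    """Convert (instruction, input, output) triples to Alpaca-format records."""
    return [{"instruction": ins, "input": inp, "output": out}
            for ins, inp, out in triples]

# Hypothetical sample drawn from the worklog workflow described above.
samples = [(
    "Write a daily worklog entry in AsciiDoc.",
    "Date: 2026-04-04, task: model evaluation",
    "= Worklog 2026-04-04\n\n== Model evaluation\n...",
)]

with open("train.jsonl", "w") as f:
    for rec in to_alpaca(samples):
        f.write(json.dumps(rec) + "\n")
```

One record per line (JSONL) is what most SFT loaders, including the Hugging Face `datasets` JSON loader, expect.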

4.1. Resources

| Resource | Type | Notes |
| --- | --- | --- |
| Unsloth | Tool | 2-5x faster QLoRA, designed for consumer GPUs |
| Qwen2.5-Coder-14B on HF | Model | Base model for fine-tuning |
| Unsloth GitHub | Docs | Examples, notebooks, guides |
| SFTTrainer Docs | Docs | Supervised fine-tuning trainer API |
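
The evaluation-metrics proposal above lists perplexity; a minimal sketch of how it falls out of per-token log-probabilities (toy numbers, not a real model run):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Toy example: three tokens the model assigned probabilities 0.5, 0.25, 0.5.
logps = [math.log(0.5), math.log(0.25), math.log(0.5)]
print(round(perplexity(logps), 2))  # geometric mean of 1/p: (2*4*2)**(1/3)
```

Lower is better; comparing perplexity of the base and fine-tuned models on held-out AsciiDoc files gives a cheap first signal before running task-specific or human evaluation.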