Risk Assessment & Success Criteria

Risk Assessment

Risk                              Likelihood  Mitigation

Catastrophic forgetting           Medium      Keep base model, A/B test before switching

Training data too small           High        Start with templates + math, expand iteratively

Overfitting to training examples  Medium      Use validation set, check generalization

VRAM insufficient for 14B QLoRA   Low         Fall back to 7B, or reduce batch size to 1

Time sink with marginal gains     Medium      Set hard cutoff: if A- doesn’t become A+ in 2 rounds, stop
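
The hard cutoff for the "time sink" risk is easier to honor if it is written down as a rule before training starts. The following is a minimal sketch in Python, not part of the original plan. The grade scale in GRADE_ORDER and the function name should_stop are assumptions for illustration, so adjust the scale to whatever the evaluation suite actually reports.

```python
# Hypothetical grade scale, lowest to highest (an assumption, not from the plan).
GRADE_ORDER = ["F", "D", "C", "B-", "B", "B+", "A-", "A", "A+"]


def should_stop(round_grades, target="A+", max_rounds=2):
    """Decide whether to stop fine-tuning.

    round_grades: grade earned after each completed round, oldest first.
    Returns True if the target grade was reached, or if max_rounds
    rounds have been completed without reaching it.
    """
    rank = GRADE_ORDER.index
    if any(rank(grade) >= rank(target) for grade in round_grades):
        return True  # goal reached, no further rounds needed
    # Goal not reached: stop only once the round budget is used up.
    return len(round_grades) >= max_rounds
```

For example, should_stop(["A-"]) allows a second round, while should_stop(["A-", "A"]) ends the effort because two rounds passed without reaching A+.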

Success Criteria

  • Fine-tuned model scores A or A+ on evaluation suite (vs A- with prompting)

  • No regression on general coding tasks

  • Model works in Aider with the same config (only the model name changes)

  • Total time investment under 15 hours

  • Process documented as a repeatable runbook
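
The criteria above can be turned into a single pass/fail check at the end of the project. This is one possible sketch in Python. The RunResult fields and the unmet_criteria helper are hypothetical names introduced here, and how each field gets measured (for instance, how a coding regression is counted) is left to the evaluation setup.

```python
from dataclasses import dataclass

# Grades that satisfy the first criterion (A or A+).
PASSING_GRADES = {"A", "A+"}
# Time budget from the success criteria, in hours.
MAX_HOURS = 15.0


@dataclass
class RunResult:
    """Hypothetical summary of one fine-tuning effort."""
    eval_grade: str          # grade on the evaluation suite
    coding_regressions: int  # general coding tasks that got worse vs. base
    works_in_aider: bool     # same Aider config, only the model name changed
    hours_spent: float       # total time invested
    runbook_written: bool    # process documented as a repeatable runbook


def unmet_criteria(result: RunResult) -> list[str]:
    """Return the success criteria that the result fails, in list order."""
    failures = []
    if result.eval_grade not in PASSING_GRADES:
        failures.append("evaluation grade below A")
    if result.coding_regressions > 0:
        failures.append("regression on general coding tasks")
    if not result.works_in_aider:
        failures.append("does not work in Aider with same config")
    if result.hours_spent >= MAX_HOURS:
        failures.append("time investment not under 15 hours")
    if not result.runbook_written:
        failures.append("runbook not documented")
    return failures
```

An empty list means every criterion is met. Any returned entries name what still blocks calling the effort a success.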