Aider + Ollama: Offline Coding Assistant
Quick Start
# Primary — newest MoE model (3.3B active params, fast + smart)
aider --model ollama_chat/qwen3-coder:30b
# Alternative — battle-tested, 71.4% Aider benchmark
aider --model ollama_chat/qwen2.5-coder:32b
# Fallback — fast, smaller VRAM footprint
aider --model ollama_chat/qwen2.5-coder:14b
Use the `ollama_chat/` prefix rather than plain `ollama/`; Aider's chat-style endpoint gives better results with Ollama models.
Pre-Flight (While Online)
Pull models before going offline:
ollama pull qwen2.5-coder:32b
ollama pull qwen2.5-coder:14b
# Verify they're cached
ollama list
Set the environment variable (add to ~/.zshrc):
export OLLAMA_API_BASE=http://127.0.0.1:11434
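A quick sanity check before launching aider: this sketch falls back to the standard local endpoint if the variable was never exported, so a missing `~/.zshrc` entry still points aider at the right place.

```shell
# Use the existing value if set; otherwise fall back to the local default.
export OLLAMA_API_BASE="${OLLAMA_API_BASE:-http://127.0.0.1:11434}"
echo "aider will talk to: $OLLAMA_API_BASE"
```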
Configuration Files
| File | Purpose |
|---|---|
| `~/.aider.conf.yml` | Global settings — architect mode, auto-commit OFF, whole edit format |
| `~/.aider.model.settings.yml` | CRITICAL — sets `num_ctx` to a usable context window |
| `CONVENTIONS.md` (global) | Global coding standards (AsciiDoc rules, git conventions) |
| `CONVENTIONS.md` (per repo) | Project-specific rules (partials system, attributes, prefixes) |
| `.aiderignore` | Excludes large/irrelevant directories from context |
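As an illustration, a minimal `.aiderignore` might look like this (gitignore syntax; the directory names below are examples, not taken from this repo):

```text
# .aiderignore: these paths never enter the model's context
build/
public/
node_modules/
*.png
*.pdf
```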
Why These Settings Matter
num_ctx (Context Window)
Ollama defaults to a 2048-token context window, which silently truncates your files and instructions. The `.aider.model.settings.yml` file raises `num_ctx` to 32768 so the model actually sees everything you /add.
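A minimal sketch of what `~/.aider.model.settings.yml` can contain, assuming the Quick Start model names (this follows Aider's model-settings format; `extra_params` is passed through to the Ollama API):

```yaml
- name: ollama_chat/qwen3-coder:30b
  extra_params:
    num_ctx: 32768
- name: ollama_chat/qwen2.5-coder:32b
  extra_params:
    num_ctx: 32768
```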
Architect Mode
Architect mode is the single biggest quality improvement for local models. It separates two concerns:
- Architect pass — the model reasons about what to do (natural language)
- Editor pass — the model formats the file edits (structured output)
Quantized local models fail when asked to reason AND follow strict formatting in one pass. Architect mode fixes this.
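In configuration terms, the two passes can be wired up roughly like this in `.aider.conf.yml` (the editor model is an assumed choice; any smaller local model can handle the formatting pass):

```yaml
architect: true                               # pass 1: reason about the change
editor-model: ollama_chat/qwen2.5-coder:14b   # pass 2: format the edits (assumed choice)
editor-edit-format: editor-whole              # simplest output format for local models
```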
Edit Format: whole, not diff
| Format | How It Works | Local Model Result |
|---|---|---|
| `diff` | Search/replace blocks | FAILS — quantized models can’t follow the syntax |
| `whole` | Returns entire file | Works — no formatting to get wrong |
| `editor-whole` | Simplified `whole` for architect mode | Best — used automatically in architect mode |
Critical Safety Settings
| Setting | Value | Why |
|---|---|---|
| `auto-commits` | `false` | CRITICAL. Local models WILL produce bad code. Review everything before committing. |
| `suggest-shell-commands` | `false` | 32B models hallucinate shell commands. Don’t trust them. |
| `dirty-commits` | `false` | Only commit staged changes, never working tree noise. |
| `attribute-author` / `attribute-committer` | `false` | No AI attribution per policy. |
| `architect` | `true` | Separates thinking from editing for better quality. |
| `edit-format` | `whole` | Quantized models cannot reliably produce diff format. |
| `temperature` | low | Low temperature = precise formatting. Set in model settings. |
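Put together, these settings correspond to a `~/.aider.conf.yml` along these lines (a sketch; key names mirror Aider's CLI flags, and temperature lives in the model settings file rather than here):

```yaml
architect: true
edit-format: whole
auto-commits: false
dirty-commits: false
suggest-shell-commands: false
attribute-author: false
attribute-committer: false
```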
Workflow: How to Use Effectively
Rule 1: Small, focused tasks
Local models lose coherence on multi-step tasks. Ask for ONE thing at a time:
# GOOD — single focused request
/add docs/modules/ROOT/pages/templates/TEMPLATE-math-concept.adoc
> Create a new math concept page about logarithms following this template exactly
# BAD — multi-step, model will drift
> Reorganize the worklogs, update the nav, create a new partial, and fix the xrefs
Rule 2: Add only the files the model needs
# Add specific files to context
/add docs/modules/ROOT/pages/drafts/my-file.adoc
/add docs/antora.yml
# Drop files when switching tasks
/drop docs/antora.yml
Rule 3: Load reference files as read-only
# Read-only — model sees it but won't edit it (enables prompt caching)
/read docs/modules/ROOT/pages/templates/TEMPLATE-math-concept.adoc
# Then ask for work based on that reference
> Using the template I loaded, create a new page for logarithms
Rule 4: Review every diff
Since auto-commits are OFF, Aider shows diffs but doesn’t commit:
# 1. Aider makes edits (shown as diff)
# 2. Review the diff carefully
# 3. If good: commit manually
git diff
gach << 'EOF'
docs(scope): Your commit message
EOF
# 4. If bad: revert
git checkout -- path/to/file.adoc
Rule 5: Use /ask before /code
# Research mode — model reads but doesn't edit
/ask What attributes are defined in docs/antora.yml for ISE?
# Review the plan, then let it execute
/code Add a new practice problem to the math problem set
Rule 6: Keep context under 25K tokens
Above ~25K tokens, local models stop following instructions. Symptoms:
- Ignores conventions
- Produces wrong edit format
- Hallucinates file paths
Fix: /drop files aggressively, use .aiderignore, keep /add list small.
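There is no built-in token meter for local models, but a rough 4-characters-per-token heuristic is enough to know when you are near the 25K budget. A minimal sketch (the file is a generated stand-in):

```shell
# Estimate tokens for a file you plan to /add (heuristic: ~4 chars per token).
printf '%.0s=' $(seq 1 4000) > /tmp/sample.adoc   # 4000-char stand-in file
chars=$(wc -c < /tmp/sample.adoc)
echo "$(( chars / 4 )) tokens (budget: ~25000)"   # → 1000 tokens (budget: ~25000)
```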
Model Comparison (Your Hardware)
RTX 5090 Mobile (24GB VRAM) + 64GB RAM:
| Model | VRAM | Speed | Code Quality | Best For |
|---|---|---|---|---|
| `qwen3-coder:30b` | ~18GB | Fast (MoE, 3.3B active) | Excellent | Primary — newest, fast, smart |
| `qwen2.5-coder:32b` | ~19GB | Medium | Excellent (71.4% Aider benchmark) | Battle-tested alternative |
| `qwen2.5-coder:14b` | ~9GB | Fast | Good (69.2%) | Quick edits, small VRAM |

codestral:22b, deepseek-r1:14b, llava, and analyst were removed on 2026-03-30. Qwen models outperform them on all benchmarks for this hardware.
Limitations vs Claude Code
| Capability | Local Model | Workaround |
|---|---|---|
| Multi-file refactoring | Unreliable | One file at a time, manual coordination |
| Following complex rules | Drifts after ~5 rules | CONVENTIONS.md is distilled to essentials |
| AsciiDoc attribute verification | Will hallucinate attributes | Always grep antora.yml yourself first |
| Large context | 32K usable max | Keep /add list small, /drop aggressively |
| Commit messages | Generic | Write your own (auto-commit is OFF) |
| Shell commands | Dangerous | Never trust generated shell commands |
| Cross-repo awareness | No | Only knows files you /add |
| Template adherence | Partial | Use /read to load template, architect mode helps |
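The attribute-verification workaround is a one-liner. A sketch against a throwaway file (the attribute name `project-name` is made up; substitute whatever the model claims exists):

```shell
# Build a stand-in antora.yml, then confirm the attribute really is defined.
cat > /tmp/antora.yml << 'EOF'
name: docs
asciidoc:
  attributes:
    project-name: Domus
EOF
grep -n 'project-name:' /tmp/antora.yml   # → 4:    project-name: Domus
```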
Troubleshooting
Model producing garbage or ignoring instructions
Most likely cause: num_ctx too small. Verify your model settings are loaded:
# Check that .aider.model.settings.yml exists
cat ~/.aider.model.settings.yml
If it's missing, the model runs with a 2048-token context and everything breaks silently.
Model too slow
# Check VRAM usage
nvidia-smi
# Switch to smaller model
aider --model ollama_chat/qwen2.5-coder:14b
Edit format errors
If Aider complains about malformed edits, the model is failing to produce valid output:
# Return to plain code mode (which uses the whole format configured globally)
/chat-mode code
Or restart with:
aider --model ollama_chat/qwen2.5-coder:32b --edit-format whole
Context overloaded
# Clear everything and start fresh
/clear
/drop          # with no arguments, drops all files
/add only-the-file-you-need.adoc
Ollama not responding
# Check if Ollama is running
ss -tlnp | grep 11434
# Restart Ollama service
sudo systemctl restart ollama
Model not found
# List available models
ollama list
# Pull if missing (requires internet)
ollama pull qwen2.5-coder:32b
Custom Ollama Chat Models
Create purpose-built chat models with system prompts baked in — no CONVENTIONS.md needed for interactive use.
Creating a Modelfile
When using zsh heredocs, the delimiter must be quoted (`<< 'EOF'`) so the shell does not expand `$variables` or command substitutions inside the Modelfile body.
cat << 'EOF' > /tmp/domus-chat-v3.Modelfile
FROM qwen3-coder:30b
SYSTEM """You are a senior infrastructure and security engineer assistant. Follow these rules strictly:
- AsciiDoc only, never markdown
- NEVER add :toc: attributes (Antora UI handles navigation)
- Use stem:[...] for inline math, [stem] with ++++ for display blocks
- Use [%collapsible] for expandable sections
- No AI attribution ever
- Be direct, concise, no preamble
- Use AsciiDoc source blocks: [source,bash] then ---- delimiters
- Never use markdown backtick fences
- Use {attribute} references, never hardcode IPs or hostnames
- For code blocks with attributes, add subs=attributes+"""
PARAMETER num_ctx 32768
PARAMETER temperature 0.3
PARAMETER top_k 40
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
EOF
Do NOT add a `TEMPLATE` instruction: the chat template is inherited from the `FROM` model, and overriding it incorrectly will break the model's output.
ollama create domus-chat-v3 -f /tmp/domus-chat-v3.Modelfile
ollama run domus-chat-v3
Modelfile Syntax Rules
| Rule | Detail |
|---|---|
| Triple quotes | Required for multi-line SYSTEM prompts. Opening and closing `"""` must both be present. |
| Quoted heredoc | Critical in zsh. Unquoted heredocs expand `$variables` and command substitutions into the Modelfile. |
| Instructions are case-insensitive | `FROM`, `from`, and `From` all work; uppercase is the convention. |
| No escaping inside triple quotes | Parser reads raw content until the closing `"""`. |
| Changes require re-create | Editing the Modelfile on disk does nothing. Must re-run `ollama create`. |
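The quoted-heredoc rule is easy to verify for yourself. This sketch writes the same body twice; only the quoted version keeps `$HOME` as literal text, which is what a Modelfile SYSTEM prompt needs:

```shell
# Unquoted delimiter: the shell expands $HOME before the file is written.
cat << EOF > /tmp/unquoted.txt
path is $HOME
EOF

# Quoted delimiter: the body is written verbatim.
cat << 'EOF' > /tmp/quoted.txt
path is $HOME
EOF

grep -F '$HOME' /tmp/quoted.txt                               # → path is $HOME
grep -F '$HOME' /tmp/unquoted.txt || echo "already expanded"  # → already expanded
```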
Few-Shot Examples (Optional)
Pre-load examples to steer the model’s output format:
MESSAGE user "Create a collapsible section about SSH keys"
MESSAGE assistant "[%collapsible]
.SSH Key Generation
====
[source,bash]
----
ssh-keygen -t ed25519 -C \"user@host\"
----
===="
Active Custom Models (as of 2026-03-30)
| Model Name | Base | Purpose |
|---|---|---|
| `domus-chat-v3` | `qwen3-coder:30b` | Interactive chat with AsciiDoc + infrastructure conventions |
| | | Ultra-fast Q&A, one paragraph max |
Modelfiles are stored in `~/.ollama/Modelfiles/`.
Capturing Sessions to File
Ollama’s interactive REPL does not write to files. Use these workarounds:
One-shot (pipe output)
ollama run domus-chat-v3 "explain btrfs snapshots" > ~/output.adoc
Interactive with capture (tee)
ollama run domus-chat-v3 | tee ~/session-output.adoc
Model output goes to both screen and file. Your typed prompts are not captured.
Full session recording (script)
script -q ~/session-raw.txt -c "ollama run domus-chat-v3"
Captures both sides (input + output). Clean up into AsciiDoc afterward, or ask the model as its last prompt:
Now take everything we discussed and output it as a single clean AsciiDoc document
Inspecting Existing Models
# See what a model shipped with (system prompt, parameters, template)
ollama show qwen2.5-coder:32b --modelfile
Removing Unused Models
# Check disk usage
ollama list
# Remove a model
ollama rm model-name