OpenCode: Provider Configurations
Six providers, each chosen for a specific role. Local models for free routine work. Cloud APIs for tasks that justify the cost.
Provider Architecture
OpenCode supports 75+ LLM providers through the Vercel AI SDK and Models.dev registry. This is the core differentiator — no vendor lock-in.
Provider Tiers
| Tier | Category | Providers |
|---|---|---|
| Tier 1 | Major cloud + configured locally | Anthropic, OpenAI, Google Vertex/Gemini, GitHub Copilot, DeepSeek |
| Tier 2 | Aggregators | OpenRouter (75+ models), Together AI, Groq, Hugging Face, Deep Infra |
| Tier 3 | Local inference | Ollama, LM Studio, llama.cpp |
| Tier 4 | Specialized/regional | Cloudflare AI, xAI, Cerebras, Fireworks AI, Nebius, Venice AI |
Our Provider Strategy
| Provider | Primary Model | Use Case | Cost |
|---|---|---|---|
| Ollama (local) | | Routine docs, AsciiDoc editing, offline work | Free (GPU power only) |
| DeepSeek | | Code generation, programming tasks | Low (fraction of Tier 1) |
| GitHub Copilot | | Quick edits, completions (existing subscription) | Included in Pro |
| OpenAI | | Second opinion, reasoning tasks | Per-token |
| Google Gemini | | Large context analysis, code review | Per-token |
| Anthropic | | Complex refactoring, heavy reasoning | Per-token |
Model Selection Logic
The right provider for the right job:
Task arrives
├── Is it simple? (edit, format, small change)
│ └── Ollama local → free, fast, offline
├── Is it code generation?
│ └── DeepSeek → strong coder, low cost
├── Is it large-context? (multi-file review, architecture)
│ └── Gemini 2.5 → 1M+ token window
├── Is it complex reasoning? (refactoring, design decisions)
│ └── Claude Sonnet/Opus → best reasoning
├── Need a second opinion?
│ └── GPT-4.1 → different training data
└── Quick completion?
└── GitHub Copilot → already subscribed
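The routing above can be sketched as a small shell helper, usable with OpenCode's `-m`/`--model` flag. The `provider/model` strings are illustrative placeholders following the examples on this page, not verified OpenCode model IDs:

```shell
# pick_model TASK — print a provider/model string for `opencode -m`.
# The mapping mirrors the decision tree; model IDs are illustrative.
pick_model() {
  case "$1" in
    simple)         echo "ollama-local/qwen3:14b" ;;   # free, fast, offline
    codegen)        echo "deepseek/deepseek-chat" ;;   # strong coder, low cost
    large-context)  echo "google/gemini-2.5-pro" ;;    # 1M+ token window
    reasoning)      echo "anthropic/claude-sonnet" ;;  # best reasoning
    second-opinion) echo "openai/gpt-4.1" ;;           # different training data
    *)              echo "copilot/gpt-4.1" ;;          # quick completion, already subscribed
  esac
}

pick_model codegen   # prints: deepseek/deepseek-chat
```

For example: `opencode -m "$(pick_model codegen)"`.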
Adding Custom OpenAI-Compatible Providers
Any provider with an OpenAI-compatible API can be added:
{
  "provider": {
    "custom-provider": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Custom Provider",
      "options": {
        "apiKey": "{env:CUSTOM_API_KEY}",
        "baseURL": "https://api.custom-provider.com/v1"
      },
      "models": {
        "custom-model": {
          "id": "model-id-on-provider",
          "name": "Custom Model Name"
        }
      }
    }
  }
}
Provider: Ollama (Local)
Why Ollama
- Zero cost — Runs on RTX 5090 (24GB VRAM), no API charges
- Offline capable — Works without internet, critical for air-gapped or travel scenarios
- Privacy — All data stays local, no third-party data sharing
- Low latency — No network round-trip, GPU-accelerated inference
- Fine-tunable — Custom models via QLoRA (see local-model project)
Status: ACTIVE (2026-04-04)
Installed via `curl -fsSL ollama.ai/install.sh | sh` and running as a systemd service; the RTX 5090 is detected automatically. Ollama serves the default model until cloud API keys are configured in dsec.
Configuration (opencode.jsonc)
{
"provider": {
"ollama-local": {
"npm": "@ai-sdk/openai-compatible",
"name": "Ollama (Local)",
"options": {
"baseURL": "http://localhost:11434/v1"
},
"models": {
"qwen3:14b": {
"name": "Qwen3 14B (Local)"
}
}
}
}
}
Critical: Context Window Tuning
Ollama defaults to a tiny context window. For coding agent use, increase it:
# Set the context window for the session (inside the Ollama REPL)
ollama run qwen2.5-coder:14b
>>> /set parameter num_ctx 32768
Or create a Modelfile:
FROM qwen2.5-coder:14b
PARAMETER num_ctx 32768
PARAMETER temperature 0
ollama create qwen-coder-32k -f Modelfile
Without num_ctx tuning, models truncate context at 2048-4096 tokens — completely inadequate for coding agents. This is the #1 mistake with Ollama + OpenCode.
GPU Memory Usage
| Model | VRAM (32k ctx) | Speed (RTX 5090) |
|---|---|---|
| qwen2.5-coder:7b | ~6 GB | ~80 tok/s |
| qwen2.5-coder:14b | ~12 GB | ~45 tok/s |
| deepseek-coder-v2:16b | ~14 GB | ~35 tok/s |
| llama3.1:8b | ~7 GB | ~75 tok/s |
| RTX 5090 Mobile (24GB VRAM) can run 14B models comfortably with 32k context. Stack two models simultaneously for comparison workflows. |
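As a sanity check on the VRAM column: a rough rule of thumb is that a 4-bit quantized model needs params × 0.5 bytes for the weights alone; the table figures run higher because the KV cache (which grows with `num_ctx`) and runtime overhead come on top. A back-of-envelope helper:

```shell
# Weights-only VRAM (GB) for an N-billion-parameter model at 4-bit quantization.
# KV cache and runtime overhead are NOT included, so real usage is higher.
vram_weights_gb() {
  awk -v p="$1" 'BEGIN { printf "%.1f\n", p * 0.5 }'
}

vram_weights_gb 14   # 7.0 — the ~12 GB above includes the 32k-context KV cache
```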
Service Management
# Start Ollama (systemd)
systemctl --user start ollama
# Check status
systemctl --user status ollama
# Or via Docker Compose (GPU-accelerated)
docker compose -f ~/atelier/_projects/personal/ollama-local/docker-compose-gpu.yml up -d
Verification
# List installed models
curl -s http://localhost:11434/api/tags | jq '.models[].name'
# Test inference
curl -s http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"qwen2.5-coder:14b","messages":[{"role":"user","content":"Hello"}]}' \
| jq '.choices[0].message.content'
Provider: DeepSeek
Why DeepSeek
- Cost-efficient — Comparable coding quality to GPT-4 at a fraction of the price
- Strong at code — Purpose-built for code generation and understanding
- Reasoning models — DeepSeek-R1 offers extended thinking capabilities
- Also available locally — Can run via Ollama for zero-cost usage
Cloud API Configuration
{
  "provider": {
    "deepseek": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "DeepSeek",
      "options": {
        "apiKey": "{env:DEEPSEEK_API_KEY}",
        "baseURL": "https://api.deepseek.com/v1"
      },
      "models": {
        "deepseek-chat": {
          "id": "deepseek-chat",
          "name": "DeepSeek Chat (V3)",
          "context_length": 65536,
          "temperature": 0
        },
        "deepseek-reasoner": {
          "id": "deepseek-reasoner",
          "name": "DeepSeek R1 (Reasoning)",
          "reasoning": true,
          "context_length": 65536,
          "temperature": 0
        }
      }
    }
  }
}
Local via Ollama (Alternative)
DeepSeek Coder can also run locally through Ollama at zero cost:
ollama pull deepseek-coder-v2:16b
See the Ollama provider section for local configuration. Use cloud API when:
- Context exceeds local VRAM capacity
- You need the full DeepSeek-R1 reasoning model (671B total params)
- Local GPU is busy with other workloads
Environment Variable
# Add to ~/.zshenv or export in session
export DEEPSEEK_API_KEY="<your-key>"
Cost Estimate
| Model | Input | Output |
|---|---|---|
| DeepSeek Chat (V3) | $0.27/M tokens | $1.10/M tokens |
| DeepSeek R1 (Reasoning) | $0.55/M tokens | $2.19/M tokens |
For reference: a typical coding session (~50k tokens in, ~20k tokens out) costs approximately $0.04 with DeepSeek Chat, versus roughly $0.26 with GPT-4.1 and $0.45 with Claude Sonnet — about 7-13x cheaper.
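The arithmetic behind that estimate can be checked with a tiny helper; the rates are the per-million-token prices from the table above:

```shell
# session_cost IN_TOKENS OUT_TOKENS IN_RATE OUT_RATE
# Rates are USD per million tokens; prints the session cost in USD.
session_cost() {
  awk -v i="$1" -v o="$2" -v ri="$3" -v ro="$4" \
    'BEGIN { printf "%.4f\n", i / 1e6 * ri + o / 1e6 * ro }'
}

session_cost 50000 20000 0.27 1.10    # DeepSeek Chat: 0.0355
session_cost 50000 20000 3.00 15.00   # Claude Sonnet rates: 0.4500, ~13x more
```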
Provider: GitHub Copilot
Why GitHub Copilot
- Already subscribed — Copilot Pro subscription is paid, leverage it
- Multiple models — Access to GPT-4.1, Claude, Gemini through one subscription
- Device login — No API key management, browser-based auth
- Fast completions — Optimized for code completion use cases
Configuration
{
"provider": {
"copilot": {
"id": "copilot",
"models": {
"gpt-4.1": {
"id": "gpt-4.1",
"name": "GPT-4.1 (via Copilot)"
},
"claude-sonnet": {
"id": "claude-sonnet-4-6",
"name": "Claude Sonnet 4.6 (via Copilot)"
},
"gemini-flash": {
"id": "gemini-2.5-flash",
"name": "Gemini 2.5 Flash (via Copilot)"
}
}
}
}
}
Authentication
GitHub Copilot uses device login flow (no API key):
# OpenCode prompts for device login on first use
# 1. Opens browser to https://github.com/login/device
# 2. Enter the code shown in terminal
# 3. Authorize OpenCode
# Token is cached for subsequent sessions
Subscription Tiers
| Tier | Models Available | Notes |
|---|---|---|
| Copilot Free | GPT-4o Mini, Claude Haiku | Limited requests/month |
| Copilot Pro ($10/mo) | GPT-4.1, Claude Sonnet, Gemini Flash | Generous limits |
| Copilot Pro+ ($39/mo) | All models including Opus, O3, Gemini Pro | Unlimited premium requests |
| Some models require Copilot Pro+ tier. Check GitHub’s model catalog for current availability. |
Best Use Cases
- Quick completions — Fast code generation from existing subscription
- Model comparison — Access Claude, GPT, and Gemini from one provider
- Budget management — Flat subscription fee, no per-token surprise bills
Provider: OpenAI (ChatGPT)
Why OpenAI
- Second opinion — Different training data than Claude, catches different patterns
- Reasoning models — O3, O4-mini for chain-of-thought tasks
- Browser auth — ChatGPT Plus/Pro subscribers can auth without API key
- Proven ecosystem — Widest tool/library support
API Configuration
{
"provider": {
"openai": {
"id": "openai",
"api": {
"apiKey": "{env:OPENAI_API_KEY}"
},
"models": {
"gpt-4.1": {
"id": "gpt-4.1",
"name": "GPT-4.1",
"context_length": 1000000,
"temperature": 0
},
"gpt-4.1-mini": {
"id": "gpt-4.1-mini",
"name": "GPT-4.1 Mini (Fast)",
"context_length": 1000000,
"temperature": 0
},
"o4-mini": {
"id": "o4-mini",
"name": "O4 Mini (Reasoning)",
"reasoning": true,
"context_length": 200000,
"temperature": 0
}
}
}
}
}
Browser Authentication (Alternative)
ChatGPT Plus/Pro subscribers can authenticate via browser session instead of API key:
{
"provider": {
"openai": {
"id": "openai",
"api": {
"apiKey": "browser"
}
}
}
}
| Browser auth uses your subscription quota. API key uses pay-per-token billing. |
Environment Variable
export OPENAI_API_KEY="<your-key>"
Cost Estimate
| Model | Input | Output |
|---|---|---|
| GPT-4.1 | $2.00/M tokens | $8.00/M tokens |
| GPT-4.1 Mini | $0.40/M tokens | $1.60/M tokens |
| O4 Mini | $1.10/M tokens | $4.40/M tokens |
Best Use Cases
- Second opinion — Cross-check Claude’s recommendations on architecture decisions
- Reasoning tasks — O4-mini for step-by-step problem solving
- Long context — GPT-4.1 handles 1M token windows
Provider: Google Gemini
Why Gemini
- Massive context window — 1M+ tokens, best for multi-file analysis
- Free tier — Google AI Studio offers generous free usage
- Code review — Strong at understanding relationships across large codebases
- Vertex AI — Enterprise-grade alternative with GCP integration
Google AI Studio Configuration (Recommended)
{
"provider": {
"google": {
"id": "google",
"api": {
"apiKey": "{env:GOOGLE_GENERATIVE_AI_API_KEY}"
},
"models": {
"gemini-2.5-pro": {
"id": "gemini-2.5-pro-preview-06-05",
"name": "Gemini 2.5 Pro (Preview)",
"reasoning": true,
"context_length": 1048576,
"temperature": 0
},
"gemini-2.5-flash": {
"id": "gemini-2.5-flash-preview-05-20",
"name": "Gemini 2.5 Flash (Fast)",
"context_length": 1048576,
"temperature": 0
}
}
}
}
}
Vertex AI Configuration (Alternative)
For GCP-integrated enterprise usage:
{
"provider": {
"vertex": {
"id": "vertex",
"api": {
"project": "{env:GCLOUD_PROJECT}",
"location": "us-central1"
},
"models": {
"gemini-2.5-pro": {
"id": "gemini-2.5-pro-preview-06-05",
"name": "Gemini 2.5 Pro (Vertex)"
}
}
}
}
}
Requires `gcloud auth application-default login` or service account credentials.
Environment Variables
# Google AI Studio
export GOOGLE_GENERATIVE_AI_API_KEY="<your-key>"
# Vertex AI (alternative)
export GCLOUD_PROJECT="your-project-id"
Cost Estimate
| Model | Input | Output |
|---|---|---|
| Gemini 2.5 Pro | $1.25/M tokens (≤200k), $2.50/M (>200k) | $10.00/M tokens |
| Gemini 2.5 Flash | $0.15/M tokens (≤200k), $0.30/M (>200k) | $0.60/M tokens |
| Gemini 2.5 Flash is exceptionally cheap for its capability. Use it as the default "cloud" model for cost-sensitive tasks that exceed Ollama’s capacity. |
Best Use Cases
- Multi-file code review — Feed entire repo structures into the 1M context window
- Architecture analysis — Understand cross-file relationships
- Cost-effective cloud — Gemini Flash at $0.15/M input is almost free
- Long document processing — Analyze full Antora component sources at once
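To judge whether a source tree actually fits the window before sending it, a crude size check helps. This assumes the common ~4 characters per token heuristic, which is only approximate for code and markup:

```shell
# Rough token count for the AsciiDoc/Markdown sources under a directory,
# using the ~4 chars/token heuristic.
estimate_tokens() {
  find "$1" -type f \( -name '*.adoc' -o -name '*.md' \) -exec cat {} + \
    | wc -c \
    | awk '{ printf "%d\n", $1 / 4 }'
}
```

A result comfortably under 1,000,000 leaves room for the prompt itself and the model's reply.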
Provider: Anthropic (Claude)
Why Claude via OpenCode
- Best reasoning — Claude Opus/Sonnet for complex refactoring and architecture decisions
- Known quantity — Extensive experience from Claude Code, understand its strengths and weaknesses
- Pay-per-token — Use API for targeted heavy tasks, not flat subscription for everything
- Browser auth — Claude Pro/Max subscribers can auth without API key
API Configuration
{
"provider": {
"anthropic": {
"id": "anthropic",
"api": {
"apiKey": "{env:ANTHROPIC_API_KEY}"
},
"models": {
"claude-opus": {
"id": "claude-opus-4-6",
"name": "Claude Opus 4.6",
"reasoning": true,
"attachment": true,
"context_length": 200000,
"temperature": 0
},
"claude-sonnet": {
"id": "claude-sonnet-4-6",
"name": "Claude Sonnet 4.6",
"reasoning": true,
"attachment": true,
"context_length": 200000,
"temperature": 0
},
"claude-haiku": {
"id": "claude-haiku-4-5-20251001",
"name": "Claude Haiku 4.5",
"attachment": true,
"context_length": 200000,
"temperature": 0
}
}
}
}
}
Browser Authentication (Alternative)
Claude Pro/Max subscribers can authenticate via browser session:
{
"provider": {
"anthropic": {
"id": "anthropic",
"api": {
"apiKey": "browser"
}
}
}
}
| Browser auth uses your subscription quota. API key uses pay-per-token billing. If you have Claude Max, browser auth is more cost-effective for heavy usage. |
Environment Variable
export ANTHROPIC_API_KEY="<your-key>"
Cost Estimate
| Model | Input | Output |
|---|---|---|
| Claude Opus 4.6 | $15.00/M tokens | $75.00/M tokens |
| Claude Sonnet 4.6 | $3.00/M tokens | $15.00/M tokens |
| Claude Haiku 4.5 | $0.80/M tokens | $4.00/M tokens |
Usage Strategy
Claude is the premium tier. Use it surgically:
| Model | When to Use |
|---|---|
| Opus 4.6 | Architecture decisions, complex multi-file refactoring, ambiguous requirements that need deep reasoning. The "phone a friend" option. |
| Sonnet 4.6 | Standard coding tasks that exceed local model capability. Good balance of quality and cost. |
| Haiku 4.5 | Fast linting, simple edits, read-only audits. Use instead of Sonnet when quality is not the bottleneck. |
Migration Note
Moving from Claude Code (proprietary) to Claude via OpenCode (open client) does NOT reduce model quality. The same API, same models, same reasoning. What changes:
- Client — Open-source TUI instead of proprietary CLI
- Flexibility — Switch to DeepSeek/Ollama for cost-sensitive tasks mid-session
- Skills/hooks — Different plugin system (JS/TS vs bash hooks), but `.claude/skills/` is cross-compatible
Model Configuration
Default Model
{
"model": "anthropic/claude-sonnet-4-6",
"small_model": "anthropic/claude-haiku-4-5"
}
| Key | Purpose |
|---|---|
| `model` | Primary model for all agents (unless overridden per agent/mode) |
| `small_model` | Lightweight model for automatic tasks (title generation, summaries, compaction) |
Model Loading Priority
1. Command-line flags (`--model` or `-m`)
2. Config file `model` setting
3. Last used model (remembered from previous session)
4. First model by internal priority
Model Variants (Reasoning Effort)
Variants let you toggle reasoning effort for the same model without switching models.
Built-in Variants
| Provider | Available Variants |
|---|---|
| Anthropic | `high`, `max` |
| OpenAI | Six reasoning-effort levels |
Cycle Variants
Default keybind: Ctrl+T (`variant_cycle`)
This toggles between reasoning effort levels. On Anthropic: high → max → high. On OpenAI: cycles through all 6 levels.
Custom Variants
{
"provider": {
"openai": {
"models": {
"gpt-4.1": {
"variants": {
"high": {
"reasoningEffort": "high",
"textVerbosity": "low",
"reasoningSummary": "auto"
},
"low": {
"reasoningEffort": "low",
"textVerbosity": "low",
"reasoningSummary": "auto"
}
}
}
}
}
}
}
Disable a Variant
{
"provider": {
"anthropic": {
"models": {
"claude-sonnet-4-6": {
"variants": {
"max": {
"disabled": true
}
}
}
}
}
}
}
Anthropic Extended Thinking
{
"provider": {
"anthropic": {
"models": {
"claude-sonnet-4-6": {
"options": {
"thinking": {
"type": "enabled",
"budgetTokens": 16000
}
}
}
}
}
}
}
Per-Agent Model Override
Each agent can use a different model:
{
"agent": {
"build": {
"model": "anthropic/claude-sonnet-4-6"
},
"plan": {
"model": "ollama/qwen-coder-14b"
}
}
}
Per-Mode Model Override
Each mode can use a different model:
{
"mode": {
"build": {
"model": "anthropic/claude-sonnet-4-6"
},
"plan": {
"model": "ollama/qwen-coder-14b"
}
}
}
Our Model Strategy
| Context | Model | Rationale |
|---|---|---|
| Build mode (default) | | Free local, fast, adequate for most edits |
| Plan mode | | Read-only analysis, local is fine |
| Complex tasks | | Switch via Ctrl+X → M when quality matters |
| Quick summaries | | Fast local for titles, compaction |
| Cost-check reasoning | Variant cycle (Ctrl+T) | Toggle reasoning effort without switching models |
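Putting the strategy together, a minimal opencode.jsonc sketch might look like the following. The Ollama model ID follows the local provider example earlier on this page; treat it as a starting point rather than a drop-in config:

```json
{
  "model": "ollama-local/qwen3:14b",
  "small_model": "ollama-local/qwen3:14b",
  "agent": {
    "plan": {
      "model": "ollama-local/qwen3:14b"
    }
  }
}
```

Cloud models (Claude, Gemini, DeepSeek) then stay one model-switch away via Ctrl+X → M, so the free local default never blocks a harder task.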