OpenCode: Provider Configurations

Six providers, each chosen for a specific role. Local models for free routine work. Cloud APIs for tasks that justify the cost.

Provider Architecture

Figure 1. Provider Selection Flow

OpenCode supports 75+ LLM providers through the Vercel AI SDK and Models.dev registry. This is the core differentiator — no vendor lock-in.

Provider Tiers

Tier | Category | Providers
Tier 1 | Major cloud + configured locally | Anthropic, OpenAI, Google Vertex/Gemini, GitHub Copilot, DeepSeek
Tier 2 | Aggregators | OpenRouter (75+ models), Together AI, Groq, Hugging Face, Deep Infra
Tier 3 | Local inference | Ollama, LM Studio, llama.cpp
Tier 4 | Specialized/regional | Cloudflare AI, xAI, Cerebras, Fireworks AI, Nebius, Venice AI

Our Provider Strategy

Provider | Primary Model | Use Case | Cost
Ollama (local) | qwen2.5-coder:14b | Routine docs, AsciiDoc editing, offline work | Free (GPU power only)
DeepSeek | deepseek-coder | Code generation, programming tasks | Low (fraction of Tier 1)
GitHub Copilot | gpt-4.1 | Quick edits, completions (existing subscription) | Included in Pro
OpenAI | gpt-4.1 / o4-mini | Second opinion, reasoning tasks | Per-token
Google Gemini | gemini-2.5-pro | Large context analysis, code review | Per-token
Anthropic | claude-sonnet-4-6 | Complex refactoring, heavy reasoning | Per-token

Model Selection Logic

The right provider for the right job:

Task arrives
├── Is it simple? (edit, format, small change)
│   └── Ollama local → free, fast, offline
├── Is it code generation?
│   └── DeepSeek → strong coder, low cost
├── Is it large-context? (multi-file review, architecture)
│   └── Gemini 2.5 → 1M+ token window
├── Is it complex reasoning? (refactoring, design decisions)
│   └── Claude Sonnet/Opus → best reasoning
├── Need a second opinion?
│   └── GPT-4.1 → different training data
└── Quick completion?
    └── GitHub Copilot → already subscribed
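
The routing above can be sketched as a small helper. The provider/model identifiers are illustrative: they mirror the example configurations in this document, not a fixed OpenCode API.

```shell
# Hypothetical routing helper: maps a task category to the provider/model
# chosen in the decision tree above. Identifiers mirror this document's
# example configs and are assumptions, not official OpenCode names.
pick_model() {
  case "$1" in
    simple)     echo "ollama-local/qwen2.5-coder:14b" ;;  # free, fast, offline
    codegen)    echo "deepseek/deepseek-chat" ;;          # strong coder, low cost
    largectx)   echo "google/gemini-2.5-pro" ;;           # 1M+ token window
    reasoning)  echo "anthropic/claude-sonnet" ;;         # best reasoning
    second)     echo "openai/gpt-4.1" ;;                  # different training data
    completion) echo "copilot/gpt-4.1" ;;                 # already subscribed
    *)          echo "ollama-local/qwen2.5-coder:14b" ;;  # default to free local
  esac
}

pick_model largectx   # prints google/gemini-2.5-pro
```

The default branch matters: anything unclassified falls back to the free local model, so cloud spend is always an explicit choice.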

Adding Custom OpenAI-Compatible Providers

Any provider with an OpenAI-compatible API can be added:

{
  "provider": {
    "custom-provider": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "apiKey": "{env:CUSTOM_API_KEY}",
        "baseURL": "https://api.custom-provider.com/v1"
      },
      "models": {
        "custom-model": {
          "id": "model-id-on-provider",
          "name": "Custom Model Name"
        }
      }
    }
  }
}

Provider: Ollama (Local)

Why Ollama

  • Zero cost — Runs on RTX 5090 Mobile (24GB VRAM), no API charges

  • Offline capable — Works without internet, critical for air-gapped or travel scenarios

  • Privacy — All data stays local, no third-party data sharing

  • Low latency — No network round-trip, GPU-accelerated inference

  • Fine-tunable — Custom models via QLoRA (see local-model project)

Status: ACTIVE (2026-04-04)

Installed via curl -fsSL ollama.ai/install.sh | sh. Running as systemd service. RTX 5090 detected automatically. Currently the default model until cloud API keys are configured in dsec.

Configuration (opencode.jsonc)

{
  "provider": {
    "ollama-local": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama (Local)",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "qwen2.5-coder:14b": {
          "name": "Qwen2.5 Coder 14B (Local)"
        }
      }
    }
  }
}

Critical: Context Window Tuning

Ollama defaults to a tiny context window. For coding agent use, increase it:

# Set context window when running model
ollama run qwen2.5-coder:14b --num-ctx 32768

Or create a Modelfile:

FROM qwen2.5-coder:14b
PARAMETER num_ctx 32768
PARAMETER temperature 0

Then build the tuned model from it:

ollama create qwen-coder-32k -f Modelfile

Without num_ctx tuning, models truncate context at 2048-4096 tokens — completely inadequate for coding agents. This is the #1 mistake with Ollama + OpenCode.

GPU Memory Usage

Model | VRAM (32k ctx) | Speed (RTX 5090)
qwen2.5-coder:7b | ~6 GB | ~80 tok/s
qwen2.5-coder:14b | ~12 GB | ~45 tok/s
deepseek-coder-v2:16b | ~14 GB | ~35 tok/s
llama3.1:8b | ~7 GB | ~75 tok/s

RTX 5090 Mobile (24GB VRAM) can run 14B models comfortably with 32k context. Stack two models simultaneously for comparison workflows.
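
The headroom claim is easy to verify with the table's approximate figures: the 7B and 14B Qwen models together stay well under the card's 24 GB.

```shell
# Back-of-envelope check that two models from the table above can share
# the 24 GB card. VRAM figures (GB, 32k context) are the table's estimates.
QWEN7B=6
QWEN14B=12
VRAM=24
TOTAL=$((QWEN7B + QWEN14B))
[ "$TOTAL" -le "$VRAM" ] && echo "fits: ${TOTAL} GB of ${VRAM} GB used"
```

That leaves ~6 GB of headroom, which Ollama also needs for KV cache growth at long contexts, so pairing two 14B models would not fit.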

Service Management

# Start Ollama (systemd)
systemctl --user start ollama

# Check status
systemctl --user status ollama

# Or via Docker Compose (GPU-accelerated)
docker compose -f ~/atelier/_projects/personal/ollama-local/docker-compose-gpu.yml up -d

Verification

# List installed models
curl -s http://localhost:11434/api/tags | jq '.models[].name'

# Test inference
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen2.5-coder:14b","messages":[{"role":"user","content":"Hello"}]}' \
  | jq '.choices[0].message.content'

Provider: DeepSeek

Why DeepSeek

  • Cost-efficient — Comparable coding quality to GPT-4 at a fraction of the price

  • Strong at code — Purpose-built for code generation and understanding

  • Reasoning models — DeepSeek-R1 offers extended thinking capabilities

  • Also available locally — Can run via Ollama for zero-cost usage

Cloud API Configuration

{
  "provider": {
    "deepseek": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "apiKey": "{env:DEEPSEEK_API_KEY}",
        "baseURL": "https://api.deepseek.com/v1"
      },
      "models": {
        "deepseek-chat": {
          "id": "deepseek-chat",
          "name": "DeepSeek Chat (V3)",
          "context_length": 65536,
          "temperature": 0
        },
        "deepseek-reasoner": {
          "id": "deepseek-reasoner",
          "name": "DeepSeek R1 (Reasoning)",
          "reasoning": true,
          "context_length": 65536,
          "temperature": 0
        }
      }
    }
  }
}

Local via Ollama (Alternative)

DeepSeek Coder can also run locally through Ollama at zero cost:

ollama pull deepseek-coder-v2:16b

See the Ollama provider section for local configuration. Use cloud API when:

  • Context exceeds local VRAM capacity

  • Need the full DeepSeek-R1 reasoning model (671B params)

  • Local GPU is busy with other workloads
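
Registering the pulled model follows the same pattern as the Ollama section's config. A sketch, showing only the added models entry under the ollama-local provider key used earlier:

```json
{
  "provider": {
    "ollama-local": {
      "models": {
        "deepseek-coder-v2:16b": {
          "name": "DeepSeek Coder V2 16B (Local)"
        }
      }
    }
  }
}
```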

Environment Variable

# Add to ~/.zshenv or export in session
export DEEPSEEK_API_KEY="<your-key>"

Cost Estimate

Model | Input | Output
DeepSeek Chat (V3) | $0.27/M tokens | $1.10/M tokens
DeepSeek R1 (Reasoning) | $0.55/M tokens | $2.19/M tokens

For reference: A typical coding session (~50k tokens in, ~20k tokens out) costs approximately $0.04 with DeepSeek Chat. That is roughly 10-20x cheaper than Claude Sonnet or GPT-4.1.
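
That estimate can be reproduced directly from the table's rates:

```shell
# Session cost for DeepSeek Chat: 50k input tokens and 20k output tokens
# at the per-million-token rates listed in the table above.
awk 'BEGIN {
  cost = 50000 / 1e6 * 0.27 + 20000 / 1e6 * 1.10
  printf "USD %.4f\n", cost
}'
```

The same session at Claude Sonnet rates ($3/M in, $15/M out) comes to $0.45, which is where the 10-20x figure comes from.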

Provider: GitHub Copilot

Why GitHub Copilot

  • Already subscribed — Copilot Pro subscription is paid, leverage it

  • Multiple models — Access to GPT-4.1, Claude, Gemini through one subscription

  • Device login — No API key management, browser-based auth

  • Fast completions — Optimized for code completion use cases

Configuration

{
  "provider": {
    "copilot": {
      "id": "copilot",
      "models": {
        "gpt-4.1": {
          "id": "gpt-4.1",
          "name": "GPT-4.1 (via Copilot)"
        },
        "claude-sonnet": {
          "id": "claude-sonnet-4-6",
          "name": "Claude Sonnet 4.6 (via Copilot)"
        },
        "gemini-flash": {
          "id": "gemini-2.5-flash",
          "name": "Gemini 2.5 Flash (via Copilot)"
        }
      }
    }
  }
}

Authentication

GitHub Copilot uses device login flow (no API key):

# OpenCode prompts for device login on first use
# 1. Opens browser to https://github.com/login/device
# 2. Enter the code shown in terminal
# 3. Authorize OpenCode
# Token is cached for subsequent sessions

Subscription Tiers

Tier | Models Available | Notes
Copilot Free | GPT-4o Mini, Claude Haiku | Limited requests/month
Copilot Pro ($10/mo) | GPT-4.1, Claude Sonnet, Gemini Flash | Generous limits
Copilot Pro+ ($39/mo) | All models including Opus, O3, Gemini Pro | Unlimited premium requests

Some models require Copilot Pro+ tier. Check GitHub’s model catalog for current availability.
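
A rough break-even sketch for the flat fee, assuming a typical session of 50k tokens in and 20k out at the Claude Sonnet API rates listed later in this document:

```shell
# How many typical sessions (50k in / 20k out) at Claude Sonnet API rates
# ($3/M input, $15/M output) equal the $10/mo Copilot Pro fee?
awk 'BEGIN {
  per_session = 50000 / 1e6 * 3.00 + 20000 / 1e6 * 15.00   # $0.45/session
  printf "%d sessions/month\n", int(10 / per_session)
}'
```

If you run Sonnet-class sessions more than a couple of dozen times a month, the flat subscription beats per-token billing.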

Best Use Cases

  • Quick completions — Fast code generation from existing subscription

  • Model comparison — Access Claude, GPT, and Gemini from one provider

  • Budget management — Flat subscription fee, no per-token surprise bills

Provider: OpenAI (ChatGPT)

Why OpenAI

  • Second opinion — Different training data than Claude, catches different patterns

  • Reasoning models — O3, O4-mini for chain-of-thought tasks

  • Browser auth — ChatGPT Plus/Pro subscribers can auth without API key

  • Proven ecosystem — Widest tool/library support

API Configuration

{
  "provider": {
    "openai": {
      "id": "openai",
      "api": {
        "apiKey": "{env:OPENAI_API_KEY}"
      },
      "models": {
        "gpt-4.1": {
          "id": "gpt-4.1",
          "name": "GPT-4.1",
          "context_length": 1000000,
          "temperature": 0
        },
        "gpt-4.1-mini": {
          "id": "gpt-4.1-mini",
          "name": "GPT-4.1 Mini (Fast)",
          "context_length": 1000000,
          "temperature": 0
        },
        "o4-mini": {
          "id": "o4-mini",
          "name": "O4 Mini (Reasoning)",
          "reasoning": true,
          "context_length": 200000,
          "temperature": 0
        }
      }
    }
  }
}

Browser Authentication (Alternative)

ChatGPT Plus/Pro subscribers can authenticate via browser session instead of API key:

{
  "provider": {
    "openai": {
      "id": "openai",
      "api": {
        "apiKey": "browser"
      }
    }
  }
}
Browser auth uses your subscription quota. API key uses pay-per-token billing.

Environment Variable

export OPENAI_API_KEY="<your-key>"

Cost Estimate

Model | Input | Output
GPT-4.1 | $2.00/M tokens | $8.00/M tokens
GPT-4.1 Mini | $0.40/M tokens | $1.60/M tokens
O4 Mini | $1.10/M tokens | $4.40/M tokens

Best Use Cases

  • Second opinion — Cross-check Claude’s recommendations on architecture decisions

  • Reasoning tasks — O4-mini for step-by-step problem solving

  • Long context — GPT-4.1 handles 1M token windows

Provider: Google Gemini

Why Gemini

  • Massive context window — 1M+ tokens, best for multi-file analysis

  • Free tier — Google AI Studio offers generous free usage

  • Code review — Strong at understanding relationships across large codebases

  • Vertex AI — Enterprise-grade alternative with GCP integration

{
  "provider": {
    "google": {
      "id": "google",
      "api": {
        "apiKey": "{env:GOOGLE_GENERATIVE_AI_API_KEY}"
      },
      "models": {
        "gemini-2.5-pro": {
          "id": "gemini-2.5-pro-preview-06-05",
          "name": "Gemini 2.5 Pro (Preview)",
          "reasoning": true,
          "context_length": 1048576,
          "temperature": 0
        },
        "gemini-2.5-flash": {
          "id": "gemini-2.5-flash-preview-05-20",
          "name": "Gemini 2.5 Flash (Fast)",
          "context_length": 1048576,
          "temperature": 0
        }
      }
    }
  }
}

Vertex AI Configuration (Alternative)

For GCP-integrated enterprise usage:

{
  "provider": {
    "vertex": {
      "id": "vertex",
      "api": {
        "project": "{env:GCLOUD_PROJECT}",
        "location": "us-central1"
      },
      "models": {
        "gemini-2.5-pro": {
          "id": "gemini-2.5-pro-preview-06-05",
          "name": "Gemini 2.5 Pro (Vertex)"
        }
      }
    }
  }
}

Requires gcloud auth application-default login or service account credentials.

Environment Variables

# Google AI Studio
export GOOGLE_GENERATIVE_AI_API_KEY="<your-key>"

# Vertex AI (alternative)
export GCLOUD_PROJECT="your-project-id"

Cost Estimate

Model | Input | Output
Gemini 2.5 Pro | $1.25/M tokens (≤200k), $2.50/M (>200k) | $10.00/M tokens
Gemini 2.5 Flash | $0.15/M tokens (≤200k), $0.30/M (>200k) | $0.60/M tokens

Gemini 2.5 Flash is exceptionally cheap for its capability. Use it as the default "cloud" model for cost-sensitive tasks that exceed Ollama’s capacity.
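
The tiered input pricing matters for the large-context use case. Assuming the higher rate applies to the whole prompt once it crosses the 200k threshold (rather than marginally), a 300k-token code review prices out as:

```shell
# Input cost for a 300k-token prompt to Gemini 2.5 Pro, assuming the whole
# prompt bills at the >200k rate ($2.50/M) once past the threshold.
awk 'BEGIN {
  printf "USD %.2f\n", 300000 / 1e6 * 2.50
}'
```

The same prompt through Gemini 2.5 Flash at $0.30/M costs $0.09, which is why Flash is the default for bulk context work.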

Best Use Cases

  • Multi-file code review — Feed entire repo structures into 1M context window

  • Architecture analysis — Understand cross-file relationships

  • Cost-effective cloud — Gemini Flash at $0.15/M input is almost free

  • Long document processing — Analyze full Antora component sources at once

Provider: Anthropic (Claude)

Why Claude via OpenCode

  • Best reasoning — Claude Opus/Sonnet for complex refactoring and architecture decisions

  • Known quantity — Extensive experience from Claude Code, understand its strengths and weaknesses

  • Pay-per-token — Use API for targeted heavy tasks, not flat subscription for everything

  • Browser auth — Claude Pro/Max subscribers can auth without API key

API Configuration

{
  "provider": {
    "anthropic": {
      "id": "anthropic",
      "api": {
        "apiKey": "{env:ANTHROPIC_API_KEY}"
      },
      "models": {
        "claude-opus": {
          "id": "claude-opus-4-6",
          "name": "Claude Opus 4.6",
          "reasoning": true,
          "attachment": true,
          "context_length": 200000,
          "temperature": 0
        },
        "claude-sonnet": {
          "id": "claude-sonnet-4-6",
          "name": "Claude Sonnet 4.6",
          "reasoning": true,
          "attachment": true,
          "context_length": 200000,
          "temperature": 0
        },
        "claude-haiku": {
          "id": "claude-haiku-4-5-20251001",
          "name": "Claude Haiku 4.5",
          "attachment": true,
          "context_length": 200000,
          "temperature": 0
        }
      }
    }
  }
}

Browser Authentication (Alternative)

Claude Pro/Max subscribers can authenticate via browser session:

{
  "provider": {
    "anthropic": {
      "id": "anthropic",
      "api": {
        "apiKey": "browser"
      }
    }
  }
}
Browser auth uses your subscription quota. API key uses pay-per-token billing. If you have Claude Max, browser auth is more cost-effective for heavy usage.

Environment Variable

export ANTHROPIC_API_KEY="<your-key>"

Cost Estimate

Model | Input | Output
Claude Opus 4.6 | $15.00/M tokens | $75.00/M tokens
Claude Sonnet 4.6 | $3.00/M tokens | $15.00/M tokens
Claude Haiku 4.5 | $0.80/M tokens | $4.00/M tokens

Usage Strategy

Claude is the premium tier. Use it surgically:

Model | When to Use
Opus 4.6 | Architecture decisions, complex multi-file refactoring, ambiguous requirements that need deep reasoning. The "phone a friend" option.
Sonnet 4.6 | Standard coding tasks that exceed local model capability. Good balance of quality and cost.
Haiku 4.5 | Fast linting, simple edits, read-only audits. Use instead of Sonnet when quality is not the bottleneck.

Migration Note

Moving from Claude Code (proprietary) to Claude via OpenCode (open client) does NOT reduce model quality. The same API, same models, same reasoning. What changes:

  • Client — Open-source TUI instead of proprietary CLI

  • Flexibility — Switch to DeepSeek/Ollama for cost-sensitive tasks mid-session

  • Skills/hooks — Different plugin system (JS/TS vs bash hooks), but .claude/skills/ is cross-compatible

Model Configuration

Default Model

{
  "model": "anthropic/claude-sonnet-4-6",
  "small_model": "anthropic/claude-haiku-4-5"
}

Key | Purpose
model | Primary model for all agents (unless overridden per agent/mode)
small_model | Lightweight model for automatic tasks (title generation, summaries, compaction)

Model Loading Priority

  1. Command-line flags (--model or -m)

  2. Config file model setting

  3. Last used model (remembered from previous session)

  4. First model by internal priority

Model Variants (Reasoning Effort)

Variants let you toggle reasoning effort for the same model without switching models.

Built-in Variants

Provider | Available Variants
Anthropic | high (default), max
OpenAI | none, minimal, low, medium, high, xhigh
Google | low, high

Cycle Variants

Default keybind: Ctrl+T (variant_cycle)

This toggles between reasoning effort levels. On Anthropic: high → max → high. On OpenAI: cycles through all six levels.

Custom Variants

{
  "provider": {
    "openai": {
      "models": {
        "gpt-4.1": {
          "variants": {
            "high": {
              "reasoningEffort": "high",
              "textVerbosity": "low",
              "reasoningSummary": "auto"
            },
            "low": {
              "reasoningEffort": "low",
              "textVerbosity": "low",
              "reasoningSummary": "auto"
            }
          }
        }
      }
    }
  }
}

Disable a Variant

{
  "provider": {
    "anthropic": {
      "models": {
        "claude-sonnet-4-6": {
          "variants": {
            "max": {
              "disabled": true
            }
          }
        }
      }
    }
  }
}

Anthropic Extended Thinking

{
  "provider": {
    "anthropic": {
      "models": {
        "claude-sonnet-4-6": {
          "options": {
            "thinking": {
              "type": "enabled",
              "budgetTokens": 16000
            }
          }
        }
      }
    }
  }
}

Per-Agent Model Override

Each agent can use a different model:

{
  "agent": {
    "build": {
      "model": "anthropic/claude-sonnet-4-6"
    },
    "plan": {
      "model": "ollama/qwen-coder-14b"
    }
  }
}

Per-Mode Model Override

Each mode can use a different model:

{
  "mode": {
    "build": {
      "model": "anthropic/claude-sonnet-4-6"
    },
    "plan": {
      "model": "ollama/qwen-coder-14b"
    }
  }
}

Our Model Strategy

Context | Model | Rationale
Build mode (default) | ollama/qwen-coder-14b | Free local, fast, adequate for most edits
Plan mode | ollama/qwen-coder-14b | Read-only analysis, local is fine
Complex tasks | anthropic/claude-sonnet-4-6 | Switch via Ctrl+X → M when quality matters
Quick summaries | ollama/llama3.1:8b (small_model) | Fast local for titles, compaction
Cost-check reasoning | Variant cycle (Ctrl+T) | Toggle reasoning effort without switching models
Toggle reasoning effort without switching models