OpenCode: Provider Configurations

Six providers, each chosen for a specific role. Local models for free routine work. Cloud APIs for tasks that justify the cost.

Provider Architecture

Figure 1. Provider Selection Flow

OpenCode supports 75+ LLM providers through the Vercel AI SDK and Models.dev registry. This is the core differentiator — no vendor lock-in.

Provider Tiers

Tier | Category | Providers
Tier 1 | Major cloud + configured locally | Anthropic, OpenAI, Google Vertex/Gemini, GitHub Copilot, DeepSeek
Tier 2 | Aggregators | OpenRouter (75+ models), Together AI, Groq, Hugging Face, Deep Infra
Tier 3 | Local inference | Ollama, LM Studio, llama.cpp
Tier 4 | Specialized/regional | Cloudflare AI, xAI, Cerebras, Fireworks AI, Nebius, Venice AI

Our Provider Strategy

Provider | Primary Model | Use Case | Cost
Ollama (local) | qwen2.5-coder:14b | Routine docs, AsciiDoc editing, offline work | Free (GPU power only)
DeepSeek | deepseek-coder | Code generation, programming tasks | Low (fraction of Tier 1)
GitHub Copilot | gpt-4.1 | Quick edits, completions (existing subscription) | Included in Pro
OpenAI | gpt-4.1 / o4-mini | Second opinion, reasoning tasks | Per-token
Google Gemini | gemini-2.5-pro | Large context analysis, code review | Per-token
Anthropic | claude-sonnet-4-6 | Complex refactoring, heavy reasoning | Per-token

Model Selection Logic

The right provider for the right job:

Task arrives
├── Is it simple? (edit, format, small change)
│   └── Ollama local → free, fast, offline
├── Is it code generation?
│   └── DeepSeek → strong coder, low cost
├── Is it large-context? (multi-file review, architecture)
│   └── Gemini 2.5 → 1M+ token window
├── Is it complex reasoning? (refactoring, design decisions)
│   └── Claude Sonnet/Opus → best reasoning
├── Need a second opinion?
│   └── GPT-4.1 → different training data
└── Quick completion?
    └── GitHub Copilot → already subscribed
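
The routing above can be sketched as a small helper. The provider/model identifiers are illustrative: they mirror the example configurations in this document, not a fixed OpenCode API.

```shell
# Hypothetical routing helper: maps a task category to the provider/model
# chosen in the decision tree above. Identifiers mirror this document's
# example configs and are assumptions, not official OpenCode names.
pick_model() {
  case "$1" in
    simple)     echo "ollama-local/qwen2.5-coder:14b" ;;  # free, fast, offline
    codegen)    echo "deepseek/deepseek-chat" ;;          # strong coder, low cost
    largectx)   echo "google/gemini-2.5-pro" ;;           # 1M+ token window
    reasoning)  echo "anthropic/claude-sonnet" ;;         # best reasoning
    second)     echo "openai/gpt-4.1" ;;                  # different training data
    completion) echo "copilot/gpt-4.1" ;;                 # already subscribed
    *)          echo "ollama-local/qwen2.5-coder:14b" ;;  # default to free local
  esac
}

pick_model largectx   # prints google/gemini-2.5-pro
```

The default branch matters: anything unclassified falls back to the free local model, so cloud spend is always an explicit choice.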

Adding Custom OpenAI-Compatible Providers

Any provider with an OpenAI-compatible API can be added:

{
  "provider": {
    "custom-provider": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "apiKey": "{env:CUSTOM_API_KEY}",
        "baseURL": "https://api.custom-provider.com/v1"
      },
      "models": {
        "custom-model": {
          "id": "model-id-on-provider",
          "name": "Custom Model Name"
        }
      }
    }
  }
}

Provider: Ollama (Local)

Why Ollama

  • Zero cost — Runs on RTX 5090 Mobile (24GB VRAM), no API charges

  • Offline capable — Works without internet, critical for air-gapped or travel scenarios

  • Privacy — All data stays local, no third-party data sharing

  • Low latency — No network round-trip, GPU-accelerated inference

  • Fine-tunable — Custom models via QLoRA (see local-model project)

Status: ACTIVE (2026-04-04)

Installed via curl -fsSL ollama.ai/install.sh | sh. Running as systemd service. RTX 5090 detected automatically. Currently the default model until cloud API keys are configured in dsec.

Configuration (opencode.jsonc)

{
  "provider": {
    "ollama-local": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama (Local)",
      "options": {
        "baseURL": "http://localhost:11434/v1"
      },
      "models": {
        "qwen2.5-coder:14b": {
          "name": "Qwen2.5 Coder 14B (Local)"
        }
      }
    }
  }
}

Critical: Context Window Tuning

Ollama defaults to a tiny context window. For coding agent use, increase it:

# Set context window when running model
ollama run qwen2.5-coder:14b --num-ctx 32768

Or create a Modelfile:

FROM qwen2.5-coder:14b
PARAMETER num_ctx 32768
PARAMETER temperature 0

Then build the tuned model from it:

ollama create qwen-coder-32k -f Modelfile

Without num_ctx tuning, models truncate context at 2048-4096 tokens — completely inadequate for coding agents. This is the #1 mistake with Ollama + OpenCode.

GPU Memory Usage

Model | VRAM (32k ctx) | Speed (RTX 5090)
qwen2.5-coder:7b | ~6 GB | ~80 tok/s
qwen2.5-coder:14b | ~12 GB | ~45 tok/s
deepseek-coder-v2:16b | ~14 GB | ~35 tok/s
llama3.1:8b | ~7 GB | ~75 tok/s

RTX 5090 Mobile (24GB VRAM) can run 14B models comfortably with 32k context. Stack two models simultaneously for comparison workflows.
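
The headroom claim is easy to verify with the table's approximate figures: the 7B and 14B Qwen models together stay well under the card's 24 GB.

```shell
# Back-of-envelope check that two models from the table above can share
# the 24 GB card. VRAM figures (GB, 32k context) are the table's estimates.
QWEN7B=6
QWEN14B=12
VRAM=24
TOTAL=$((QWEN7B + QWEN14B))
[ "$TOTAL" -le "$VRAM" ] && echo "fits: ${TOTAL} GB of ${VRAM} GB used"
```

That leaves ~6 GB of headroom, which Ollama also needs for KV cache growth at long contexts, so pairing two 14B models would not fit.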

Service Management

# Start Ollama (systemd)
systemctl --user start ollama

# Check status
systemctl --user status ollama

# Or via Docker Compose (GPU-accelerated)
docker compose -f ~/atelier/_projects/personal/ollama-local/docker-compose-gpu.yml up -d

Verification

# List installed models
curl -s http://localhost:11434/api/tags | jq '.models[].name'

# Test inference
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen2.5-coder:14b","messages":[{"role":"user","content":"Hello"}]}' \
  | jq '.choices[0].message.content'

Provider: DeepSeek

Why DeepSeek

  • Cost-efficient — Comparable coding quality to GPT-4 at a fraction of the price

  • Strong at code — Purpose-built for code generation and understanding

  • Reasoning models — DeepSeek-R1 offers extended thinking capabilities

  • Also available locally — Can run via Ollama for zero-cost usage

Cloud API Configuration

{
  "provider": {
    "deepseek": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "apiKey": "{env:DEEPSEEK_API_KEY}",
        "baseURL": "https://api.deepseek.com/v1"
      },
      "models": {
        "deepseek-chat": {
          "id": "deepseek-chat",
          "name": "DeepSeek Chat (V3)",
          "context_length": 65536,
          "temperature": 0
        },
        "deepseek-reasoner": {
          "id": "deepseek-reasoner",
          "name": "DeepSeek R1 (Reasoning)",
          "reasoning": true,
          "context_length": 65536,
          "temperature": 0
        }
      }
    }
  }
}

Local via Ollama (Alternative)

DeepSeek Coder can also run locally through Ollama at zero cost:

ollama pull deepseek-coder-v2:16b

See the Ollama provider section for local configuration. Use cloud API when:

  • Context exceeds local VRAM capacity

  • Need the full DeepSeek-R1 reasoning model (671B params)

  • Local GPU is busy with other workloads
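
Registering the pulled model follows the same pattern as the Ollama section's config. A sketch, showing only the added models entry under the ollama-local provider key used earlier:

```json
{
  "provider": {
    "ollama-local": {
      "models": {
        "deepseek-coder-v2:16b": {
          "name": "DeepSeek Coder V2 16B (Local)"
        }
      }
    }
  }
}
```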

Environment Variable

# Add to ~/.zshenv or export in session
export DEEPSEEK_API_KEY="<your-key>"

Cost Estimate

Model | Input | Output
DeepSeek Chat (V3) | $0.27/M tokens | $1.10/M tokens
DeepSeek R1 (Reasoning) | $0.55/M tokens | $2.19/M tokens

For reference: A typical coding session (~50k tokens in, ~20k tokens out) costs approximately $0.04 with DeepSeek Chat. That is roughly 10-20x cheaper than Claude Sonnet or GPT-4.1.
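
That estimate can be reproduced directly from the table's rates:

```shell
# Session cost for DeepSeek Chat: 50k input tokens and 20k output tokens
# at the per-million-token rates listed in the table above.
awk 'BEGIN {
  cost = 50000 / 1e6 * 0.27 + 20000 / 1e6 * 1.10
  printf "USD %.4f\n", cost
}'
```

The same session at Claude Sonnet rates ($3/M in, $15/M out) comes to $0.45, which is where the 10-20x figure comes from.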

Provider: GitHub Copilot

Why GitHub Copilot

  • Already subscribed — Copilot Pro subscription is paid, leverage it

  • Multiple models — Access to GPT-4.1, Claude, Gemini through one subscription

  • Device login — No API key management, browser-based auth

  • Fast completions — Optimized for code completion use cases

Configuration

{
  "provider": {
    "copilot": {
      "id": "copilot",
      "models": {
        "gpt-4.1": {
          "id": "gpt-4.1",
          "name": "GPT-4.1 (via Copilot)"
        },
        "claude-sonnet": {
          "id": "claude-sonnet-4-6",
          "name": "Claude Sonnet 4.6 (via Copilot)"
        },
        "gemini-flash": {
          "id": "gemini-2.5-flash",
          "name": "Gemini 2.5 Flash (via Copilot)"
        }
      }
    }
  }
}

Authentication

GitHub Copilot uses device login flow (no API key):

# OpenCode prompts for device login on first use
# 1. Opens browser to https://github.com/login/device
# 2. Enter the code shown in terminal
# 3. Authorize OpenCode
# Token is cached for subsequent sessions

Subscription Tiers

Tier | Models Available | Notes
Copilot Free | GPT-4o Mini, Claude Haiku | Limited requests/month
Copilot Pro ($10/mo) | GPT-4.1, Claude Sonnet, Gemini Flash | Generous limits
Copilot Pro+ ($39/mo) | All models including Opus, O3, Gemini Pro | Unlimited premium requests

Some models require Copilot Pro+ tier. Check GitHub’s model catalog for current availability.
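
A rough break-even sketch for the flat fee, assuming a typical session of 50k tokens in and 20k out at the Claude Sonnet API rates listed later in this document:

```shell
# How many typical sessions (50k in / 20k out) at Claude Sonnet API rates
# ($3/M input, $15/M output) equal the $10/mo Copilot Pro fee?
awk 'BEGIN {
  per_session = 50000 / 1e6 * 3.00 + 20000 / 1e6 * 15.00   # $0.45/session
  printf "%d sessions/month\n", int(10 / per_session)
}'
```

If you run Sonnet-class sessions more than a couple of dozen times a month, the flat subscription beats per-token billing.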

Best Use Cases

  • Quick completions — Fast code generation from existing subscription

  • Model comparison — Access Claude, GPT, and Gemini from one provider

  • Budget management — Flat subscription fee, no per-token surprise bills

Provider: OpenAI (ChatGPT)

Why OpenAI

  • Second opinion — Different training data than Claude, catches different patterns

  • Reasoning models — O3, O4-mini for chain-of-thought tasks

  • Browser auth — ChatGPT Plus/Pro subscribers can auth without API key

  • Proven ecosystem — Widest tool/library support

API Configuration

{
  "provider": {
    "openai": {
      "id": "openai",
      "api": {
        "apiKey": "{env:OPENAI_API_KEY}"
      },
      "models": {
        "gpt-4.1": {
          "id": "gpt-4.1",
          "name": "GPT-4.1",
          "context_length": 1000000,
          "temperature": 0
        },
        "gpt-4.1-mini": {
          "id": "gpt-4.1-mini",
          "name": "GPT-4.1 Mini (Fast)",
          "context_length": 1000000,
          "temperature": 0
        },
        "o4-mini": {
          "id": "o4-mini",
          "name": "O4 Mini (Reasoning)",
          "reasoning": true,
          "context_length": 200000,
          "temperature": 0
        }
      }
    }
  }
}

Browser Authentication (Alternative)

ChatGPT Plus/Pro subscribers can authenticate via browser session instead of API key:

{
  "provider": {
    "openai": {
      "id": "openai",
      "api": {
        "apiKey": "browser"
      }
    }
  }
}
Browser auth uses your subscription quota. API key uses pay-per-token billing.

Environment Variable

export OPENAI_API_KEY="<your-key>"

Cost Estimate

Model | Input | Output
GPT-4.1 | $2.00/M tokens | $8.00/M tokens
GPT-4.1 Mini | $0.40/M tokens | $1.60/M tokens
O4 Mini | $1.10/M tokens | $4.40/M tokens

Best Use Cases

  • Second opinion — Cross-check Claude’s recommendations on architecture decisions

  • Reasoning tasks — O4-mini for step-by-step problem solving

  • Long context — GPT-4.1 handles 1M token windows

Provider: Google Gemini

Why Gemini

  • Massive context window — 1M+ tokens, best for multi-file analysis

  • Free tier — Google AI Studio offers generous free usage

  • Code review — Strong at understanding relationships across large codebases

  • Vertex AI — Enterprise-grade alternative with GCP integration

{
  "provider": {
    "google": {
      "id": "google",
      "api": {
        "apiKey": "{env:GOOGLE_GENERATIVE_AI_API_KEY}"
      },
      "models": {
        "gemini-2.5-pro": {
          "id": "gemini-2.5-pro-preview-06-05",
          "name": "Gemini 2.5 Pro (Preview)",
          "reasoning": true,
          "context_length": 1048576,
          "temperature": 0
        },
        "gemini-2.5-flash": {
          "id": "gemini-2.5-flash-preview-05-20",
          "name": "Gemini 2.5 Flash (Fast)",
          "context_length": 1048576,
          "temperature": 0
        }
      }
    }
  }
}

Vertex AI Configuration (Alternative)

For GCP-integrated enterprise usage:

{
  "provider": {
    "vertex": {
      "id": "vertex",
      "api": {
        "project": "{env:GCLOUD_PROJECT}",
        "location": "us-central1"
      },
      "models": {
        "gemini-2.5-pro": {
          "id": "gemini-2.5-pro-preview-06-05",
          "name": "Gemini 2.5 Pro (Vertex)"
        }
      }
    }
  }
}

Requires gcloud auth application-default login or service account credentials.

Environment Variables

# Google AI Studio
export GOOGLE_GENERATIVE_AI_API_KEY="<your-key>"

# Vertex AI (alternative)
export GCLOUD_PROJECT="your-project-id"

Cost Estimate

Model | Input | Output
Gemini 2.5 Pro | $1.25/M tokens (≤200k), $2.50/M (>200k) | $10.00/M tokens
Gemini 2.5 Flash | $0.15/M tokens (≤200k), $0.30/M (>200k) | $0.60/M tokens

Gemini 2.5 Flash is exceptionally cheap for its capability. Use it as the default "cloud" model for cost-sensitive tasks that exceed Ollama’s capacity.
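
The tiered input pricing matters for the large-context use case. Assuming the higher rate applies to the whole prompt once it crosses the 200k threshold (rather than marginally), a 300k-token code review prices out as:

```shell
# Input cost for a 300k-token prompt to Gemini 2.5 Pro, assuming the whole
# prompt bills at the >200k rate ($2.50/M) once past the threshold.
awk 'BEGIN {
  printf "USD %.2f\n", 300000 / 1e6 * 2.50
}'
```

The same prompt through Gemini 2.5 Flash at $0.30/M costs $0.09, which is why Flash is the default for bulk context work.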

Best Use Cases

  • Multi-file code review — Feed entire repo structures into 1M context window

  • Architecture analysis — Understand cross-file relationships

  • Cost-effective cloud — Gemini Flash at $0.15/M input is almost free

  • Long document processing — Analyze full Antora component sources at once

Provider: Anthropic (Claude)

Why Claude via OpenCode

  • Best reasoning — Claude Opus/Sonnet for complex refactoring and architecture decisions

  • Known quantity — Extensive experience from Claude Code, understand its strengths and weaknesses

  • Pay-per-token — Use API for targeted heavy tasks, not flat subscription for everything

  • Browser auth — Claude Pro/Max subscribers can auth without API key

API Configuration

{
  "provider": {
    "anthropic": {
      "id": "anthropic",
      "api": {
        "apiKey": "{env:ANTHROPIC_API_KEY}"
      },
      "models": {
        "claude-opus": {
          "id": "claude-opus-4-6",
          "name": "Claude Opus 4.6",
          "reasoning": true,
          "attachment": true,
          "context_length": 200000,
          "temperature": 0
        },
        "claude-sonnet": {
          "id": "claude-sonnet-4-6",
          "name": "Claude Sonnet 4.6",
          "reasoning": true,
          "attachment": true,
          "context_length": 200000,
          "temperature": 0
        },
        "claude-haiku": {
          "id": "claude-haiku-4-5-20251001",
          "name": "Claude Haiku 4.5",
          "attachment": true,
          "context_length": 200000,
          "temperature": 0
        }
      }
    }
  }
}

Browser Authentication (Alternative)

Claude Pro/Max subscribers can authenticate via browser session:

{
  "provider": {
    "anthropic": {
      "id": "anthropic",
      "api": {
        "apiKey": "browser"
      }
    }
  }
}
Browser auth uses your subscription quota. API key uses pay-per-token billing. If you have Claude Max, browser auth is more cost-effective for heavy usage.

Environment Variable

export ANTHROPIC_API_KEY="<your-key>"

Cost Estimate

Model | Input | Output
Claude Opus 4.6 | $15.00/M tokens | $75.00/M tokens
Claude Sonnet 4.6 | $3.00/M tokens | $15.00/M tokens
Claude Haiku 4.5 | $0.80/M tokens | $4.00/M tokens

Usage Strategy

Claude is the premium tier. Use it surgically:

Model | When to Use
Opus 4.6 | Architecture decisions, complex multi-file refactoring, ambiguous requirements that need deep reasoning. The "phone a friend" option.
Sonnet 4.6 | Standard coding tasks that exceed local model capability. Good balance of quality and cost.
Haiku 4.5 | Fast linting, simple edits, read-only audits. Use instead of Sonnet when quality is not the bottleneck.

Migration Note

Moving from Claude Code (proprietary) to Claude via OpenCode (open client) does NOT reduce model quality. The same API, same models, same reasoning. What changes:

  • Client — Open-source TUI instead of proprietary CLI

  • Flexibility — Switch to DeepSeek/Ollama for cost-sensitive tasks mid-session

  • Skills/hooks — Different plugin system (JS/TS vs bash hooks), but .claude/skills/ is cross-compatible

Model Configuration

Default Model

{
  "model": "anthropic/claude-sonnet-4-6",
  "small_model": "anthropic/claude-haiku-4-5"
}

Key | Purpose
model | Primary model for all agents (unless overridden per agent/mode)
small_model | Lightweight model for automatic tasks (title generation, summaries, compaction)

Model Loading Priority

  1. Command-line flags (--model or -m)

  2. Config file model setting

  3. Last used model (remembered from previous session)

  4. First model by internal priority

Model Variants (Reasoning Effort)

Variants let you toggle reasoning effort for the same model without switching models.

Built-in Variants

Provider | Available Variants
Anthropic | high (default), max
OpenAI | none, minimal, low, medium, high, xhigh
Google | low, high

Cycle Variants

Default keybind: Ctrl+T (variant_cycle)

This toggles between reasoning effort levels. On Anthropic: high → max → high. On OpenAI: cycles through all six levels.

Custom Variants

{
  "provider": {
    "openai": {
      "models": {
        "gpt-4.1": {
          "variants": {
            "high": {
              "reasoningEffort": "high",
              "textVerbosity": "low",
              "reasoningSummary": "auto"
            },
            "low": {
              "reasoningEffort": "low",
              "textVerbosity": "low",
              "reasoningSummary": "auto"
            }
          }
        }
      }
    }
  }
}

Disable a Variant

{
  "provider": {
    "anthropic": {
      "models": {
        "claude-sonnet-4-6": {
          "variants": {
            "max": {
              "disabled": true
            }
          }
        }
      }
    }
  }
}

Anthropic Extended Thinking

{
  "provider": {
    "anthropic": {
      "models": {
        "claude-sonnet-4-6": {
          "options": {
            "thinking": {
              "type": "enabled",
              "budgetTokens": 16000
            }
          }
        }
      }
    }
  }
}

Per-Agent Model Override

Each agent can use a different model:

{
  "agent": {
    "build": {
      "model": "anthropic/claude-sonnet-4-6"
    },
    "plan": {
      "model": "ollama/qwen-coder-14b"
    }
  }
}

Per-Mode Model Override

Each mode can use a different model:

{
  "mode": {
    "build": {
      "model": "anthropic/claude-sonnet-4-6"
    },
    "plan": {
      "model": "ollama/qwen-coder-14b"
    }
  }
}

Our Model Strategy

Context | Model | Rationale
Build mode (default) | ollama/qwen-coder-14b | Free local, fast, adequate for most edits
Plan mode | ollama/qwen-coder-14b | Read-only analysis, local is fine
Complex tasks | anthropic/claude-sonnet-4-6 | Switch via Ctrl+X → M when quality matters
Quick summaries | ollama/llama3.1:8b (small_model) | Fast local for titles, compaction
Cost-check reasoning | Variant cycle (Ctrl+T) | Toggle reasoning effort without switching models
Toggle reasoning effort without switching models