Ollama with Decoder
Ollama is a lightweight runtime for serving open-weight LLMs locally. Decoder talks to it through its OpenAI-compatible endpoint.
What it is
A local model server exposing an OpenAI-compatible HTTP API.
Why it's useful
Zero-config private inference; one binary, many open-weight models.
How Decoder implements it
Settings → Local AI → base URL `http://localhost:11434/v1` → choose model (e.g. llama3.1, qwen2.5-coder).
When to use it
Private code analysis, offline work, learning prompts without burning cloud tokens.
When NOT to use it
Frontier-quality reasoning on large diffs — cloud frontier models still lead.
Practical example
`ollama pull qwen2.5-coder:7b` then point Decoder at the local URL and run Explain on a function.
FAQ
Glossary
- Open-weight model
- An LLM whose weights are publicly downloadable and runnable locally.
Related
Local AI lets you use Decoder's explain and chat features against a model running on your own hardware via Ollama or LM Studio — useful when code cannot leave your environment.
BYOK means you bring your own AI provider key. Decoder never proxies AI calls through a shared account: your key, your billing, your privacy boundary.
OpenRouter is a unified API in front of many model providers. With BYOK you get access to dozens of models in Decoder from a single key.
Chat with Your Code turns a repository into a queryable knowledge surface. Ask 'where is auth handled?' or 'what does this script do?' and get answers grounded in your actual files.
LM Studio is a desktop app that runs LLMs locally with an OpenAI-compatible API. Decoder targets that endpoint when you choose local inference.