# Ollama Remote OpenAI Proxy

## Overview

This VS Code extension emulates a local Ollama instance on `127.0.0.1:11434`, forwards every request to a remote Ollama host, and exposes an OpenAI-compatible `/v1` surface in parallel. Editors and AI assistants keep talking to a “local” Ollama while the heavy lifting happens remotely.
```
VS Code / Tool ──► http://127.0.0.1:11434 ──► Remote Ollama (e.g. http://server:11434)
                          │
                          └─ OpenAI-compatible /v1 endpoints
```
## Highlights
- Fully configurable remote target (protocol, host, port, base path, optional API key).
- Local listener stays Ollama-compatible (defaults to `127.0.0.1:11434`, customizable).
- Translates `/v1/chat/completions`, `/v1/completions`, `/v1/models`, `/v1/embeddings`, `/v1/images/generations`, `/v1/responses`, and `/v1/moderations` to the Ollama API.
- Native SSE passthrough: streaming chat/completion requests deliver OpenAI-style Server-Sent Events.
- Optional API-key auth with per-minute quotas for multi-user setups.
- Activity-bar view with start/stop/restart actions, remote configuration, and Logs panel (copy/clear buttons).
- CLI smoke test (`node scripts/run-proxy-test.js`) for fast end-to-end verification.
## Configuration

Settings live under **Settings → Ollama Remote Proxy** (or `settings.json`):
| Setting | Default | Description |
| --- | --- | --- |
| `ollamaProxy.remote.protocol` | `http` | `http` or `https` |
| `ollamaProxy.remote.host` | `127.0.0.1` | Remote host / IP |
| `ollamaProxy.remote.port` | `11434` | Remote port |
| `ollamaProxy.remote.basePath` | `/` | Optional prefix |
| `ollamaProxy.remote.apiKey` | `""` | Optional Bearer token forwarded upstream |
| `ollamaProxy.server.host` | `127.0.0.1` | Local bind interface |
| `ollamaProxy.server.port` | `11434` | Local port |
| `ollamaProxy.openai.basePath` | `/v1` | Path that exposes the OpenAI facade |
| `ollamaProxy.auth.enabled` | `false` | Require API keys for every request |
| `ollamaProxy.auth.header` | `x-api-key` | Header checked for API keys (`Authorization: Bearer` accepted, too) |
| `ollamaProxy.auth.tokens` | `[]` | List of `{ key, limitPerMinute? }` objects that may access the proxy |
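As a reference, a `settings.json` sketch that points the proxy at a remote host and enables key-based auth might look like the following; the host name, token, and per-minute quota are illustrative values, not defaults:

```json
{
  "ollamaProxy.remote.protocol": "http",
  "ollamaProxy.remote.host": "my-ollama-server",
  "ollamaProxy.remote.port": 11434,
  "ollamaProxy.remote.basePath": "/",
  "ollamaProxy.server.host": "127.0.0.1",
  "ollamaProxy.server.port": 11434,
  "ollamaProxy.openai.basePath": "/v1",
  "ollamaProxy.auth.enabled": true,
  "ollamaProxy.auth.header": "x-api-key",
  "ollamaProxy.auth.tokens": [
    { "key": "local-dev-token", "limitPerMinute": 60 }
  ]
}
```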
The activity-bar entry additionally offers:
- Start / Stop / Restart commands
- Remote target dialog
- Live status (local + remote endpoints)
- Log inspection / copy buttons
## Usage
- Start the proxy (auto on VS Code startup or via “Start Ollama Proxy Server”).
- Point tools to `http://127.0.0.1:11434/v1`. Chat/completion/response requests can use `stream: true` to receive SSE chunks; embeddings, moderation checks, and best-effort image generations map to the corresponding Ollama endpoints (see the examples after this list).
- If auth is enabled, clients must send the configured header (e.g., `x-api-key: <token>`). Exceeding per-minute quotas yields HTTP 429.
- For any failure, open the Logs view, copy entries, and attach them to bug reports.
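For example, assuming default settings and a model the remote actually serves (here `gpt-oss:20b`, taken from the smoke-test example below), a streaming chat completion can be requested with plain curl. The `x-api-key` header is only needed when `ollamaProxy.auth.enabled` is `true`, and the token value is illustrative:

```bash
# Streaming chat completion through the local proxy; responses arrive as OpenAI-style SSE chunks.
curl -N http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-api-key: local-dev-token" \
  -d '{
    "model": "gpt-oss:20b",
    "messages": [{"role": "user", "content": "Say hello."}],
    "stream": true
  }'
```

Embeddings use the standard OpenAI request shape; the model name below is a placeholder and must match an embedding-capable model listed by `/api/tags`:

```bash
curl http://127.0.0.1:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "input": "hello world"}'
```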
## CLI Smoke Test
```bash
node scripts/run-proxy-test.js \
  --remote-host http://45.11.228.163:11434 \
  --model gpt-oss:20b \
  --prompt "Say hello and mention the host."
```
Optional flags: `--remote-host`, `--remote-port`, `--remote-protocol`, `--remote-base-path`, `--remote-api-key`, `--local-host`, `--local-port`, `--openai-base-path`, `--model`, `--prompt`, `--system-prompt`, `--verbose`, `--timeout`.
The script spins up the proxy on `127.0.0.1:18000`, calls `/v1/models`, `/v1/chat/completions`, and `/api/tags`, and then shuts everything down.
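If the remote itself requires a bearer token, the optional flags above can be combined; the host and key below are placeholders:

```bash
node scripts/run-proxy-test.js \
  --remote-host http://my-ollama-server:11434 \
  --remote-api-key "$REMOTE_API_KEY" \
  --model gpt-oss:20b \
  --verbose
```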
## Troubleshooting
- Check logs: Activity bar → “Ollama Proxy” → “Logs”. Timestamps, errors, and payload snippets are captured there.
- Verify remote reachability: `curl http://REMOTE:PORT/api/tags`.
- Use valid model names: match whatever `/api/tags` returns (e.g., `gpt-oss:20b`, `sam860/granite-4.0:7b`); see the quick check after this list.
- Streaming expectations: SSE passthrough is supported for chat/completions/responses. If a client expects chunk formats beyond OpenAI’s spec, enable verbose logging to inspect what it receives.
- Audio endpoints: Ollama does not natively transcribe audio. The proxy expects the request payload to include a `text` or `prompt` field containing the transcript you want to moderate/translate.
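A quick way to confirm which model names the proxy will accept is to list models through the OpenAI facade, assuming the default local address:

```bash
# Model ids returned here should mirror what the remote's /api/tags reports.
# Add the configured auth header (e.g. x-api-key) if auth is enabled.
curl http://127.0.0.1:11434/v1/models
```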
## License

MIT (see `LICENSE`).