Ollama Remote OpenAI Proxy

Overview

This VS Code extension emulates a local Ollama instance on 127.0.0.1:11434, forwards every request to a remote Ollama host, and exposes an OpenAI-compatible /v1 surface in parallel. Editors and AI assistants keep talking to “local” Ollama while the heavy lifting happens remotely.

VS Code / Tool ──► http://127.0.0.1:11434 ──► Remote Ollama (e.g., http://server:11434)
                      │
                      └─ OpenAI-compatible /v1 endpoints
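
Conceptually, the translation works like the sketch below. This is an illustrative TypeScript snippet, not the extension's actual code; it only shows how an OpenAI-style /v1/chat/completions body corresponds to the body that Ollama's public /api/chat endpoint expects.

// Illustrative sketch only (not the extension's implementation).
// Reshapes an OpenAI-style chat request into Ollama's /api/chat request shape.

interface OpenAIChatRequest {
  model: string;
  messages: { role: string; content: string }[];
  stream?: boolean;
  temperature?: number;
  max_tokens?: number;
}

interface OllamaChatRequest {
  model: string;
  messages: { role: string; content: string }[];
  stream: boolean;
  options?: { temperature?: number; num_predict?: number };
}

function toOllamaChat(req: OpenAIChatRequest): OllamaChatRequest {
  return {
    model: req.model,
    messages: req.messages,          // roles and content carry over unchanged
    stream: req.stream ?? false,
    options: {
      temperature: req.temperature,  // OpenAI temperature maps to options.temperature
      num_predict: req.max_tokens,   // OpenAI max_tokens maps to options.num_predict
    },
  };
}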

Highlights

  • Fully configurable remote target (protocol, host, port, base path, optional API key).
  • Local listener stays Ollama-compatible (defaults to 127.0.0.1:11434, customizable).
  • Translates /v1/chat/completions, /v1/completions, /v1/models, /v1/embeddings, /v1/images/generations, /v1/responses, and /v1/moderations to the Ollama API.
  • Native SSE passthrough: streaming chat/completion requests deliver OpenAI-style Server-Sent Events.
  • Optional API-key auth with per-minute quotas for multi-user setups.
  • Activity-bar view with start/stop/restart actions, remote configuration, and Logs panel (copy/clear buttons).
  • CLI smoke test (node scripts/run-proxy-test.js) for fast end-to-end verification.

Configuration

Settings live under Settings → Ollama Remote Proxy (or settings.json):

Setting                        Default     Description
ollamaProxy.remote.protocol    http        http or https
ollamaProxy.remote.host        127.0.0.1   Remote host / IP
ollamaProxy.remote.port        11434       Remote port
ollamaProxy.remote.basePath    /           Optional prefix
ollamaProxy.remote.apiKey      ""          Optional Bearer token forwarded upstream
ollamaProxy.server.host        127.0.0.1   Local bind interface
ollamaProxy.server.port        11434       Local port
ollamaProxy.openai.basePath    /v1         Path that exposes the OpenAI facade
ollamaProxy.auth.enabled       false       Require API keys for every request
ollamaProxy.auth.header        x-api-key   Header checked for API keys (Authorization Bearer accepted, too)
ollamaProxy.auth.tokens        []          List of { key, limitPerMinute? } objects that may access the proxy
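
For example, a settings.json fragment for a remote GPU box with API-key auth enabled could look like this (host name, token, and rate limit are placeholders):

{
  "ollamaProxy.remote.protocol": "http",
  "ollamaProxy.remote.host": "my-gpu-server.example.com",
  "ollamaProxy.remote.port": 11434,
  "ollamaProxy.server.host": "127.0.0.1",
  "ollamaProxy.server.port": 11434,
  "ollamaProxy.openai.basePath": "/v1",
  "ollamaProxy.auth.enabled": true,
  "ollamaProxy.auth.header": "x-api-key",
  "ollamaProxy.auth.tokens": [
    { "key": "team-member-1", "limitPerMinute": 60 }
  ]
}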

The activity-bar entry additionally offers:

  • Start / Stop / Restart commands
  • Remote target dialog
  • Live status (local + remote endpoints)
  • Log inspection / copy buttons

Usage

  1. Start the proxy (auto on VS Code startup or via “Start Ollama Proxy Server”).
  2. Point tools to http://127.0.0.1:11434/v1. Chat/completion/response requests can use stream: true to receive SSE chunks; embeddings, moderation checks, and best-effort image generations map to the corresponding Ollama endpoints. A minimal client sketch follows this list.
  3. If auth is enabled, clients must send the configured header (e.g., x-api-key: <token>). Exceeding per-minute quotas yields HTTP 429.
  4. For any failure, open the Logs view, copy entries, and attach them to bug reports.
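
As a sketch of steps 2 and 3, a minimal Node/TypeScript client using the openai npm package could look like the following. The model name, token, and the choice of the openai package are illustrative assumptions; any OpenAI-compatible client can be pointed at the proxy the same way.

// Minimal sketch: stream a chat completion through the local proxy.
// Assumes `npm install openai` and a running proxy with auth enabled.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://127.0.0.1:11434/v1",               // the proxy's OpenAI facade
  apiKey: "unused",                                    // the SDK requires a value; the proxy checks the header below
  defaultHeaders: { "x-api-key": "team-member-1" },    // only needed when ollamaProxy.auth.enabled is true
});

async function main() {
  const stream = await client.chat.completions.create({
    model: "gpt-oss:20b",                              // use a name returned by /api/tags
    messages: [{ role: "user", content: "Say hello and mention the host." }],
    stream: true,                                      // SSE chunks are passed through by the proxy
  });
  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  }
}

main().catch(console.error);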

CLI Smoke Test

node scripts/run-proxy-test.js \
  --remote-host http://45.11.228.163:11434 \
  --model gpt-oss:20b \
  --prompt "Say hello and mention the host."

Optional flags:

--remote-host, --remote-port, --remote-protocol, --remote-base-path, --remote-api-key,
--local-host, --local-port, --openai-base-path,
--model, --prompt, --system-prompt, --verbose, --timeout

The script spins up the proxy on 127.0.0.1:18000, calls /v1/models, /v1/chat/completions, /api/tags, and then shuts everything down.

Troubleshooting

  1. Check logs: Activity bar → “Ollama Proxy” → “Logs”. Timestamps, errors, and payload snippets are captured there.
  2. Verify remote reachability: curl http://REMOTE:PORT/api/tags.
  3. Use valid model names: match whatever /api/tags returns (e.g., gpt-oss:20b, sam860/granite-4.0:7b).
  4. Streaming expectations: SSE passthrough is supported for chat/completions/responses. If a client expects chunk formats beyond OpenAI’s spec, enable verbose logging to inspect what it receives; a small script for dumping the raw stream is sketched after this list.
  5. Audio endpoints: Ollama does not natively transcribe audio. The proxy expects the request payload to include a text or prompt field containing the transcript you want to moderate/translate.
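
For the streaming case in point 4, one way to see exactly what the proxy emits is to read the raw SSE stream with a small Node/TypeScript script (Node 18+ for the global fetch; model name and API-key header are placeholders):

// Sketch: dump the raw Server-Sent Events emitted for a streaming chat request.
// Compare this output with what a misbehaving client expects to receive.
async function dumpStream() {
  const res = await fetch("http://127.0.0.1:11434/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },   // add your x-api-key header if auth is enabled
    body: JSON.stringify({
      model: "gpt-oss:20b",                             // any name listed by /api/tags
      messages: [{ role: "user", content: "ping" }],
      stream: true,
    }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    process.stdout.write(decoder.decode(value));        // prints the "data: {...}" lines as they arrive
  }
}

dumpStream().catch(console.error);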

License

MIT (see LICENSE).
