# Ollama Remote OpenAI Proxy

## Overview

This VS Code extension emulates a local Ollama instance on `127.0.0.1:11434`, forwards every request to a remote Ollama host, and exposes an OpenAI-compatible `/v1` surface in parallel. Editors and AI assistants keep talking to a “local” Ollama while the heavy lifting happens remotely.
```
VS Code / Tool ──► http://127.0.0.1:11434 ──► Remote Ollama (e.g. http://server:11434)
                          │
                          └─ OpenAI-compatible /v1 endpoints
```
## Highlights
- Fully configurable remote target (protocol, host, port, base path, optional API key).
- Local listener stays Ollama-compatible (defaults to `127.0.0.1:11434`, customizable).
- Translates `/v1/chat/completions`, `/v1/completions`, `/v1/models`, `/v1/embeddings`, `/v1/images/generations`, `/v1/responses`, and `/v1/moderations` to the Ollama API.
- Native SSE passthrough: streaming chat/completion requests deliver OpenAI-style Server-Sent Events.
- Optional API-key auth with per-minute quotas for multi-user setups.
- Activity-bar view with start/stop/restart actions, remote configuration, and Logs panel (copy/clear buttons).
- CLI smoke test (`node scripts/run-proxy-test.js`) for fast end-to-end verification.
## Configuration

Settings live under **Settings → Ollama Remote Proxy** (or `settings.json`):
| Setting | Default | Description |
| --- | --- | --- |
| `ollamaProxy.remote.protocol` | `http` | `http` or `https` |
| `ollamaProxy.remote.host` | `127.0.0.1` | Remote host / IP |
| `ollamaProxy.remote.port` | `11434` | Remote port |
| `ollamaProxy.remote.basePath` | `/` | Optional prefix |
| `ollamaProxy.remote.apiKey` | `""` | Optional Bearer token forwarded upstream |
| `ollamaProxy.server.host` | `127.0.0.1` | Local bind interface |
| `ollamaProxy.server.port` | `11434` | Local port |
| `ollamaProxy.openai.basePath` | `/v1` | Path that exposes the OpenAI facade |
| `ollamaProxy.auth.enabled` | `false` | Require API keys for every request |
| `ollamaProxy.auth.header` | `x-api-key` | Header checked for API keys (`Authorization: Bearer` accepted, too) |
| `ollamaProxy.auth.tokens` | `[]` | List of `{ key, limitPerMinute? }` objects that may access the proxy |
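As a reference, a `settings.json` sketch that points the proxy at a remote host and enables key-based auth might look like the following; the host name, token, and per-minute quota are illustrative values, not defaults:

```json
{
  "ollamaProxy.remote.protocol": "http",
  "ollamaProxy.remote.host": "my-ollama-server",
  "ollamaProxy.remote.port": 11434,
  "ollamaProxy.remote.basePath": "/",
  "ollamaProxy.server.host": "127.0.0.1",
  "ollamaProxy.server.port": 11434,
  "ollamaProxy.openai.basePath": "/v1",
  "ollamaProxy.auth.enabled": true,
  "ollamaProxy.auth.header": "x-api-key",
  "ollamaProxy.auth.tokens": [
    { "key": "local-dev-token", "limitPerMinute": 60 }
  ]
}
```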
The activity-bar entry additionally offers:
- Start / Stop / Restart commands
- Remote target dialog
- Live status (local + remote endpoints)
- Log inspection / copy buttons
## Usage
- Start the proxy (auto on VS Code startup or via “Start Ollama Proxy Server”).
- Point tools to `http://127.0.0.1:11434/v1`. Chat/completion/response requests can use `stream: true` to receive SSE chunks; embeddings, moderation checks, and best-effort image generations map to the corresponding Ollama endpoints (see the examples after this list).
- If auth is enabled, clients must send the configured header (e.g., `x-api-key: <token>`). Exceeding per-minute quotas yields HTTP 429.
- For any failure, open the Logs view, copy entries, and attach them to bug reports.
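For example, assuming default settings and a model the remote actually serves (here `gpt-oss:20b`, taken from the smoke-test example below), a streaming chat completion can be requested with plain curl. The `x-api-key` header is only needed when `ollamaProxy.auth.enabled` is `true`, and the token value is illustrative:

```bash
# Streaming chat completion through the local proxy; responses arrive as OpenAI-style SSE chunks.
curl -N http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-api-key: local-dev-token" \
  -d '{
    "model": "gpt-oss:20b",
    "messages": [{"role": "user", "content": "Say hello."}],
    "stream": true
  }'
```

Embeddings use the standard OpenAI request shape; the model name below is a placeholder and must match an embedding-capable model listed by `/api/tags`:

```bash
curl http://127.0.0.1:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "input": "hello world"}'
```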
## CLI Smoke Test
```bash
node scripts/run-proxy-test.js \
  --remote-host http://45.11.228.163:11434 \
  --model gpt-oss:20b \
  --prompt "Say hello and mention the host."
```
Optional flags: `--remote-host`, `--remote-port`, `--remote-protocol`, `--remote-base-path`, `--remote-api-key`, `--local-host`, `--local-port`, `--openai-base-path`, `--model`, `--prompt`, `--system-prompt`, `--verbose`, `--timeout`.
The script spins up the proxy on `127.0.0.1:18000`, calls `/v1/models`, `/v1/chat/completions`, and `/api/tags`, and then shuts everything down.
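If the remote itself requires a bearer token, the optional flags above can be combined; the host and key below are placeholders:

```bash
node scripts/run-proxy-test.js \
  --remote-host http://my-ollama-server:11434 \
  --remote-api-key "$REMOTE_API_KEY" \
  --model gpt-oss:20b \
  --verbose
```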
## Troubleshooting
- Check logs: Activity bar → “Ollama Proxy” → “Logs”. Timestamps, errors, and payload snippets are captured there.
- Verify remote reachability: `curl http://REMOTE:PORT/api/tags`.
- Use valid model names: match whatever `/api/tags` returns (e.g., `gpt-oss:20b`, `sam860/granite-4.0:7b`); see the quick check after this list.
- Streaming expectations: SSE passthrough is supported for chat/completions/responses. If a client expects chunk formats beyond OpenAI’s spec, enable verbose logging to inspect what it receives.
- Audio endpoints: Ollama does not natively transcribe audio. The proxy expects the request payload to include a `text` or `prompt` field containing the transcript you want to moderate/translate.
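A quick way to confirm which model names the proxy will accept is to list models through the OpenAI facade, assuming the default local address:

```bash
# Model ids returned here should mirror what the remote's /api/tags reports.
# Add the configured auth header (e.g. x-api-key) if auth is enabled.
curl http://127.0.0.1:11434/v1/models
```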
## License

MIT (see `LICENSE`).