Skip to content
| Marketplace
Sign in
Visual Studio Code>Programming Languages>Blink - Local AI Code CompletionsNew to Visual Studio Code? Get it now.
Blink - Local AI Code Completions

Blink - Local AI Code Completions

Lojell

|
2 installs
| (1) | Free
Lightning-fast inline AI completions (ghost text), local-first and BYOK: GGUF models in-process via llama.cpp, or any OpenAI-compatible endpoint.
Installation
Launch VS Code Quick Open (Ctrl+P), paste the following command, and press enter.
Copied to clipboard
More Info

Blink — Local AI Code Completions

AI ghost text that appears in a Blink — running entirely on your machine.

Blink brings lightning-fast inline completions to VS Code, local-first and bring-your-own-key. Run a GGUF code model in-process via llama.cpp — no server, no account, no cloud, no subscription — or point Blink at any OpenAI-compatible /v1/completions endpoint with your own key.

Blink completing code in VS Code

Quick start

  1. Install Blink and open any code file.
  2. Run Blink: Select Model… (Command Palette, or click the Blink status bar item) and pick a recommended model — Blink downloads it with progress.
  3. Type. Ghost text appears; press Tab to accept.

That's it — no API key, no account, no setup beyond picking a model. To use your own endpoint instead, add an entry to blink.models (below) and select it via Blink: Select Model… or the blink.model setting.

Why Blink?

  • Private by default — with a local model, your code never leaves your machine. No telemetry, ever.
  • Genuinely fast — the model runs inside VS Code's process with native fill-in-the-middle (FIM), not chat prompting over the network.
  • Yours to configure — local GGUF or any OpenAI-compatible endpoint, switchable in two clicks.

Features

  • Inline ghost text as you type, accepted with Tab — driven by native fill-in-the-middle (FIM), not chat prompting.
  • Local models, in-process — pick a recommended Qwen2.5-Coder GGUF (0.5B–7B) and Blink downloads and runs it inside VS Code. Nothing leaves your machine.
  • BYOK remote option — any OpenAI-compatible completions endpoint (vLLM, llama-server, TGI, a cloud provider) with your own key.
  • Model registry — configure several models once, switch instantly with Blink: Select Model… from the palette or the status bar.
  • GPU out of the box — Vulkan on Windows/Linux x64, Metal on Apple Silicon; CPU works everywhere. On NVIDIA machines Blink offers a one-click CUDA download (~580 MB) for peak throughput.
  • Status bar control — live state plus hover actions: settings, model switch, enable/disable.

Requirements

  • RAM ≈ model size: ~0.7 GB for the smallest recommended quant, ~4.7 GB for the 7B. A GPU is optional but recommended (Vulkan / Metal).
  • Local models must support FIM (fill-in-the-middle). Blink rejects GGUFs without infill tokens at load — base coder models work (Qwen2.5-Coder); instruct/chat models usually don't.
  • Platforms: Windows x64 & arm64, Linux x64 & arm64, macOS Intel & Apple Silicon.

Settings

Setting Default What it does
blink.enabled true Master switch for inline completions.
blink.model "" Name of the active entry in blink.models.
blink.models [] The model registry (examples below).

One registry entry per model; the two backends:

"blink.models": [
  {
    "name": "local-qwen",                // select with "Blink.model": "local-qwen"
    "backend": "llamacpp",
    "modelId": "qwen2.5-coder",
    "localModelPath": "C:/models/Qwen2.5-Coder-3B-Q6_K.gguf",
    "gpu": "auto"                        // auto | vulkan | metal | off
  },
  {
    "name": "my-endpoint",
    "backend": "openai",
    "modelId": "qwen2.5-coder-7b",       // the API "model" param
    "apiBaseUrl": "https://my-host/v1",
    "apiKey": "sk-…",
    "fim": "<|fim_prefix|>"              // the model family's FIM token
  }
]

Privacy

  • Local models: everything runs and stays on your machine.
  • OpenAI-compatible backend: the assembled prompt (code around your cursor, plus your editor's own completion suggestions for that spot) is sent only to the endpoint you configure. No other network traffic, no telemetry.
  • Your API key lives in VS Code settings (blink.models). In Restricted Mode the model registry is read from user settings only, so an untrusted workspace cannot redirect it. Moving keys to VS Code SecretStorage is on the roadmap.

Known limitations (v0.1)

  • CUDA is a separate download — the VSIX ships Vulkan (which already accelerates NVIDIA); Blink offers the ~580 MB CUDA binaries when it detects an NVIDIA GPU, or via Blink: Select Model….
  • No ollama backend yet — point the openai backend at any OpenAI-compatible server instead.
  • Completions only — Blink is not a chat assistant.
  • Prompt caching, latency tuning, and richer context sources (recent edits, LSP signatures) are in active development.

License

MIT

  • Contact us
  • Jobs
  • Privacy
  • Manage cookies
  • Terms of use
  • Trademarks
© 2026 Microsoft