Shire Autocompletion

Fast inline code autocomplete for VS Code, powered by your own model endpoint — not a hosted SaaS. Point it at any OpenAI-compatible server (vLLM, llama.cpp, Ollama, TGI, SGLang, or a cloud endpoint) running a FIM-capable code model (Qwen2.5-Coder, Codestral, DeepSeek-Coder, StarCoder2) and get Copilot-style ghost-text completions.

Features

Inline FIM completions with the correct fill-in-the-middle prompt format — streaming, low latency, debounced, request-cancelling.
Recent-edit refactor assist — feeds your recent before→after edits into the prompt, so after you change something in one file, it proposes the analogous change as you type in another.
Cross-file repo context — indexes your workspace (BM25 lexical by default) and injects the most relevant snippets. Optional semantic retrieval via your own embedding model.
Private by default — your code only ever goes to the endpoint you configure. API key is stored in VS Code's encrypted SecretStorage.
Sidebar UI — configure everything and watch live index/latency status, no JSON editing.

Quick start

Install the extension.
Open the Shire Autocompletion panel from the Activity Bar (sparkle icon).
Set Base URL (e.g. http://localhost:8000/v1), API Key (if any), and Model.
Start typing in a code file — grey ghost text appears; press Tab to accept.

Your endpoint must expose OpenAI-compatible /v1/completions (legacy text completion, not chat) with stream and stop support. Serve a FIM/base code model for best infill.

Model compatibility (important)

Autocomplete needs a fill-in-the-middle (FIM) code model, and the FIM token format must match the model family. Set fimTemplate accordingly:

`fimTemplate`	Models
`qwen` (default)	Qwen2.5-Coder, Qwen3-Coder
`codestral`	Mistral Codestral
`deepseek`	DeepSeek-Coder
`starcoder`	StarCoder, StarCoder2
`custom`	anything — set `customFimTemplate` (`{prefix}`/`{suffix}`) + `customStop`

If your endpoint only exposes /v1/chat/completions (not the legacy /v1/completions), set apiMode to chat or auto. Use Test Connection in the panel to verify the model actually does FIM.

How it works (concepts)

When your cursor sits in the middle of code, the extension sends the model two pieces:

prefix — the code before the cursor.
suffix — the code after the cursor.

The model fills the gap between them. This is Fill-In-the-Middle (FIM) — different from chat, and the reason a normal chat model gives bad completions. Each model family marks the prefix/suffix with its own special tokens, which is what fimTemplate selects.

On top of that, it can add context to the prompt: your recent edits and relevant snippets from other files (found by indexing your repo). More context = smarter completions, at the cost of a slightly bigger prompt.

Configuration

The sidebar panel covers the essentials; everything is in VS Code Settings (Cmd/Ctrl+, → search "Shire", or click ⚙ in the panel). Grouped reference:

Connection

Setting	Default	Meaning
`shire.baseUrl`	`http://localhost:8000/v1`	Your endpoint, ending in `/v1`. The extension calls `{baseUrl}/completions` (or `/chat/completions`).
`shire.apiKey`	— (panel)	Sent as `Authorization: Bearer`. Set it via the panel — stored encrypted, not in plaintext settings.
`shire.model`	`Qwen2.5-Coder-7B`	Completion model name as registered on your server.

Completion API

Setting	Default	Meaning
`shire.apiMode`	`completions`	`completions` = FIM (best). `chat` = use `/chat/completions` (for endpoints lacking the legacy route; lower quality). `auto` = try completions, fall back to chat.
`shire.fimTemplate`	`qwen`	FIM token format — must match your model family: `qwen`, `starcoder`, `deepseek`, `codestral`, or `custom`. Wrong choice = garbage output.
`shire.customFimTemplate`	`{prefix}{suffix}`	Only when `fimTemplate=custom`. A template string containing the `{prefix}` and `{suffix}` placeholders, e.g. `<PRE>{prefix}<SUF>{suffix}<MID>`.
`shire.customStop`	—	Only when `fimTemplate=custom`. Comma-separated stop strings that end generation.
`shire.temperature`	`0.1`	Randomness. Low = deterministic, predictable completions (recommended).
`shire.maxTokens`	`256`	Max length of a single completion.
`shire.requestTimeoutMs`	`8000`	Give up on a request after this many ms (prevents a hung endpoint from freezing suggestions).

When completions fire / where

Setting	Default	Meaning
`shire.enabled`	`true`	Master on/off.
`shire.debounceMs`	`150`	How long you must pause typing before a request fires. Higher = fewer requests, more lag.
`shire.multiline`	`true`	Allow multi-line completions. Off = single line only.
`shire.disabledLanguages`	`scminput, git-commit, plaintext`	Language ids where it stays silent. Add e.g. `markdown`, `json`. (Language id shows bottom-right of the editor.)
`shire.maxPrefixChars`	`3000`	How much code before the cursor to send. Bigger = more context, bigger/slower prompt.
`shire.maxSuffixChars`	`1500`	How much code after the cursor to send.

Context (recent edits + cross-file)

Setting	Default	Meaning
`shire.enableRecentEdits`	`true`	Feed your recent before→after edits into the prompt, so a change in one file is suggested in another (refactor assist).
`shire.enableRepoContext`	`true`	Index the workspace and inject the most relevant snippets from other files.
`shire.maxNeighborFiles`	`4`	How many cross-file snippets to inject per completion.
`shire.retrievalTimeoutMs`	`150`	Max time spent finding cross-file context before giving up and completing without it (keeps things fast).
`shire.enableSemanticRetrieval`	`false`	Use embeddings (vector similarity) in addition to keyword search. Higher relevance, needs an embedding model + a slower index build.
`shire.embedModel`	`nomic-embed-text`	Embedding model name — only used when semantic is on.

Indexing

Setting	Default	Meaning
`shire.maxIndexFiles`	`30000`	Max files indexed in lexical mode (semantic off). Covers large monorepos.
`shire.maxSemanticFiles`	`3000`	Max files indexed in semantic mode (every file is embedded, so kept lower).
`shire.embedConcurrency`	`5`	How many files to embed in parallel during a semantic build. Higher = faster build, more load on the endpoint.

File selection: the indexer skips node_modules, .git, build output, anything in your .gitignore / VS Code files.exclude, files over 120 KB, and non-code files. The set is sorted and capped deterministically, so rebuilds are identical.

Commands

Cmd/Ctrl+Shift+P →

Shire Autocompletion: Test Connection — verifies the endpoint and whether the model does FIM.
Shire Autocompletion: Rebuild Repo Index
Shire Autocompletion: Show Debug Output — per-completion latency + errors.
Shire Autocompletion: Toggle Enabled

What leaves your machine

When a completion fires, the code around your cursor (prefix + suffix), your recent edits, and any retrieved cross-file snippets are sent to your configured baseUrl — nowhere else. No telemetry. The API key is stored in encrypted SecretStorage.

What it does / doesn't

Does: Copilot-tier single/multi-line completion, cross-file context, refactor propagation — all against your own model.
Doesn't: it is in-context, not model training — the model never learns your code into its weights. And it doesn't reproduce Cursor's custom next-edit "cursor jump" model; you get the suggestion when you start typing at the new site.

Privacy

No telemetry. Your code is sent only to the baseUrl you set. The API key is kept in encrypted SecretStorage, never in plaintext settings (a legacy plaintext apiKey setting is honored as a fallback only).

License

MIT

Shire Autocompletion

lawndlwd

Shire Autocompletion

Features

Quick start

Model compatibility (important)

How it works (concepts)

Configuration

Connection

Completion API

When completions fire / where

Context (recent edits + cross-file)

Indexing

Commands

What leaves your machine

What it does / doesn't

Privacy

License