# Simple LLM Completion
This VSCode extension adds LLM completion for local LLM services using an OpenAI API-compatible endpoint.
## Features
- Uses an OpenAI API-compatible endpoint (`v1/completions`).
- Uses the Qwen Coder FIM (fill-in-the-middle) template for prompts.
- Uses open files as additional context (optional).
- Triggers automatically (optional).
- Sends at most one request at a time.
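The Qwen2.5-Coder family marks fill-in-the-middle prompts with the `<|fim_prefix|>`, `<|fim_suffix|>`, and `<|fim_middle|>` special tokens. A minimal sketch of how such a prompt can be assembled (the function name is illustrative, not taken from the extension source):

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap the text before and after the cursor in Qwen2.5-Coder's
    fill-in-the-middle tokens; the model then generates the middle part."""
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

# Text before the cursor becomes the prefix, text after it the suffix.
prompt = build_fim_prompt("def add(a, b):\n    return ", "\n")
```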
## Requirements
You will need an LLM service that serves the model through an OpenAI API-compatible endpoint. Examples include llama.cpp, LM Studio, Lemonade, and other LLM service implementations.
The extension has been tested with the following model families:
- Qwen2.5-Coder
- Qwen3-Coder
## How to Start a Local LLM Service
You can start a local service with the Qwen2.5-Coder-3B model using a selection of implementations.
### Using llama.cpp CLI
- Start the service:

  ```shell
  llama-server --fim-qwen-3b-default
  ```
- Set the `apiEndpoint` in the extension settings to `http://127.0.0.1:8012/v1`.
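To sanity-check the service independently of the extension, you can send a raw completion request. A minimal sketch, assuming the llama.cpp defaults above; the fields in the payload are standard OpenAI `v1/completions` parameters, and the helper name is illustrative:

```python
import json
import urllib.request

def completion_request(endpoint: str, prompt: str,
                       max_tokens: int = 32) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible v1/completions route."""
    payload = {"prompt": prompt, "max_tokens": max_tokens, "temperature": 0.2}
    return urllib.request.Request(
        f"{endpoint}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = completion_request(
    "http://127.0.0.1:8012/v1",
    "<|fim_prefix|>def add(a, b):\n    return <|fim_suffix|>\n<|fim_middle|>",
)
# Requires the server to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```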
### Using LM Studio CLI
Note: LM Studio does not provide the Qwen2.5-Coder-3B model out-of-the-box.
- Prepare the model directory:

  ```shell
  mkdir -p ~/.lmstudio/models/qwen/qwen2.5-coder-3b
  ```
- Download the model from Hugging Face:

  ```shell
  curl -L https://huggingface.co/ggml-org/Qwen2.5-Coder-3B-Q8_0-GGUF/resolve/main/qwen2.5-coder-3b-q8_0.gguf -o ~/.lmstudio/models/qwen/qwen2.5-coder-3b/qwen2.5-coder-3b-q8_0.gguf
  ```
- Load the model and start the service:

  ```shell
  lms load qwen/qwen2.5-coder-3b --context-length 32768 && lms server start
  ```
- Set the `apiEndpoint` in the extension settings to `http://127.0.0.1:1234/v1`.
- Set the `model` in the extension settings to `qwen2.5-coder-3b`.
### Using Lemonade CLI
Note: Lemonade does not provide the Qwen2.5-Coder-3B model out-of-the-box.
- Download the model:

  ```shell
  lemonade-server pull user.qwen2.5-coder-3b --checkpoint ggml-org/Qwen2.5-Coder-3B-Q8_0-GGUF:Q8_0 --recipe llamacpp
  ```
- Start the service:

  ```shell
  lemonade-server serve --ctx-size 0
  ```
- Set the `apiEndpoint` in the extension settings to `http://127.0.0.1:8000/api/v1`.
- Set the `model` in the extension settings to `user.qwen2.5-coder-3b`.
### Others
Ollama does not work: it applies the chat template to requests on the `v1/completions` endpoint; see the issue.
## Extension Settings
The extension contributes the following settings:
- `simple-llm-completion.useAutomaticCompletion`: Enable automatic completion.
- `simple-llm-completion.useContextFromOpenFiles`: Enable using open files in the editor for context.
- `simple-llm-completion.apiEndpoint`: The endpoint compatible with the OpenAI API (default: `http://127.0.0.1:8012/v1`).
- `simple-llm-completion.model`: The name of the model to use (the service may ignore this property).
- `simple-llm-completion.temperature`: The temperature parameter for the model.
- `simple-llm-completion.maxCompletionTokens`: The maximum number of tokens to generate in a single completion.
The extension uses the environment variable `OPENAI_API_KEY` if it is set.
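For example, a `settings.json` fragment configuring the extension for the llama.cpp setup above (the temperature and token values here are illustrative, not defaults):

```json
{
  "simple-llm-completion.apiEndpoint": "http://127.0.0.1:8012/v1",
  "simple-llm-completion.model": "qwen2.5-coder-3b",
  "simple-llm-completion.useAutomaticCompletion": true,
  "simple-llm-completion.useContextFromOpenFiles": true,
  "simple-llm-completion.temperature": 0.2,
  "simple-llm-completion.maxCompletionTokens": 128
}
```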
## Extension Hotkeys
This extension sets the following hotkeys:
- `Ctrl + L`: Trigger the completion.
## Release Notes
### 0.0.2

- Activate the extension automatically.
### 0.0.1

- Initial release of the extension.