Take your coding to the next level with AI while keeping privacy and confidentiality under your control.
You can run any arbitrary LLM model supported by llama.cpp if
you choose to run the provided Docker container that this extension supports. Or build and use your own
wrapper, which can run locally or remotely.
Use open models, or your own private models, finetuned to your individual or team data.
It won't compete with ChatGPT or GitHub Copilot anytime soon, but it balances that with the ownership advantage.
For a quick start, it's recommended to use the supported container server. Ensure you have the LLM model binary
to use. and edit the settings accordingly. Then call
PairCoder: Start LLM Server Container via the Command Palette.
Select any text and use the contextual menu or the Command Palette for code generation. The prompt is the selected text.
Starting a new generation will interrupt the current one. You can also interrupt at any time using
PairCoder: Stop PairCoding
from the Command Palette.
- Docker if you choose to use the encapsulated container functionality (recommended).
- Binaries of the model(s) that will be used for code generation.
You can also run your own server (locally or remotely). The extension simply expects a REST API with a
POST endpoint. See the Github repository for details.
This extension contributes the following settings:
youcefrahal.paircoder.server.url: URL of the LLM server. Default is
http://localhost:8399/predict. Make sure to update the port number if you edit the
youcefrahal.paircoder.docker.image: The Docker image to use. Default is
youcefrahal.paircoder.docker.port: Port to map the LLM server running in the container. Default is
youcefrahal.paircoder.model.modelsPath: Absolute path to a local folder containing one or more LLMs that can be ingested by
youcefrahal.paircoder.model.filename: Filename of the LLM to load. If unspecfifed, the first file found in the models folder will be used.
youcefrahal.paircoder.model.n_predict: number of tokens to predict (default: 128, infinity: -1).
You should only have to edit
youcefrahal.paircoder.model.modelsPath with your own value, and possibly
if you have more than one model in that folder.
You might have to edit
youcefrahal.paircoder.docker.port if you have a port conflict (and therefore
- Parallel queries are unsupported but nothing prevents attempting them.
- Obviously, the speed and performance of the generation are dependent on the hardware and the model used. The above screenshot uses the base GPT4All model. Vicuna 13B will give better results in general, but slower.
- Ability to generate code or text using a local or a remote server.
- Integration of a container that serves
llama.cpp, configurable to work with any of their supported models.
- A new command to interrupt prediction. Starting a new prediction will also interrupt the current one.
- A new option to customize the number of predicted tokens.
- By default, Docker images are now tied to the same release number as the extension.