Enhance your coding workflow with effortless access to Ollama models from the comfort of your Visual Studio Code editor.
How to Use
Install Ollama:
Ensure you have Ollama installed and running on your system. Follow the installation instructions on the Ollama website for your operating system.
Make sure your Ollama server is accessible at the configured API URL (default is http://localhost:11434/api/generate).
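Before going further, you can confirm the server is up with a quick request. The sketch below assumes a default local install and Node 18+ (for the built-in fetch); a running Ollama server answers a plain GET on its root URL with a short status message.

```typescript
// Quick reachability check for a local Ollama server (illustrative sketch).
const OLLAMA_BASE_URL = "http://localhost:11434";

async function checkOllama(): Promise<void> {
  const res = await fetch(OLLAMA_BASE_URL);
  // A healthy default install replies with a short "Ollama is running" message.
  console.log(res.ok ? await res.text() : `Unexpected status: ${res.status}`);
}

checkOllama().catch((err) => console.error("Ollama not reachable:", err));
```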
Install the Extension:
Open Visual Studio Code.
Go to the Extensions view (Ctrl+Shift+X or Cmd+Shift+X).
Search for Ollama Assistant (or vscode-ollama-assistant).
Click Install.
Open the Ollama Assistant:
Command Palette: Press Ctrl+Shift+P (or Cmd+Shift+P on macOS) to open the Command Palette.
Type Ollama: Open Ollama Assistant and select the command.
Status Bar: Click on the $(hubot) Ollama status bar item in the bottom left corner of the VS Code window.
Sidebar: Find the Ollama icon in the Explorer sidebar. Click on it to open the Ollama view.
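For context, the sketch below shows one way a VS Code extension can wire up a command and a status bar item like the ones described above; the command ID ollama.openAssistant is a made-up placeholder, not necessarily the identifier this extension registers.

```typescript
import * as vscode from "vscode";

export function activate(context: vscode.ExtensionContext) {
  // Hypothetical command ID used only for illustration.
  const open = vscode.commands.registerCommand("ollama.openAssistant", () => {
    vscode.window.showInformationMessage("Ollama Assistant opened");
  });

  // Status bar entry similar to the "$(hubot) Ollama" item mentioned above.
  const statusBar = vscode.window.createStatusBarItem(vscode.StatusBarAlignment.Left);
  statusBar.text = "$(hubot) Ollama";
  statusBar.command = "ollama.openAssistant";
  statusBar.show();

  context.subscriptions.push(open, statusBar);
}
```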
Start Chatting:
Once the Ollama view is open (either as a panel or in the sidebar), you will see a chat interface.
Type your prompt in the input field at the bottom and click Send or press Enter.
Ollama will process your prompt and display the response in the chat window.
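Under the hood, a chat turn corresponds to a request against the Ollama generate endpoint. The sketch below shows a minimal, non-streaming version of that call, assuming a default local install and Node 18+ (for the built-in fetch); the model name and prompt are placeholders, and the extension itself may stream tokens instead.

```typescript
// Minimal non-streaming call to Ollama's generate endpoint (illustrative sketch).
async function ask(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama2",  // must already be pulled, e.g. `ollama pull llama2`
      prompt,
      stream: false,    // return a single JSON object instead of a token stream
    }),
  });
  const data = await res.json();
  return data.response; // the generated text
}

ask("Explain what a Promise is in TypeScript.").then(console.log);
```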
Clear History:
Click the Clear History button in the chat interface to clear the current chat history. This will also clear the saved history for the current VS Code session.
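For the curious, one plausible way a chat view could keep and clear a per-session history is an in-memory store like the sketch below; the names are hypothetical and this is not necessarily how the extension implements it.

```typescript
// Hypothetical in-memory session history for a chat view (illustrative sketch).
type ChatMessage = { role: "user" | "assistant"; text: string };

class ChatHistory {
  private messages: ChatMessage[] = [];

  add(message: ChatMessage): void {
    this.messages.push(message);
  }

  all(): readonly ChatMessage[] {
    return this.messages;
  }

  // What a "Clear History" button would trigger: drop everything from this session.
  clear(): void {
    this.messages = [];
  }
}
```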
Extension Settings
You can configure the Ollama Assistant extension settings in your VS Code settings (File > Preferences > Settings, or Code > Settings > Settings on macOS). Search for Ollama to find the extension settings.
Here's a breakdown of the available settings:
| Setting Name | Type | Default Value | Description |
| --- | --- | --- | --- |
| `ollama.apiUrl` | string | `http://localhost:11434/api/generate` | URL for the Ollama API. Specify the URL of your running Ollama API server; modify this if your Ollama server is running on a different host or port. |
| `ollama.model` | string | `llama2` | Default model to use for chat interactions. Ensure the model is available in your Ollama installation (you can pull models with `ollama pull <model_name>` in your terminal). |
| `ollama.temperature` | number | `0.8` | Temperature for the Ollama model (0 to 1). Controls the randomness of the output: values closer to 1 make it more random and creative, while values closer to 0 make it more deterministic and focused. |
| `ollama.numCores` | integer | `-1` | Number of CPU cores Ollama may use. The default of -1 lets Ollama use all available cores, which is generally recommended for optimal performance; lower it if you want to reserve CPU resources for other tasks. |
| `ollama.topP` | number | `0.9` | Top-P (nucleus) sampling parameter (0 to 1). Controls the cumulative probability of tokens considered during generation; a value of 0.9 means the model considers the smallest set of tokens whose cumulative probability exceeds 0.9. Lower values make the output more focused. |
| `ollama.topK` | integer | `40` | Top-K sampling parameter (e.g., 40). Limits the model to the K most likely tokens at each generation step; lower values make the output more focused and less diverse. |
| `ollama.numPredict` | integer | `256` | Maximum number of tokens to predict. Sets the maximum length of the generated response; increase this if you expect longer responses. |
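If you want to see how these settings could flow into an actual request, the sketch below reads them with the VS Code configuration API and maps them onto the options field of an Ollama generate call. The mapping of ollama.numCores to Ollama's num_thread option, and the assumption that every setting is forwarded as shown, are illustrative guesses rather than the extension's confirmed behavior.

```typescript
import * as vscode from "vscode";

// Build an Ollama generate request body from the extension settings (illustrative sketch).
// The option names sent to Ollama (temperature, top_p, top_k, num_predict, num_thread)
// follow Ollama's API; whether the extension forwards all of them is an assumption.
function buildGenerateRequest(prompt: string) {
  const cfg = vscode.workspace.getConfiguration("ollama");
  const numCores = cfg.get<number>("numCores", -1);

  return {
    url: cfg.get<string>("apiUrl", "http://localhost:11434/api/generate"),
    body: {
      model: cfg.get<string>("model", "llama2"),
      prompt,
      stream: false,
      options: {
        temperature: cfg.get<number>("temperature", 0.8),
        top_p: cfg.get<number>("topP", 0.9),
        top_k: cfg.get<number>("topK", 40),
        num_predict: cfg.get<number>("numPredict", 256),
        // -1 means "use all available cores"; Ollama's closest knob is num_thread.
        ...(numCores > 0 ? { num_thread: numCores } : {}),
      },
    },
  };
}
```

Options left at their defaults could also simply be omitted from the request, letting Ollama fall back to the model's own defaults.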
OllamaRAG: A basic RAG that allows you to upload code files, process them, and query them using Ollama models.
OllamaCoder: A basic, interactive code analysis and assistant interface for Ollama, designed for developers to get instant code reviews, suggestions, and improvements using coding language models.