Ollama Cloud

JKagiDesigns LLC

AI-powered coding assistant with cloud and local LLM support

Ollama Cloud VSCode Extension

An AI-powered coding assistant for Visual Studio Code with multi-provider support, similar in approach to Cline. It works with Ollama Cloud, local Ollama, Anthropic Claude, OpenAI ChatGPT, Grok, and more, and provides code assistance, file editing, and command execution across those providers.

Features

  • 🤖 Multi-Provider AI Chat: Interactive chat with Ollama (local/cloud), Anthropic Claude, OpenAI ChatGPT, Grok, and more
  • 📝 Smart Code Editing: AI can read, write, and modify files with your approval
  • 🔧 Command Execution: Execute terminal commands suggested by the AI
  • 🔄 Diff View: Review changes before applying them
  • 🎯 Multiple Models: Support for various models across all providers (Llama, Claude 3.5, GPT-4, etc.)
  • ⚡ Real-time Streaming: See AI responses as they're generated
  • 🎨 VSCode Integration: Native VSCode UI with dark/light theme support
  • 🧠 Enhanced Context Awareness: Tracks tasks, files, and commands for better continuity
  • 💾 Session Persistence: Auto-saves and restores conversations across VSCode restarts
  • 📊 Usage Tracking: Monitor API usage with visual indicators for each provider
  • 🗂️ Workspace Indexing: Automatically understands your project structure
  • 🔁 Automatic Retry: Smart retry logic with exponential backoff for rate limits
  • ⏱️ Configurable Timeouts: Adjust request timeouts for slower connections
  • 🔄 Provider Switching: Easily switch between AI providers based on your needs
  • 🧯 Provider Failover: Automatically fail over to backup providers when the primary provider is unavailable
  • 💸 Token Budget Controls: Enforce per-request token budgets to reduce waste and cost

Current Behavior Notes (v0.1.28)

  • Approval-first by default: In ACT mode, suggested file edits/commands appear as approval cards in chat and are not executed until you approve them.
  • Auto-approve is optional: If ollamaCloud.autoApprove is enabled, suggested actions execute immediately.
  • Code actions route into chat: Right-click actions (Add to Chat, Explain, Improve, Fix) now reliably focus the chat and pass content into the panel.
  • Provider selection is live: Chat requests use the current ollamaCloud.apiProvider setting at send time.
  • Provider failover is configurable: transient provider failures can automatically fail over using ollamaCloud.providerFallbackOrder.
  • Cross-provider fallback is model-aware: if a fallback provider does not support the current model, a compatible default model is selected automatically.
  • Provider model catalogs are centralized: chat view + model picker now use a shared provider registry for consistent provider/model behavior.
  • Cost-aware routing is available: when enabled, providers and models are ranked by token budget fit, estimated request cost, and observed provider latency.
  • Command parsing is hardened: File/command extraction now handles multiple markdown block patterns without hanging on malformed responses.
  • Token-aware request compaction: request payloads are compacted using ollamaCloud.requestTokenBudget and ollamaCloud.maxConversationMessages.
  • Lifecycle cleanup is enforced: provider heartbeat/config listeners and autocomplete timers are properly disposed to avoid background leaks.
  • Autocomplete is token-aware: cached completions are reused when context matches to reduce repeat requests.
  • Config validation command is built in: run Ollama Cloud: Validate Provider Configuration to detect missing API keys, fallback issues, and model/provider mismatches.
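
The failover notes above can be pictured with a minimal sketch. It is not the extension's actual code: the `Provider` shape, `resolveModel`, and `sendWithFailover` are hypothetical names, and for brevity the sketch fails over on any error, whereas the notes above limit failover to transient failures.

```typescript
// Hypothetical provider shape; the extension's real provider registry is not shown in this README.
interface Provider {
  name: string;
  models: string[];     // models this provider can serve
  defaultModel: string; // used when the requested model is unsupported
  send: (model: string, prompt: string) => Promise<string>;
}

// Model-aware fallback: keep the requested model if supported, else use the provider default.
function resolveModel(provider: Provider, requestedModel: string): string {
  return provider.models.includes(requestedModel) ? requestedModel : provider.defaultModel;
}

// Try each provider in order (primary first) and return the first successful response.
// Assumption: every error triggers failover; a real implementation would check for transient errors.
async function sendWithFailover(
  providers: Provider[],
  requestedModel: string,
  prompt: string
): Promise<{ provider: string; model: string; text: string }> {
  const errors: string[] = [];
  for (const provider of providers) {
    const model = resolveModel(provider, requestedModel);
    try {
      const text = await provider.send(model, prompt);
      return { provider: provider.name, model, text };
    } catch (err) {
      errors.push(`${provider.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

Passing the primary provider followed by the entries of ollamaCloud.providerFallbackOrder as `providers` would reproduce the ordering described above.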

What's New in v0.1.15

🤖 Multi-Provider AI Support (Major Feature)

🌐 Cross-Platform AI Integration

  • Anthropic Claude: Full integration with Claude 3.5 Sonnet, Opus, and Haiku models
  • OpenAI ChatGPT: Support for GPT-4 Turbo, GPT-4, and GPT-3.5 Turbo models
  • Ollama Compatibility: Continued support for local and cloud Ollama models
  • Unified Interface: Seamless switching between providers without changing workflows

⚙️ Flexible Configuration

  • Provider Selection: Choose your preferred AI provider in settings (ollamaCloud.apiProvider)
  • API Key Management: Individual API key configuration for each provider
  • Model Awareness: Provider-specific model selection with appropriate recommendations
  • Backward Compatible: Existing Ollama workflows continue to work unchanged

🎯 Advanced Features

  • Cline-Inspired Architecture: Unified API handler system similar to popular AI assistants
  • Enterprise Ready: Designed to work with multiple AI providers for redundancy and flexibility
  • Cost Optimization: Choose providers based on pricing, performance, or availability
  • Future Proof: Easy to add new providers as they become available

What's New in v0.1.18

🧪 Enhanced Testing Framework

📝 Comprehensive Test Coverage

  • File Editing Tests: Added 50+ new tests covering file creation, editing, error handling, and reliability scenarios
  • Command Execution Tests: Enhanced command executor testing with security, concurrency, and error recovery tests
  • Chat Integration Tests: Added tests for various file creation formats and messaging workflows
  • Cross-Platform Compatibility: Comprehensive testing across Windows, macOS, and Linux environments

🔍 Reliability Verification

  • Error Handling Tests: Comprehensive tests for detailed error message generation and recovery
  • Path Handling Tests: Verification of cross-platform path normalization and special character handling
  • Concurrency Tests: Tests for simultaneous file operations and command executions
  • Edge Case Coverage: Tests for empty content, long paths, invalid inputs, and boundary conditions

🛡️ Security & Safety Tests

  • Command Injection Prevention: Tests verifying safe handling of potentially malicious command inputs
  • Path Sanitization: Tests ensuring proper validation of file paths and directory traversal prevention
  • Resource Management: Tests for proper cleanup and resource handling under various conditions

🛠️ File Editing Reliability Improvements

🔧 Enhanced File Operations

  • Robust File Parsing: Improved regex patterns to handle various AI model output formats for file creation
  • Simplified Approval Flow: Streamlined file editing process for immediate execution without complex approvals
  • Better Error Handling: Comprehensive error logging with detailed messages and stack traces for debugging
  • Cross-Platform Path Support: Enhanced path normalization and workspace context handling

⚡ Improved Command Execution

  • Enhanced Error Recovery: Better command execution with detailed error reporting and workspace context
  • Reliable File Operations: Automatic directory creation and proper path resolution for all file operations
  • Immediate Feedback: Clear success/failure messages for all file editing operations

🎯 System Prompt Optimization

  • Action-Oriented Instructions: Updated AI system prompt with clearer, more actionable file editing guidance
  • Multiple Format Support: AI can now recognize and process various file creation formats
  • Complete Code Generation: Emphasis on providing full, copy-paste ready code without placeholders

⚙️ Performance and User Experience Improvements

🎯 Granular Controls

  • Autocomplete Streaming: Added enableStreamingForAutocomplete setting for granular control over streaming
  • Configurable Delays: New autocompleteDebounceDelay and autocompletePreviewDelay settings
  • Adaptive Performance: Added enableAdaptivePerformance setting for automatic performance tuning
  • Enhanced Configuration: Users can now fine-tune autocomplete behavior to their preferences

📊 Enhanced User Feedback

  • Preview Completions: Implemented immediate feedback with short preview completions
  • Progress Indicators: Added visual progress notifications for longer autocomplete operations
  • Response Time Tracking: Enhanced logging with response time measurements
  • Silent Error Handling: Improved error handling that doesn't interrupt user workflow

⚡ Performance Tuning

  • Memory Pressure Detection: Automatic cache size adjustment based on system memory usage
  • Adaptive Debounce Timing: Dynamic debounce time based on API response performance
  • Resource Management: Enhanced cleanup and timeout handling for optimal resource usage
  • Performance Monitoring: Advanced tracking and optimization of autocomplete performance

What's New in v0.1.22

🔄 Real-Time Editor Synchronization

📝 Dynamic Context Updates

  • Event Listeners: Added listeners for onDidChangeActiveTextEditor, onDidChangeVisibleTextEditors, onDidChangeTextDocument, and onDidSaveTextDocument
  • Real-Time Context: AI context now updates dynamically as you switch editors or modify files
  • Unsaved Changes: Now captures unsaved changes in open editors
  • Smarter Truncation: Prioritizes code near the cursor or imports for better context relevance

⚙️ Reliability Improvements

  • Status Bar Indicator: Added real-time status bar indicator for Ollama server status
  • Heartbeat Mechanism: Implements a 30-second heartbeat to check Ollama server availability
  • Error Visibility: Critical errors (e.g., Ollama server down) now surface in the status bar without interrupting workflow
  • Autocomplete Failures: Subtle warning icon for autocomplete failures

🎯 Enhanced User Experience

  • Workspace Awareness: Better tracking of open files and project structure
  • Context Accuracy: AI responses are now more relevant to your current work
  • Non-Intrusive Feedback: Status updates appear in the status bar without popups

What's New in v0.1.12

🛠️ Enhancement - Tutorial Experience Improvement

👋 Improved Welcome Tutorial Behavior

  • Version-Based Tutorial Display: The tutorial now appears once per extension version update, not every time VSCode opens
  • Automatic Version Tracking: The extension records the version for which the tutorial was last shown
  • Cleaner User Experience: Removed the "Don't Show Again" option, since the tutorial is now version-aware
  • Updated Messaging: The final tour step clarifies that the tutorial appears only once per update

🎉 Major Feature Release - Complete Implementation

This release represents a complete overhaul of the extension with enterprise-grade features, comprehensive testing, and production-ready code quality.

🔍 Web Research Capabilities

  • Search the Web: AI can now search the internet using DuckDuckGo (no API key required)
  • Deep Research: Automatically fetches and analyzes content from top search results
  • URL Fetching: Extract and analyze text from any webpage
  • File Downloads: Download files from URLs for analysis
  • Commands:
    • ollama-cloud.searchWeb - Search and add results to chat
    • ollama-cloud.researchTopic - Deep research with content fetching
    • ollama-cloud.fetchUrl - Fetch and analyze URL content

👋 Interactive Walkthrough & Onboarding

  • Welcome Tour: Step-by-step introduction to all features on first use
  • Tips & Tricks: Comprehensive tips panel with categorized advice
  • Keyboard Shortcuts: Quick reference guide for all shortcuts

🔧 Development Mode

  • Hot Reload: Automatic file watching with reload prompts
  • Dev Panel: Quick actions for reload, clear cache, and logs
  • Output Channel: Detailed development logs
  • Statistics: Track active watchers and extension state
  • Commands:
    • ollama-cloud.toggleDevMode - Enable/disable development mode
    • ollama-cloud.showDevStats - Show development statistics

💾 State Manager with Debounced Persistence

  • Fast In-Memory Reads: Instant access to state data
  • Debounced Writes: 500ms batched writes for performance
  • Batch Operations: Efficient bulk state updates
  • Secrets Support: Secure credential storage
  • Statistics Tracking: Monitor cache sizes and pending changes
  • Singleton Pattern: Thread-safe initialization with guards
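
As an illustration of the debounced persistence described above, here is a minimal sketch. The `DebouncedStore` class and its `persist` callback are hypothetical, not the extension's StateManager. Reads come from memory, and writes are collected and handed to `persist` in one batch after 500ms without further writes.

```typescript
// Hypothetical debounced store: in-memory reads, batched writes.
class DebouncedStore<T> {
  private cache = new Map<string, T>();
  private pending = new Map<string, T>();
  private timer: ReturnType<typeof setTimeout> | undefined;
  private persist: (batch: Map<string, T>) => void;
  private delayMs: number;

  constructor(persist: (batch: Map<string, T>) => void, delayMs = 500) {
    this.persist = persist;
    this.delayMs = delayMs;
  }

  // Reads never touch storage.
  get(key: string): T | undefined {
    return this.cache.get(key);
  }

  // Update memory immediately and (re)start the debounce timer.
  set(key: string, value: T): void {
    this.cache.set(key, value);
    this.pending.set(key, value);
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  // Write all pending changes in one batch; safe to call manually (for example on deactivation).
  flush(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    if (this.pending.size === 0) return;
    const batch = this.pending;
    this.pending = new Map();
    this.persist(batch);
  }
}
```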

📓 Jupyter Notebook Support

  • Cell Operations: Explain, fix, optimize, and generate notebook cells
  • Output Integration: Analyzes cell outputs for better context
  • Full Context: Extracts entire notebook structure
  • Commands:
    • ollama-cloud.explainNotebookCell - Explain current notebook cell
    • ollama-cloud.fixNotebookCell - Fix errors in notebook cell
    • ollama-cloud.optimizeNotebookCell - Optimize cell for performance
    • ollama-cloud.generateNotebookCell - Generate cell from description

🔗 URI Handler for Deep Linking

  • Deep Links: Open extension features via URLs
  • Supported URIs:
    • vscode://ollama-cloud/chat?message=Hello&autoSend=true
    • vscode://ollama-cloud/explain?file=/path/to/file.ts&line=10
    • vscode://ollama-cloud/fix?file=/path/to/file.ts&line=10
    • vscode://ollama-cloud/generate?type=test&file=/path/to/file.ts
    • vscode://ollama-cloud/model?select=chat
  • URI Generation: Utilities for creating and sharing deep links
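
A deep link of the shapes listed above could be assembled with a small helper like the one below. `buildDeepLink` and `DeepLinkAction` are hypothetical names, not the extension's URI generation utilities. The helper percent-encodes every value, so file paths come out as `%2Fpath%2F...`; whether the extension's handler expects encoded or raw paths is an assumption here.

```typescript
// Actions taken from the supported URIs listed above.
type DeepLinkAction = "chat" | "explain" | "fix" | "generate" | "model";

// Hypothetical helper: builds vscode://ollama-cloud/<action>?<query> with encoded parameters.
function buildDeepLink(
  action: DeepLinkAction,
  params: Record<string, string | number | boolean> = {}
): string {
  const query = Object.keys(params)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(String(params[key]))}`)
    .join("&");
  const base = `vscode://ollama-cloud/${action}`;
  return query ? `${base}?${query}` : base;
}

const chatLink = buildDeepLink("chat", { message: "Hello", autoSend: true });
// chatLink is "vscode://ollama-cloud/chat?message=Hello&autoSend=true"
```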

📋 Enhanced Terminal Integration

  • Clipboard-Based Capture: Preserves your clipboard while capturing terminal output
  • Error Analysis: Explain terminal errors with full context
  • Command Suggestions: AI suggests commands based on your description
  • OS-Aware: Adapts to Windows, macOS, and Linux
  • Commands:
    • ollama-cloud.addTerminalOutput - Add selected terminal output to chat
    • ollama-cloud.explainTerminalError - Explain terminal errors
    • ollama-cloud.suggestTerminalCommand - Get command suggestions

🎯 Code Action Provider

  • Right-Click Menu: Access AI features directly from code
  • Actions: Add to Chat, Explain Code, Improve Code, Fix with Ollama
  • Smart Context: Auto-expands 3 lines above/below for better understanding
  • Diagnostic Integration: Fixes errors based on VS Code diagnostics
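
The "Smart Context" expansion above amounts to widening the selected line range and clamping it to the document. A minimal sketch, assuming zero-based line numbers and a hypothetical `expandSelection` helper:

```typescript
// Hypothetical helper: widen a selection by `padding` lines on each side, clamped to the file.
function expandSelection(
  startLine: number,
  endLine: number,
  lineCount: number,
  padding = 3
): { start: number; end: number } {
  return {
    start: Math.max(0, startLine - padding),
    end: Math.min(lineCount - 1, endLine + padding),
  };
}
```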

⚡ Enhanced Autocomplete

  • LRU Cache: Smart caching with max 100 entries
  • Performance Tracking: Monitor cache hit rates and request counts
  • Automatic Cleanup: Periodic maintenance every 60 seconds
  • Optimized Debounce: 250ms for faster responses
  • Statistics: getStats() method for debugging
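
For readers unfamiliar with LRU caching, the sketch below shows one common way to build such a cache on top of a `Map`, which iterates keys in insertion order. It is an illustration only: the `LruCache` class is hypothetical, and only the 100-entry limit and the `getStats()` name come from the list above.

```typescript
// Hypothetical LRU cache; a Map keeps keys in insertion order, so the first key is the oldest.
class LruCache<V> {
  private entries = new Map<string, V>();
  private hits = 0;
  private misses = 0;
  private maxEntries: number;

  constructor(maxEntries = 100) {
    this.maxEntries = maxEntries;
  }

  get(key: string): V | undefined {
    if (!this.entries.has(key)) {
      this.misses++;
      return undefined;
    }
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    this.hits++;
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry (the first key in iteration order).
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  getStats(): { size: number; hits: number; misses: number; hitRate: number } {
    const total = this.hits + this.misses;
    return {
      size: this.entries.size,
      hits: this.hits,
      misses: this.misses,
      hitRate: total === 0 ? 0 : this.hits / total,
    };
  }
}
```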

🧪 Testing & Quality

100% of Tests Passing

  • 215 Tests Passing: All unit tests passing with 0 failures
  • Test Suites:
    • OllamaCloudClient (token usage, sessions, context management)
    • SessionManager (persistence, retrieval, validation)
    • CommandExecutor (cross-platform, error handling, special cases)
    • ChatViewProvider (message handling, lifecycle)
    • FileEditor (path validation, content fixes)
    • AutocompleteProvider (caching, performance, languages)
    • StateManager (debounced persistence, batch operations)
    • Integration tests

📦 Technical Improvements

Architecture

  • Singleton Patterns: Thread-safe initialization for all managers
  • Debounced Operations: Performance optimization for state writes
  • LRU Caching: Memory-efficient caching with automatic eviction
  • Proper Disposal: Resource cleanup on deactivation
  • Type Safety: Full TypeScript strict mode compliance

Code Quality

  • JSDoc Comments: Comprehensive documentation
  • Error Handling: Graceful failure recovery
  • Performance Monitoring: Built-in metrics and statistics
  • Resource Management: Proper cleanup and disposal patterns

Bundle Size

  • 404 KiB: Optimized webpack bundle
  • 9 New Modules: Web research, walkthrough, dev mode, notebook, URI handler, state manager, terminal integration, code actions, enhanced autocomplete

What's New in v0.1.10

Enhanced Token Usage Display

  • Top-Position Token Usage: Token usage now appears prominently at the top of the chat interface
  • Model-Specific Styling: Local models show green styling, cloud models show blue styling with gradient backgrounds
  • Visual Progress Bars: Real-time token usage tracking with color-coded progress bars (green/yellow/red)
  • Monthly Usage Tracking: Monitor your Ollama Cloud API usage with detailed breakdowns

Advanced File Editing Features

  • Model-Specific Content Fixes: Automatically fixes common issues with different AI models:
    • Removes escape characters for Gemini, Llama, Mistral models
    • Strips markdown codeblock markers for DeepSeek, Llama, Mistral models
    • Converts HTML entities for DeepSeek models
    • Cleans up whitespace for models like "minsteral"/"minstral"
    • Handles JSON and YAML file formatting
  • Enhanced File Path Validation: Improved security with better pattern matching and validation
  • Better Error Handling: More robust error messages and validation

Improved User Interface

  • Action Approval Cards: New styled cards for file edits and command execution with gradient backgrounds
  • File Read Notifications: Visual indicators when files are being read
  • Enhanced Environment Info: Better workspace information display with colored chips
  • Improved Markdown Rendering: Better table support and formatting

Advanced Context Management

  • File Context Tracking: Tracks files that are read, edited, or mentioned
  • Task Context Awareness: Maintains context about files created/modified during tasks
  • Session Restoration: Improved session management and restoration

Code Quality Improvements

  • TypeScript Best Practices: Better type definitions and error handling
  • Code Organization: Cleaner separation of concerns
  • Performance Optimizations: More efficient file operations and UI updates

What's New in v0.1.9

Bug Fixes

  • Session Restore Now Works: Restoring a previous session now properly displays all chat messages in the chat window
  • Default Model Settings: The ollamaCloud.chatModel setting now properly sets the selected model in the dropdown
  • Premium Models Indicator: Premium models (70B+, Mixtral, Claude, GPT-4, etc.) are clearly marked with a 💎 icon

What's New in v0.1.8

Action Approval UI (Cline-Style)

  • "Ollama wants to edit {filename}": Beautiful purple/indigo gradient cards appear when AI suggests file edits
  • "Ollama wants to run command": Amber/yellow gradient cards for command execution requests
  • "Ollama is reading file": Blue notification cards when AI reads files
  • Apply/Skip Buttons: User-friendly buttons to approve or skip each action
  • Task Completion Banner: Green gradient banner shows summary when all actions complete

Model Dropdown Enhancements

  • Source Indicators: 💻 for local models, ☁️ for cloud models
  • Premium Model Indicator: 💎 badge for large/expensive models (70B+, Mixtral, Claude, GPT-4)
  • Smart Sorting: Local models appear first in the dropdown

Custom System Prompt

  • New Setting: ollamaCloud.customSystemPrompt with dynamic placeholders
  • Placeholders: {{OS}}, {{SHELL}}, {{WORKSPACE}}, {{WORKSPACE_PATH}}, {{FILE_TREE}}, {{NPM_SCRIPTS}}, {{MODE}}
  • Full Control: Replace the entire system prompt or leave empty for default

Session Restore Improvements

  • No Auto-Restore: Sessions no longer automatically restore on startup
  • User Choice: Prompt with "Restore Session" and "Start Fresh" buttons

Other Improvements

  • Increased Timeout: Default timeout increased from 30s to 120s for large models
  • No Placeholder Code: AI now provides complete, copy-paste ready code (no more "// ... rest of code")

What's New in v0.1.4

Cross-Platform Command Execution

  • Full Cross-Platform Support: Commands now work seamlessly on Windows, macOS, and Linux
  • Platform-Aware Shell Selection: Automatically uses the appropriate shell (PowerShell on Windows, zsh on macOS, bash on Linux)
  • Smart Command Chaining: Uses correct command separators for each platform (; for PowerShell, && for Unix shells)
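
The shell selection and chaining rules above can be sketched as two small functions. `ShellPlatform`, `shellFor`, and `chainCommands` are hypothetical names, and the platform strings follow Node's `process.platform` values; the shell executable names are assumptions.

```typescript
// Platform identifiers as reported by Node's process.platform.
type ShellPlatform = "win32" | "darwin" | "linux";

// Pick a shell per platform, mirroring the list above (executable names are assumptions).
function shellFor(platform: ShellPlatform): string {
  switch (platform) {
    case "win32":
      return "powershell";
    case "darwin":
      return "zsh";
    default:
      return "bash";
  }
}

// Join commands with the separator described above: ";" for PowerShell, "&&" for Unix shells.
function chainCommands(commands: string[], platform: ShellPlatform): string {
  const separator = platform === "win32" ? "; " : " && ";
  return commands.join(separator);
}
```

Note that the two separators are not exact equivalents: `&&` stops at the first failing command, while `;` in Windows PowerShell 5.1 runs the next command regardless of the previous result.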

What's New in v0.1.3

Improved AI Response Reliability

  • Official Ollama SDK: Now uses the official ollama npm package for better compatibility and reliability
  • Streaming Responses: See AI output in real-time as it's generated (configurable)
  • Context Window Management: Proper num_ctx parameter support (default 32768) for longer conversations
  • Retry Logic: Automatic retry with exponential backoff for rate-limited requests
  • Request Cancellation: Working Cancel button to abort long-running requests
  • Actual Token Counts: Real token usage from API instead of estimates
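
The retry behavior above follows a standard pattern that can be sketched as below. This is not the extension's code: `backoffDelayMs`, `withRetry`, the 1000ms base, the 30000ms ceiling, and the default of 3 retries are all assumptions chosen for illustration.

```typescript
// Delay doubles with each attempt (base * 2^attempt), capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retry `request` while `isRetryable` accepts the error (for example HTTP 429),
// waiting with exponential backoff between attempts.
async function withRetry<T>(
  request: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (err) {
      // Give up when retries are exhausted or the error is not retryable.
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

The `sleep` parameter is injectable so the delays can be observed or skipped in tests.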

New Configuration Options

  • enableStreaming - Toggle streaming responses on/off
  • contextWindow - Set the context window size (2048-131072)
  • requestTimeout - Set request timeout in milliseconds (5000-600000)

Installation

From VSIX (Local Installation)

  1. Download the .vsix file
  2. Open VSCode
  3. Go to Extensions (Ctrl+Shift+X)
  4. Click the "..." menu at the top
  5. Select "Install from VSIX..."
  6. Choose the downloaded file

From Source

  1. Clone this repository
  2. Run npm install
  3. Run npm run compile
  4. Press F5 to open a new VSCode window with the extension loaded

Setup

Option 1: Ollama Cloud (Recommended for beginners)

  1. Get Ollama Cloud API Key

    • Sign up at ollama.com
    • Go to your account settings and generate an API key
  2. Configure the Extension

    • Open VSCode Settings (Ctrl+,)
    • Search for "Ollama Cloud"
    • Enter your API key in ollamaCloud.cloudApiKey

Option 2: Local Ollama

  1. Install Ollama

    • Download from ollama.com
    • Run ollama serve to start the local server
  2. Pull a Model

    ollama pull llama3.1
    
  3. Configure the Extension

    • The extension will automatically detect local models
    • No API key needed for local models

Option 3: Both (Hybrid)

Use both local and cloud models! The extension automatically routes requests to the appropriate endpoint based on where each model is available.

Option 4: Grok (xAI)

  1. Get your API key from console.x.ai.
  2. Set ollamaCloud.apiProvider to grok.
  3. Add your key to ollamaCloud.grokApiKey.
  4. Optional: set ollamaCloud.grokBaseUrl if you use a proxy/gateway.

Provider-Specific Setup (ChatGPT, Claude, Grok)

  1. Open VS Code settings and search ollamaCloud.apiProvider.
  2. Choose one provider:
    • openai or chatgpt for OpenAI models
    • anthropic or claude for Claude models
    • grok for xAI Grok models
  3. Set the matching API key:
    • ollamaCloud.openAiApiKey
    • ollamaCloud.anthropicApiKey
    • ollamaCloud.grokApiKey
  4. Optional for enterprise/proxy setups:
    • ollamaCloud.openAiBaseUrl
    • ollamaCloud.anthropicBaseUrl
    • ollamaCloud.grokBaseUrl
  5. Set ollamaCloud.providerFallbackOrder so requests can fail over if your primary provider is unavailable.

Production-Ready Setup Checklist

  1. Set ollamaCloud.autoApprove to false unless you explicitly want unattended edits/commands.
  2. Keep ollamaCloud.allowUnsafeCommands at false for default command safety guardrails.
  3. Set ollamaCloud.requestTimeout high enough for your largest model (for example 120000 or 180000).
  4. Configure ollamaCloud.providerFallbackOrder with at least one backup provider.
  5. Cap prompt spend with ollamaCloud.requestTokenBudget and ollamaCloud.maxConversationMessages.
  6. For faster/cheaper suggestions, use a smaller ollamaCloud.autocompleteModel.
  7. Leave ollamaCloud.enableAdaptivePerformance enabled so debounce/cache settings can adapt over time.

Token Efficiency Recommendations

  1. Lower ollamaCloud.requestTokenBudget if responses are too expensive.
  2. Lower ollamaCloud.maxConversationMessages to keep only the most recent context.
  3. Disable ollamaCloud.includeFileContext when not needed for simple questions.
  4. Use smaller models for autocomplete and larger models only for complex generation/review tasks.
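
To show how the two limits above interact, here is a rough sketch of request compaction. It is not the extension's algorithm: `compactMessages` and `estimateTokens` are hypothetical, and the four-characters-per-token estimate is a crude assumption. The defaults mirror the documented defaults of ollamaCloud.maxConversationMessages (16) and ollamaCloud.requestTokenBudget (16000).

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Crude token estimate (about 4 characters per token); an assumption, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep at most `maxMessages` recent messages, then drop the oldest of those
// until the estimated total fits `tokenBudget`. The newest message is always kept.
function compactMessages(
  messages: ChatMessage[],
  maxMessages = 16,
  tokenBudget = 16000
): ChatMessage[] {
  const recent = messages.slice(-Math.max(1, maxMessages));
  const kept: ChatMessage[] = [];
  let used = 0;
  // Walk from newest to oldest so recent context wins.
  for (const message of recent.slice().reverse()) {
    const cost = estimateTokens(message.content);
    if (kept.length > 0 && used + cost > tokenBudget) break;
    kept.unshift(message);
    used += cost;
  }
  return kept;
}
```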

Cost-Aware Routing

  1. Enable ollamaCloud.enableCostAwareRouting.
  2. Set ollamaCloud.routingPreference:
    • balanced for mixed cost/latency optimization
    • lowCost to prioritize cheaper requests
    • lowLatency to prioritize faster providers
  3. Optional: set ollamaCloud.maxEstimatedRequestCostUsd as a soft per-request cap.
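
One way to picture the ranking behind these settings is a weighted score per candidate. The sketch below is purely illustrative: `RouteCandidate`, `rankCandidates`, the weight values, and the penalty of 1 are assumptions, and the extension's actual scoring is not documented here. It treats the cost cap as soft, so candidates over the cap are pushed down but not removed.

```typescript
type RoutingPreference = "balanced" | "lowCost" | "lowLatency";

// Hypothetical candidate record combining the three signals named in this README.
interface RouteCandidate {
  provider: string;
  model: string;
  estimatedCostUsd: number;
  observedLatencyMs: number;
  fitsTokenBudget: boolean;
}

// Assumed weights per preference; not taken from the extension.
const ROUTING_WEIGHTS: Record<RoutingPreference, { cost: number; latency: number }> = {
  balanced: { cost: 0.5, latency: 0.5 },
  lowCost: { cost: 0.9, latency: 0.1 },
  lowLatency: { cost: 0.1, latency: 0.9 },
};

// Lower score is better. maxCostUsd = 0 disables the soft cap.
function rankCandidates(
  candidates: RouteCandidate[],
  preference: RoutingPreference,
  maxCostUsd = 0
): RouteCandidate[] {
  const maxCost = Math.max(...candidates.map((c) => c.estimatedCostUsd), Number.EPSILON);
  const maxLatency = Math.max(...candidates.map((c) => c.observedLatencyMs), 1);
  const weights = ROUTING_WEIGHTS[preference];
  const score = (c: RouteCandidate): number => {
    let total =
      weights.cost * (c.estimatedCostUsd / maxCost) +
      weights.latency * (c.observedLatencyMs / maxLatency);
    if (!c.fitsTokenBudget) total += 1; // penalize candidates that cannot fit the request
    if (maxCostUsd > 0 && c.estimatedCostUsd > maxCostUsd) total += 1; // soft cap penalty
    return total;
  };
  return candidates.slice().sort((a, b) => score(a) - score(b));
}
```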

Usage

Opening the Chat

  • Click the Ollama Cloud icon in the Activity Bar (left sidebar)
  • Or use the keyboard shortcut: Ctrl+Shift+O (Windows/Linux) or Cmd+Shift+O (Mac)
  • Or open Command Palette (Ctrl+Shift+P) and run "Ollama Cloud: Open Chat"

Validate Your Setup

Run Ollama Cloud: Validate Provider Configuration from the Command Palette to quickly verify:

  • primary provider API key availability
  • fallback provider key coverage
  • duplicate or aliased entries in the fallback order
  • configured chat model compatibility with selected provider
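
The fallback-order check can be approximated as follows. This is a sketch under assumptions: `normalizeFallbackOrder` is a hypothetical helper, the alias pairs (chatgpt to openai, claude to anthropic) are taken from the provider setup section of this README, and dropping unknown names is a choice made for the example.

```typescript
// Aliases taken from the provider setup section of this README.
const PROVIDER_ALIASES: Record<string, string> = { chatgpt: "openai", claude: "anthropic" };
const KNOWN_PROVIDERS = ["ollama", "openai", "anthropic", "grok"];

// Resolve aliases, drop unknown names, and remove duplicates while preserving order.
function normalizeFallbackOrder(order: string[]): string[] {
  const seen = new Set<string>();
  const normalized: string[] = [];
  for (const raw of order) {
    const key = raw.trim().toLowerCase();
    const provider = PROVIDER_ALIASES[key] ?? key;
    if (KNOWN_PROVIDERS.includes(provider) && !seen.has(provider)) {
      seen.add(provider);
      normalized.push(provider);
    }
  }
  return normalized;
}
```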

Chatting with the AI

Simply type your question or request in the chat input and press Enter. The AI can help with:

  • Writing new code
  • Explaining existing code
  • Debugging errors
  • Refactoring code
  • Creating new files
  • Running commands
  • And much more!

Operating Modes

ACT Mode (Default)

  • AI can suggest file edits and commands.
  • You approve each suggested action before it runs (unless auto-approve is enabled).
  • Best for getting things done

PLAN Mode

  • AI explains what should be done without executing
  • Great for understanding complex tasks
  • Use for learning and planning

File Editing

When the AI suggests file changes, it will format them like this:

// File: src/example.js
function hello() {
  console.log("Hello, World!");
}
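
For illustration, a response in the format shown above could be split into a target path and file content like this. The sketch handles only this one `// File:` header style; `parseFileEdit` and `FileEdit` are hypothetical names, and the extension's own parser (which this README says handles multiple block patterns) is not shown.

```typescript
interface FileEdit {
  path: string;
  content: string;
}

// Hypothetical parser: the first line must be "// File: <path>", the rest is the file content.
function parseFileEdit(block: string): FileEdit | undefined {
  const lines = block.split(/\r?\n/);
  const header = /^\/\/\s*File:\s*(\S.*)$/.exec(lines[0]);
  if (!header) return undefined; // not a file edit block
  return { path: header[1].trim(), content: lines.slice(1).join("\n") };
}
```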

Approval Workflow

By default, you'll be prompted to approve changes before they're applied. Suggestions appear as clickable cards in chat with options to Apply or Skip.

If you'd prefer to skip manual approval, enable Auto-Approve (ollamaCloud.autoApprove). Warning: with auto-approve enabled, AI can modify files and execute commands without per-action confirmation.

Command Execution

When the AI suggests commands, they'll be formatted like:

npm install axios

Approval Workflow

Command suggestions appear as clickable cards. By default, you explicitly approve each command before execution.

If you'd prefer to skip manual approval, enable Auto-Approve (ollamaCloud.autoApprove). Warning: commands run immediately in this mode.

Configuration

| Setting | Description | Default |
| --- | --- | --- |
| ollamaCloud.cloudApiKey | Your Ollama Cloud API key | (empty) |
| ollamaCloud.anthropicApiKey | Anthropic API key for Claude | (empty) |
| ollamaCloud.openAiApiKey | OpenAI API key for ChatGPT/GPT models | (empty) |
| ollamaCloud.grokApiKey | xAI Grok API key | (empty) |
| ollamaCloud.grokBaseUrl | Grok API base URL | https://api.x.ai/v1 |
| ollamaCloud.localEndpoint | Local Ollama server URL | http://localhost:11434 |
| ollamaCloud.apiProvider | Primary provider for chat/completions | ollama |
| ollamaCloud.providerFallbackOrder | Ordered fallback providers for transient failures | ["ollama","openai","anthropic","grok"] |
| ollamaCloud.enableCostAwareRouting | Rank providers/models using cost + latency telemetry | false |
| ollamaCloud.routingPreference | Cost-aware routing strategy (balanced, lowCost, lowLatency) | balanced |
| ollamaCloud.maxEstimatedRequestCostUsd | Soft per-request cost cap in USD (0 disables) | 0 |
| ollamaCloud.chatModel | AI model for chat | llama3.1 |
| ollamaCloud.autocompleteModel | AI model for autocomplete | ministral-3:3b |
| ollamaCloud.defaultMode | Default operating mode | act |
| ollamaCloud.customSystemPrompt | Custom system prompt with placeholders | (empty) |
| ollamaCloud.temperature | Response creativity (0-2) | 0.7 |
| ollamaCloud.maxTokens | Maximum response length | 4096 |
| ollamaCloud.requestTokenBudget | Approximate max prompt/context tokens per request | 16000 |
| ollamaCloud.maxConversationMessages | Max recent messages kept before compaction | 16 |
| ollamaCloud.contextWindow | Context window size | 32768 |
| ollamaCloud.requestTimeout | Request timeout (ms) | 120000 |
| ollamaCloud.enableStreaming | Enable streaming responses | true |
| ollamaCloud.autoApprove | Auto-approve AI actions | false |
| ollamaCloud.allowUnsafeCommands | Allow high-risk shell commands (not recommended) | false |
| ollamaCloud.showDiff | Show diff before applying changes | true |
| ollamaCloud.enableAutocomplete | Enable AI code completion | true |
| ollamaCloud.includeFileContext | Automatically include context from open files in chat messages | true |
| ollamaCloud.enableCodeLens | Show AI action buttons in code | true |
| ollamaCloud.enableStreamingForAutocomplete | Enable streaming for autocomplete requests | false |
| ollamaCloud.autocompleteDebounceDelay | Delay before sending autocomplete requests (ms) | 250 |
| ollamaCloud.autocompletePreviewDelay | Delay before showing autocomplete preview (ms) | 1000 |
| ollamaCloud.enableAdaptivePerformance | Enable adaptive performance tuning | true |

Custom System Prompt Placeholders

When using ollamaCloud.customSystemPrompt, you can include the following placeholders, which are replaced with actual values:

| Placeholder | Description |
| --- | --- |
| {{OS}} | Operating system (Windows, macOS, Linux) |
| {{SHELL}} | Shell type (PowerShell, Bash, etc.) |
| {{WORKSPACE}} | Current workspace name |
| {{WORKSPACE_PATH}} | Full path to workspace |
| {{FILE_TREE}} | Project file structure |
| {{NPM_SCRIPTS}} | Available npm scripts |
| {{MODE}} | Current mode (ACT or PLAN) |
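
Placeholder substitution of this kind is typically a single regular-expression pass over the template. The sketch below is an illustration with a hypothetical `renderSystemPrompt` function; leaving unknown placeholders untouched is an assumption, not documented behavior.

```typescript
// Replace {{NAME}} tokens with values; unknown placeholders are left as-is (assumption).
function renderSystemPrompt(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match
  );
}

const rendered = renderSystemPrompt(
  "You are a coding assistant on {{OS}} using {{SHELL}} in {{MODE}} mode.",
  { OS: "Linux", SHELL: "Bash", MODE: "ACT" }
);
// rendered is "You are a coding assistant on Linux using Bash in ACT mode."
```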

Project Context Files

The AI can automatically read project-specific context from special markdown files in your workspace. Create one of these files to provide the AI with project-specific information. The extension supports multiple popular AI context formats.

Supported File Names (Ordered by Priority)

Ollama Cloud Native Formats

  1. .ollamacloud.md (Primary - Highest priority)
  2. .ollamacloud-context.md

GitHub Copilot Compatible Formats

  1. .github/copilot-instructions.md (GitHub standard location)
  2. .github/copilot_instructions.md
  3. .copilot-instructions.md
  4. .copilot_instructions.md
  5. copilot-instructions.md

Anthropic Claude Compatible Formats

  1. .claude/context.md
  2. .claude/instructions.md
  3. claude-context.md

Generic Formats

  1. .project-context.md
  2. .context.md
  3. PROJECT.md
  4. README.context.md
  5. PROJECT_CONTEXT.md
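
The priority lookup above can be sketched as a first-match search over an ordered list. `pickContextFile` is a hypothetical helper, and the sketch assumes the four groups are ranked in the order they are listed here (native, Copilot, Claude, generic), which this README implies but does not state outright.

```typescript
// File names in the order listed above; cross-group ordering is an assumption.
const CONTEXT_FILE_PRIORITY: readonly string[] = [
  ".ollamacloud.md",
  ".ollamacloud-context.md",
  ".github/copilot-instructions.md",
  ".github/copilot_instructions.md",
  ".copilot-instructions.md",
  ".copilot_instructions.md",
  "copilot-instructions.md",
  ".claude/context.md",
  ".claude/instructions.md",
  "claude-context.md",
  ".project-context.md",
  ".context.md",
  "PROJECT.md",
  "README.context.md",
  "PROJECT_CONTEXT.md",
];

// Return the highest-priority context file among the workspace-relative paths that exist.
function pickContextFile(existingPaths: Iterable<string>): string | undefined {
  const present = new Set(existingPaths);
  return CONTEXT_FILE_PRIORITY.find((candidate) => present.has(candidate));
}
```

Because only the first match is used in this sketch, it also reflects the advice to keep a single context file per project.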

Format Compatibility

This extension automatically reads context files used by other AI assistants:

  • GitHub Copilot: Supports .github/copilot-instructions.md format
  • Anthropic Claude: Supports .claude/context.md format
  • Generic AI tools: Supports common context file names

You can use existing context files from other AI tools, or create a new one tailored for Ollama Cloud.

Example Project Context Files

Ollama Cloud Format (.ollamacloud.md)

Create .ollamacloud.md in your project root:

# Project Information
- **Project Name**: MyApp
- **Framework**: React with TypeScript
- **Build Tool**: Vite
- **CSS Framework**: Tailwind CSS
- **Backend API**: REST API at /api/v1

# Important Guidelines
- Always use functional components with React hooks
- Follow the existing folder structure pattern
- Use Tailwind classes for styling instead of CSS files
- Maintain consistent error handling with try/catch blocks
- Keep components small and focused (max 150 lines)

# Architecture Notes
- Authentication is handled via JWT tokens
- State management uses React Context + useReducer
- All API calls go through the `/src/api/client.ts` wrapper
- Environment variables are prefixed with `VITE_`

# Common Patterns to Avoid
- ❌ Don't use Redux (we migrated away from it)
- ❌ Don't create new CSS files (use Tailwind only)
- ❌ Don't use class components (legacy only)

GitHub Copilot Format (.github/copilot-instructions.md)

# Copilot Instructions for This Project

## Coding Standards
- Follow the existing code style
- Use TypeScript with strict typing
- Include JSDoc comments for exported functions
- Handle errors gracefully with try/catch

## Project Structure
- `/src` - Source code
- `/test` - Unit tests
- `/dist` - Built output
- Configuration files in root

## Key Dependencies
- vscode - VS Code Extension API
- ollama - Ollama client library
- axios - HTTP client for API calls

Claude Format (.claude/context.md)

# Claude Context

This project is a VS Code extension for AI-assisted coding.

## Core Principles
- User privacy is paramount
- Transparency in all AI interactions
- Security through explicit approval workflows
- Performance through caching and debouncing

## Technical Constraints
- Must work with both local and cloud AI models
- All file operations require user confirmation
- Network calls must handle timeouts gracefully
- Extension must work offline when possible

Best Practices

  1. Single Context File: Use only one context file to avoid conflicts
  2. Keep it Concise: Focus on the most important information
  3. Regular Updates: Keep context files current with project changes
  4. Team Consensus: Ensure team agrees on context guidelines
  5. Security Review: Review context files for sensitive information

The AI will automatically include this context information in all conversations to provide more accurate and project-specific responses.
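
As a rough illustration of how a context file could be found and attached to a request, here is a minimal sketch. It is not the extension's actual code: the function names, the priority order of the candidate files, and the prompt layout are all assumptions made for the example. Only the three file names come from this README.

```typescript
// Candidate context files. The names come from this README;
// the order in which they are checked is an assumption of this sketch.
const CONTEXT_FILE_CANDIDATES = [
  ".ollamacloud.md",
  ".github/copilot-instructions.md",
  ".claude/context.md",
];

/** Returns the first candidate that exists, or undefined if none do. */
function pickContextFile(
  exists: (relativePath: string) => boolean,
): string | undefined {
  return CONTEXT_FILE_CANDIDATES.find((candidate) => exists(candidate));
}

/** Prepends the project context to the user's message (hypothetical layout). */
function buildPrompt(
  contextText: string | undefined,
  userMessage: string,
): string {
  if (!contextText || contextText.trim() === "") {
    return userMessage; // No context file: send the message unchanged.
  }
  return `Project context:\n${contextText.trim()}\n\nUser request:\n${userMessage}`;
}
```

Picking exactly one file mirrors the "Single Context File" practice above: if several files exist, only the first match is used, so their guidelines cannot contradict each other.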

Available Models

Cloud Models (Ollama Cloud)

  • llama3.1 - Llama 3.1 (recommended)
  • llama3.2 - Llama 3.2
  • ministral-3:3b - Fast, efficient for autocomplete
  • gemma3:4b - Google's Gemma 3

Local Models (requires local Ollama)

  • codellama - Specialized for coding
  • mistral - Fast and efficient
  • mixtral - Mixture of experts model
  • qwen2.5-coder - Specialized coding model
  • phi3 - Microsoft's Phi-3

Keyboard Shortcuts

| Shortcut | Action |
| --- | --- |
| Ctrl+Shift+O | Open Chat |
| Ctrl+Shift+N | New Task |
| Ctrl+K | Inline Chat (with selection) |
| Ctrl+Shift+E | Explain Code |
| Ctrl+Shift+T | Generate Tests |
| Ctrl+Shift+F | Fix Code |
| Ctrl+Shift+R | Review Code |
| Ctrl+Shift+M | Modernize Code |
| Enter | Send message |
| Shift+Enter | New line in message |

Commands

Chat & Core

  • Ollama Cloud: Open Chat - Open the chat interface
  • Ollama Cloud: New Task - Start a new conversation
  • Ollama Cloud: Clear History - Clear chat history
  • Ollama Cloud: Select Model - Choose a different AI model

Code Actions

  • Ollama Cloud: Explain Code - Explain selected code
  • Ollama Cloud: Fix Code - Fix issues in selected code
  • Ollama Cloud: Improve Code - Suggest code improvements
  • Ollama Cloud: Add to Chat - Add selected code to chat
  • Ollama Cloud: Generate Tests - Generate tests for selected code
  • Ollama Cloud: Review Code - Get a code review
  • Ollama Cloud: Modernize Code - Update code to modern standards

Terminal Integration

  • Ollama Cloud: Add Terminal Output - Add selected terminal output to chat
  • Ollama Cloud: Explain Terminal Error - Explain terminal errors
  • Ollama Cloud: Suggest Terminal Command - Get command suggestions

Web Research

  • Ollama Cloud: Search Web - Search the internet and add results to chat
  • Ollama Cloud: Research Topic - Deep research with content fetching
  • Ollama Cloud: Fetch URL - Fetch and analyze URL content

Jupyter Notebooks

  • Ollama Cloud: Explain Notebook Cell - Explain current notebook cell
  • Ollama Cloud: Fix Notebook Cell - Fix errors in notebook cell
  • Ollama Cloud: Optimize Notebook Cell - Optimize cell for performance
  • Ollama Cloud: Generate Notebook Cell - Generate cell from description

Walkthrough & Help

  • Ollama Cloud: Show Welcome - Show welcome tour
  • Ollama Cloud: Show Tips - Show tips and tricks
  • Ollama Cloud: Show Shortcuts - Show keyboard shortcuts

Development

  • Ollama Cloud: Toggle Dev Mode - Enable/disable development mode
  • Ollama Cloud: Show Dev Stats - Show development statistics

Tips

  1. Be Specific: The more specific your request, the better the AI can help
  2. Provide Context: Mention file names, error messages, or relevant code
  3. Review Changes: Always review AI-suggested changes before applying
  4. Use New Task: Start a new task for unrelated questions so that earlier context does not carry over
  5. Experiment with Models: Different models excel at different tasks
  6. Adjust Timeout: Increase requestTimeout for larger models or slower connections
  7. Use Streaming: Keep streaming enabled to see responses as they generate

Troubleshooting

"Invalid API key" Error

  • Verify your API key in settings
  • Make sure you have an active Ollama Cloud account

"Request timed out" Error

  • Increase requestTimeout in settings (default 120000 ms)
  • Try a smaller/faster model
  • Check your internet connection
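
To make the timeout behavior concrete, the sketch below shows one common way a request timeout can be enforced. It is an illustration only: `withTimeout` is a hypothetical helper, not a function exported by this extension, and the real implementation may also cancel the underlying HTTP request, which this version does not.

```typescript
/**
 * Rejects with a "Request timed out" error if `work` does not settle
 * within `timeoutMs`. Hypothetical helper for illustration.
 */
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Request timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
    work.then(
      (value) => {
        clearTimeout(timer); // Finished in time: cancel the pending timeout.
        resolve(value);
      },
      (error) => {
        clearTimeout(timer); // Failed for another reason: surface that error.
        reject(error);
      },
    );
  });
}
```

With the default `requestTimeout` of 120000 ms, a call would look like `withTimeout(sendRequest(), 120000)`, where `sendRequest` stands in for whatever function performs the model call. Raising the value gives larger models more time before the error is shown.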

Extension Not Loading

  • Check the Output panel (View → Output → Ollama Cloud)
  • Try reloading VSCode (Ctrl+Shift+P → "Reload Window")

Slow Responses

  • Try a smaller model (e.g., ministral-3:3b instead of llama3.1)
  • Enable streaming to see partial responses
  • Check your internet connection
  • Reduce maxTokens in settings

Local Ollama Not Detected

  • Make sure Ollama is running (ollama serve)
  • Check the localEndpoint setting matches your Ollama server
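
If you want to verify the endpoint yourself, a reachability check can be sketched as follows. `GET /api/tags` is Ollama's route for listing local models, and 11434 is its default port. The function name and the injected `fetchImpl` parameter are assumptions of this example, not part of the extension.

```typescript
// Minimal shape of the fetch function this sketch needs,
// so that a fake can be injected in tests.
type FetchLike = (url: string) => Promise<{ ok: boolean }>;

/**
 * Returns true when an Ollama server answers at `endpoint`.
 * Hypothetical helper; the endpoint should match your localEndpoint setting.
 */
async function isOllamaReachable(
  fetchImpl: FetchLike,
  endpoint: string = "http://localhost:11434",
): Promise<boolean> {
  try {
    // Strip trailing slashes so "http://host:11434/" and "http://host:11434" both work.
    const base = endpoint.replace(/\/+$/, "");
    const response = await fetchImpl(`${base}/api/tags`);
    return response.ok;
  } catch {
    // Connection refused, DNS failure, and similar errors: treat as "not running".
    return false;
  }
}
```

On Node.js 18 or later, the global `fetch` can be passed directly: `await isOllamaReachable(fetch)`. A `false` result usually means `ollama serve` is not running or `localEndpoint` points at the wrong host or port.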

Tests Fail With "code: bad option" Or Print Node CLI Help

  • Ensure ELECTRON_RUN_AS_NODE is not exported in your shell when running extension tests.
  • The repository test runner (src/test/runTest.ts) now unsets it automatically before launching VS Code.
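
For background, when `ELECTRON_RUN_AS_NODE` is set, the VS Code binary starts as a plain Node.js process and rejects VS Code's own command-line flags. The sketch below shows one way a test launcher could strip the variable. It is an illustration under that assumption, not the contents of `src/test/runTest.ts`, and `envForVSCodeTests` is a hypothetical name.

```typescript
/**
 * Returns a copy of `env` without ELECTRON_RUN_AS_NODE.
 * The input object is left untouched.
 */
function envForVSCodeTests(
  env: Record<string, string | undefined>,
): Record<string, string | undefined> {
  const copy = { ...env };
  delete copy["ELECTRON_RUN_AS_NODE"];
  return copy;
}
```

A launcher could pass `envForVSCodeTests(process.env)` as the environment of the VS Code process it spawns, so the fix applies even when the variable is exported in the developer's shell.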

Privacy & Security

  • Your code is sent to the provider you have configured (Ollama Cloud, your local Ollama, or another supported provider) for processing
  • API keys are stored locally in VSCode settings
  • No data is stored by this extension beyond session persistence
  • Review your provider's privacy policy (for example, Ollama's) for cloud usage details

Development

Building from Source

# Install dependencies
npm install

# Compile TypeScript
npm run compile

# Watch for changes
npm run watch

# Package extension
npm run package

Build, Test, Package, And Bump Version Automatically

Use the project release script:

# Patch release (default)
./build.sh

# Explicit bump type
./build.sh patch
./build.sh minor
./build.sh major

# Build/test/package without version bump
./build.sh --no-bump

What build.sh does:

  1. Installs dependencies when missing.
  2. Runs reliability checks (npm run validate:reliability) to verify chat view wiring, required settings, and command contributions.
  3. Runs lint and compile.
  4. Optionally runs compile-tests + tests in strict mode (./build.sh --strict).
  5. Bumps package.json version (unless --no-bump).
  6. Builds extension assets and creates a .vsix package named ollama-cloud-<version>.vsix.
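
The version bump and package naming in steps 5 and 6 can be sketched as below. This is an illustrative reimplementation, not the logic inside `build.sh`: the function names are hypothetical, and the sketch only handles plain `MAJOR.MINOR.PATCH` versions without pre-release tags.

```typescript
type BumpType = "patch" | "minor" | "major";

/** Increments one component of a MAJOR.MINOR.PATCH version; patch is the default. */
function bumpVersion(version: string, bump: BumpType = "patch"): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);
  switch (bump) {
    case "major":
      return `${major + 1}.0.0`; // Lower components reset to zero.
    case "minor":
      return `${major}.${minor + 1}.0`;
    default:
      return `${major}.${minor}.${patch + 1}`;
  }
}

/** Package file name in the form described above. */
function vsixName(version: string): string {
  return `ollama-cloud-${version}.vsix`;
}
```

For example, starting from version 1.4.9, `./build.sh` would correspond to `bumpVersion("1.4.9")`, `./build.sh minor` to `bumpVersion("1.4.9", "minor")`, and `--no-bump` would skip the call and package the current version as is.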

Reliability Gates

# Validate manifest and extension wiring contracts
npm run validate:reliability

# Individual checks
npm run validate:manifest
npm run validate:contracts

The reliability checks fail fast if critical UX contracts regress, including:

  • Missing or mismatched chat view contribution IDs
  • Missing required settings (chat/autocomplete/provider)
  • Wrong default provider (ollamaCloud.apiProvider must default to ollama)
  • Missing core commands (open chat, autocomplete model selection, configuration validation)
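
A check of this kind can be pictured as a small function over `package.json`. The sketch below is an assumption about how such a gate might look, not the project's actual validation script. It assumes `contributes.configuration` is a single object (VS Code also accepts an array), and the command ID used in the usage note is hypothetical.

```typescript
// Only the parts of the extension manifest that this sketch inspects.
interface Manifest {
  contributes?: {
    commands?: { command: string }[];
    configuration?: { properties?: Record<string, { default?: unknown }> };
  };
}

/** Returns a list of problems; an empty list means the manifest passed. */
function validateManifest(
  manifest: Manifest,
  requiredCommands: string[],
): string[] {
  const problems: string[] = [];

  // The default provider must be "ollama", as stated in the list above.
  const properties = manifest.contributes?.configuration?.properties ?? {};
  const provider = properties["ollamaCloud.apiProvider"];
  if (!provider) {
    problems.push("missing setting ollamaCloud.apiProvider");
  } else if (provider.default !== "ollama") {
    problems.push("ollamaCloud.apiProvider must default to ollama");
  }

  // Every required command must be contributed.
  const declared = new Set(
    (manifest.contributes?.commands ?? []).map((entry) => entry.command),
  );
  for (const id of requiredCommands) {
    if (!declared.has(id)) {
      problems.push(`missing command ${id}`);
    }
  }
  return problems;
}
```

A script could load `package.json`, call `validateManifest(manifest, ["ollamaCloud.openChat"])` with the project's real command IDs, and exit with a non-zero status when the returned list is not empty. Collecting all problems before failing reports every regression in one run.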

Running Tests

npm test

Recommended Developer Workflow

  1. Run npm run validate:reliability before each commit.
  2. Use ./build.sh --no-bump for local packaging checks.
  3. Use ./build.sh --strict --no-bump when validating test compilation/execution.
  4. Use ./build.sh patch|minor|major only when preparing a release artifact.

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

MIT License - See LICENSE file for details

Credits

Inspired by the Cline VSCode extension. Built with ❤️ for the developer community.

Support

  • Report issues on GitLab
  • Visit jkagidesigns.com for more projects

Note: This extension works with both Ollama Cloud (requires API key) and local Ollama (free, requires local installation). Visit ollama.com to get started.
