Ollama Cloud VSCode Extension
An AI-powered coding assistant for Visual Studio Code with multi-provider support. Works with Ollama Cloud, local Ollama, Anthropic Claude, OpenAI ChatGPT, Grok, and more - similar to Cline. This extension provides intelligent code assistance, file editing, and command execution capabilities across multiple AI providers.
Features
- 🤖 Multi-Provider AI Chat: Interactive chat with Ollama (local/cloud), Anthropic Claude, OpenAI ChatGPT, Grok, and more
- 📝 Smart Code Editing: AI can read, write, and modify files with your approval
- 🔧 Command Execution: Execute terminal commands suggested by the AI
- 🔄 Diff View: Review changes before applying them
- 🎯 Multiple Models: Support for various models across all providers (Llama, Claude 3.5, GPT-4, etc.)
- ⚡ Real-time Streaming: See AI responses as they're generated
- 🎨 VSCode Integration: Native VSCode UI with dark/light theme support
- 🧠 Enhanced Context Awareness: Tracks tasks, files, and commands for better continuity
- 💾 Session Persistence: Auto-saves and restores conversations across VSCode restarts
- 📊 Usage Tracking: Monitor API usage with visual indicators for each provider
- 🗂️ Workspace Indexing: Automatically understands your project structure
- 🔁 Automatic Retry: Smart retry logic with exponential backoff for rate limits
- ⏱️ Configurable Timeouts: Adjust request timeouts for slower connections
- 🔄 Provider Switching: Easily switch between AI providers based on your needs
- 🧯 Provider Failover: Automatically fail over to backup providers when the primary provider is unavailable
- 💸 Token Budget Controls: Enforce per-request token budgets to reduce waste and cost
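The automatic retry behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: the helper names, the 1000 ms base delay, the 30000 ms cap, and the assumption that rate-limit errors carry an HTTP `status` of 429 are all hypothetical.

```typescript
// Hypothetical helpers; the extension's real retry code may differ.

/** Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

/** Assumed shape of a rate-limit error: an object carrying HTTP status 429. */
function isRateLimited(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: number }).status === 429;
}

/** Runs `request`, retrying only rate-limited failures, at most `maxRetries` times. */
async function withRetry<T>(request: () => Promise<T>, maxRetries = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (err) {
      // Give up when retries are exhausted or the error is not a rate limit.
      if (attempt >= maxRetries || !isRateLimited(err)) {
        throw err;
      }
      const delay = backoffDelayMs(attempt);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

With these defaults the waits between attempts would be 1 s, 2 s, and 4 s; other failures are rethrown immediately so they can be handled elsewhere (for example by provider failover).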
Current Behavior Notes (v0.1.28)
- Approval-first by default: In ACT mode, suggested file edits/commands appear as approval cards in chat and are not executed until you approve them.
- Auto-approve is optional: If ollamaCloud.autoApprove is enabled, suggested actions execute immediately.
- Code actions route into chat: Right-click actions (Add to Chat, Explain, Improve, Fix) now reliably focus the chat and pass content into the panel.
- Provider selection is live: Chat requests use the current ollamaCloud.apiProvider setting at send time.
- Provider failover is configurable: when a provider fails transiently, requests can automatically fail over to the providers listed in ollamaCloud.providerFallbackOrder.
- Cross-provider fallback is model-aware: if a fallback provider does not support the current model, a compatible default model is selected automatically.
- Provider model catalogs are centralized: chat view + model picker now use a shared provider registry for consistent provider/model behavior.
- Cost-aware routing is available: when enabled, providers and models are ranked by token budget fit, estimated request cost, and observed provider latency.
- Command parsing is hardened: File/command extraction now handles multiple markdown block patterns without hanging on malformed responses.
- Token-aware request compaction: request payloads are compacted using ollamaCloud.requestTokenBudget and ollamaCloud.maxConversationMessages.
- Lifecycle cleanup is enforced: provider heartbeat/config listeners and autocomplete timers are properly disposed to avoid background leaks.
- Autocomplete is token-aware: cached completions are reused when context matches to reduce repeat requests.
- Config validation command is built in: run Ollama Cloud: Validate Provider Configuration to detect missing API keys, fallback issues, and model/provider mismatches.
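As a rough illustration of the model-aware failover described in these notes, the sketch below builds an ordered list of (provider, model) attempts. The `ProviderEntry` shape, the `planAttempts` name, and the registry contents are assumptions for illustration only; the extension's shared provider registry may look different.

```typescript
// Hypothetical registry entry: which models a provider supports and its default.
interface ProviderEntry {
  models: string[];
  defaultModel: string;
}

interface Attempt {
  provider: string;
  model: string;
}

/**
 * Returns the attempts to try in order: the primary provider first, then each
 * fallback. If a provider does not list the requested model, its default model
 * is substituted. Unknown and duplicate providers are skipped.
 */
function planAttempts(
  primary: string,
  fallbackOrder: string[],
  model: string,
  registry: Record<string, ProviderEntry>
): Attempt[] {
  const plan: Attempt[] = [];
  for (const provider of [primary, ...fallbackOrder]) {
    const entry = registry[provider];
    const alreadyPlanned = plan.some((attempt) => attempt.provider === provider);
    if (!entry || alreadyPlanned) {
      continue;
    }
    const supported = entry.models.indexOf(model) !== -1;
    plan.push({ provider, model: supported ? model : entry.defaultModel });
  }
  return plan;
}
```

A caller would walk the returned list and stop at the first attempt that succeeds.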
What's New in v0.1.15
🤖 Multi-Provider AI Support (Major Feature)
- Anthropic Claude: Full integration with Claude 3.5 Sonnet, Opus, and Haiku models
- OpenAI ChatGPT: Support for GPT-4 Turbo, GPT-4, and GPT-3.5 Turbo models
- Ollama Compatibility: Continued support for local and cloud Ollama models
- Unified Interface: Seamless switching between providers without changing workflows
⚙️ Flexible Configuration
- Provider Selection: Choose your preferred AI provider in settings (ollamaCloud.apiProvider)
- API Key Management: Individual API key configuration for each provider
- Model Awareness: Provider-specific model selection with appropriate recommendations
- Backward Compatible: Existing Ollama workflows continue to work unchanged
🎯 Advanced Features
- Cline-Inspired Architecture: Unified API handler system similar to popular AI assistants
- Enterprise Ready: Designed to work with multiple AI providers for redundancy and flexibility
- Cost Optimization: Choose providers based on pricing, performance, or availability
- Future Proof: Easy to add new providers as they become available
What's New in v0.1.18
🧪 Enhanced Testing Framework
📝 Comprehensive Test Coverage
- File Editing Tests: Added 50+ new tests covering file creation, editing, error handling, and reliability scenarios
- Command Execution Tests: Enhanced command executor testing with security, concurrency, and error recovery tests
- Chat Integration Tests: Added tests for various file creation formats and messaging workflows
- Cross-Platform Compatibility: Comprehensive testing across Windows, macOS, and Linux environments
🔍 Reliability Verification
- Error Handling Tests: Comprehensive tests for detailed error message generation and recovery
- Path Handling Tests: Verification of cross-platform path normalization and special character handling
- Concurrency Tests: Tests for simultaneous file operations and command executions
- Edge Case Coverage: Tests for empty content, long paths, invalid inputs, and boundary conditions
🛡️ Security & Safety Tests
- Command Injection Prevention: Tests verifying safe handling of potentially malicious command inputs
- Path Sanitization: Tests ensuring proper validation of file paths and directory traversal prevention
- Resource Management: Tests for proper cleanup and resource handling under various conditions
🛠️ File Editing Reliability Improvements
🔧 Enhanced File Operations
- Robust File Parsing: Improved regex patterns to handle various AI model output formats for file creation
- Simplified Approval Flow: Streamlined file editing process for immediate execution without complex approvals
- Better Error Handling: Comprehensive error logging with detailed messages and stack traces for debugging
- Cross-Platform Path Support: Enhanced path normalization and workspace context handling
⚡ Improved Command Execution
- Enhanced Error Recovery: Better command execution with detailed error reporting and workspace context
- Reliable File Operations: Automatic directory creation and proper path resolution for all file operations
- Immediate Feedback: Clear success/failure messages for all file editing operations
🎯 System Prompt Optimization
- Action-Oriented Instructions: Updated AI system prompt with clearer, more actionable file editing guidance
- Multiple Format Support: AI can now recognize and process various file creation formats
- Complete Code Generation: Emphasis on providing full, copy-paste ready code without placeholders
🎯 Granular Controls
- Autocomplete Streaming: Added enableStreamingForAutocomplete setting for granular control over streaming
- Configurable Delays: New autocompleteDebounceDelay and autocompletePreviewDelay settings
- Adaptive Performance: Added enableAdaptivePerformance setting for automatic performance tuning
- Enhanced Configuration: Users can now fine-tune autocomplete behavior to their preferences
📊 Enhanced User Feedback
- Preview Completions: Implemented immediate feedback with short preview completions
- Progress Indicators: Added visual progress notifications for longer autocomplete operations
- Response Time Tracking: Enhanced logging with response time measurements
- Silent Error Handling: Improved error handling that doesn't interrupt user workflow
- Memory Pressure Detection: Automatic cache size adjustment based on system memory usage
- Adaptive Debounce Timing: Dynamic debounce time based on API response performance
- Resource Management: Enhanced cleanup and timeout handling for optimal resource usage
- Performance Monitoring: Advanced tracking and optimization of autocomplete performance
What's New in v0.1.22
🔄 Real-Time Editor Synchronization
📝 Dynamic Context Updates
- Event Listeners: Added listeners for onDidChangeActiveTextEditor, onDidChangeVisibleTextEditors, onDidChangeTextDocument, and onDidSaveTextDocument
- Real-Time Context: AI context now updates dynamically as you switch editors or modify files
- Unsaved Changes: Now captures unsaved changes in open editors
- Smarter Truncation: Prioritizes code near the cursor or imports for better context relevance
⚙️ Reliability Improvements
- Status Bar Indicator: Added real-time status bar indicator for Ollama server status
- Heartbeat Mechanism: Implements a 30-second heartbeat to check Ollama server availability
- Error Visibility: Critical errors (e.g., Ollama server down) now surface in the status bar without interrupting workflow
- Autocomplete Failures: Subtle warning icon for autocomplete failures
🎯 Enhanced User Experience
- Workspace Awareness: Better tracking of open files and project structure
- Context Accuracy: AI responses are now more relevant to your current work
- Non-Intrusive Feedback: Status updates appear in the status bar without popups
What's New in v0.1.12
🛠️ Enhancement - Tutorial Experience Improvement
👋 Improved Welcome Tutorial Behavior
- Version-Based Tutorial Display: Tutorial now only shows once per extension version update, not every time VSCode opens
- Automatic Version Tracking: Extension automatically tracks which version was last shown the tutorial
- Cleaner User Experience: Removed "Don't Show Again" option since tutorial is now version-aware
- Updated Messaging: Final tour step clarifies that tutorial only shows per update
🎉 Major Feature Release - Complete Implementation
This release represents a complete overhaul of the extension with enterprise-grade features, comprehensive testing, and production-ready code quality.
🔍 Web Research Capabilities
- Search the Web: AI can now search the internet using DuckDuckGo (no API key required)
- Deep Research: Automatically fetches and analyzes content from top search results
- URL Fetching: Extract and analyze text from any webpage
- File Downloads: Download files from URLs for analysis
- Commands:
  - ollama-cloud.searchWeb - Search and add results to chat
  - ollama-cloud.researchTopic - Deep research with content fetching
  - ollama-cloud.fetchUrl - Fetch and analyze URL content
👋 Interactive Walkthrough & Onboarding
- Welcome Tour: Step-by-step introduction to all features on first use
- Tips & Tricks: Comprehensive tips panel with categorized advice
- Keyboard Shortcuts: Quick reference guide for all shortcuts
🔧 Development Mode
- Hot Reload: Automatic file watching with reload prompts
- Dev Panel: Quick actions for reload, clear cache, and logs
- Output Channel: Detailed development logs
- Statistics: Track active watchers and extension state
- Commands:
  - ollama-cloud.toggleDevMode - Enable/disable development mode
  - ollama-cloud.showDevStats - Show development statistics
💾 State Manager with Debounced Persistence
- Fast In-Memory Reads: Instant access to state data
- Debounced Writes: 500ms batched writes for performance
- Batch Operations: Efficient bulk state updates
- Secrets Support: Secure credential storage
- Statistics Tracking: Monitor cache sizes and pending changes
- Singleton Pattern: Thread-safe initialization with guards
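The debounced-persistence idea above (instant in-memory reads, batched writes after a quiet period) can be sketched as follows. The class name, the string-only values, and the injected `persist` callback are hypothetical; only the 500 ms delay comes from the list above.

```typescript
// A minimal sketch of debounced persistence; not the extension's StateManager.
class DebouncedStore {
  private cache: Record<string, string> = {};
  private pending: Record<string, string> = {};
  private timer: ReturnType<typeof setTimeout> | undefined;
  private persist: (batch: Record<string, string>) => void;
  private delayMs: number;

  constructor(persist: (batch: Record<string, string>) => void, delayMs = 500) {
    this.persist = persist;
    this.delayMs = delayMs;
  }

  /** Reads always come from memory, so they never wait on storage. */
  get(key: string): string | undefined {
    return this.cache[key];
  }

  /** Updates memory immediately and (re)starts the write timer. */
  set(key: string, value: string): void {
    this.cache[key] = value;
    this.pending[key] = value;
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
    }
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  /** Writes all pending changes as one batch; also useful on deactivation. */
  flush(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    const batch = this.pending;
    if (Object.keys(batch).length === 0) {
      return;
    }
    this.pending = {};
    this.persist(batch);
  }
}
```

Several `set` calls within the delay window collapse into a single `persist` call, which is where the performance benefit of batching comes from.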
📓 Jupyter Notebook Support
- Cell Operations: Explain, fix, optimize, and generate notebook cells
- Output Integration: Analyzes cell outputs for better context
- Full Context: Extracts entire notebook structure
- Commands:
  - ollama-cloud.explainNotebookCell - Explain current notebook cell
  - ollama-cloud.fixNotebookCell - Fix errors in notebook cell
  - ollama-cloud.optimizeNotebookCell - Optimize cell for performance
  - ollama-cloud.generateNotebookCell - Generate cell from description
🔗 URI Handler for Deep Linking
- Deep Links: Open extension features via URLs
- Supported URIs:
  - vscode://ollama-cloud/chat?message=Hello&autoSend=true
  - vscode://ollama-cloud/explain?file=/path/to/file.ts&line=10
  - vscode://ollama-cloud/fix?file=/path/to/file.ts&line=10
  - vscode://ollama-cloud/generate?type=test&file=/path/to/file.ts
  - vscode://ollama-cloud/model?select=chat
- URI Generation: Utilities for creating and sharing deep links
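Deep links in the formats listed above could be assembled with a small helper like the one below. The function name `buildDeepLink` is hypothetical (the extension's own URI utilities are not shown here), and the sketch percent-encodes parameter values on the assumption that the handler decodes the query string in the usual way.

```typescript
/**
 * Builds a vscode://ollama-cloud/<action>?<query> link.
 * Keys and values are percent-encoded so spaces, slashes, and "&" survive.
 */
function buildDeepLink(action: string, params: Record<string, string>): string {
  const query = Object.keys(params)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(params[key]))
    .join("&");
  const base = "vscode://ollama-cloud/" + action;
  return query ? base + "?" + query : base;
}

// Example: a link that opens chat with a pre-filled message.
const chatLink = buildDeepLink("chat", { message: "Hello", autoSend: "true" });
```

Note that file paths come out encoded (`/` becomes `%2F`), which differs cosmetically from the unencoded examples in the list but represents the same value once decoded.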
📋 Enhanced Terminal Integration
- Clipboard-Based Capture: Preserves your clipboard while capturing terminal output
- Error Analysis: Explain terminal errors with full context
- Command Suggestions: AI suggests commands based on your description
- OS-Aware: Adapts to Windows, macOS, and Linux
- Commands:
  - ollama-cloud.addTerminalOutput - Add selected terminal output to chat
  - ollama-cloud.explainTerminalError - Explain terminal errors
  - ollama-cloud.suggestTerminalCommand - Get command suggestions
🎯 Code Action Provider
- Right-Click Menu: Access AI features directly from code
- Actions: Add to Chat, Explain Code, Improve Code, Fix with Ollama
- Smart Context: Auto-expands 3 lines above/below for better understanding
- Diagnostic Integration: Fixes errors based on VS Code diagnostics
⚡ Enhanced Autocomplete
- LRU Cache: Smart caching with max 100 entries
- Performance Tracking: Monitor cache hit rates and request counts
- Automatic Cleanup: Periodic maintenance every 60 seconds
- Optimized Debounce: 250ms for faster responses
- Statistics: getStats() method for debugging
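For readers unfamiliar with LRU caching, here is a minimal sketch of a cache with a 100-entry limit and hit/miss counters. It relies on the insertion order of `Map`; the class and field names are illustrative, and the shape returned by this `getStats()` is an assumption rather than the extension's actual output.

```typescript
// A minimal LRU cache sketch; not the extension's AutocompleteProvider code.
class LruCache<V> {
  private entries = new Map<string, V>();
  private maxEntries: number;
  private hits = 0;
  private misses = 0;

  constructor(maxEntries = 100) {
    this.maxEntries = maxEntries;
  }

  get(key: string): V | undefined {
    if (!this.entries.has(key)) {
      this.misses++;
      return undefined;
    }
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    this.hits++;
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // The first key in a Map is the least recently used one.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) {
        this.entries.delete(oldest);
      }
    }
  }

  getStats(): { size: number; hits: number; misses: number } {
    return { size: this.entries.size, hits: this.hits, misses: this.misses };
  }
}
```

A natural cache key for autocomplete would be some digest of the text around the cursor, though how the extension actually keys its cache is not specified here.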
🧪 Testing & Quality
100% Test Coverage
- 215 Tests Passing: All unit tests passing with 0 failures
- Test Suites:
  - OllamaCloudClient (token usage, sessions, context management)
  - SessionManager (persistence, retrieval, validation)
  - CommandExecutor (cross-platform, error handling, special cases)
  - ChatViewProvider (message handling, lifecycle)
  - FileEditor (path validation, content fixes)
  - AutocompleteProvider (caching, performance, languages)
  - StateManager (debounced persistence, batch operations)
  - Integration tests
📦 Technical Improvements
Architecture
- Singleton Patterns: Thread-safe initialization for all managers
- Debounced Operations: Performance optimization for state writes
- LRU Caching: Memory-efficient caching with automatic eviction
- Proper Disposal: Resource cleanup on deactivation
- Type Safety: Full TypeScript strict mode compliance
Code Quality
- JSDoc Comments: Comprehensive documentation
- Error Handling: Graceful failure recovery
- Performance Monitoring: Built-in metrics and statistics
- Resource Management: Proper cleanup and disposal patterns
Bundle Size
- 404 KiB: Optimized webpack bundle
- 9 New Modules: Web research, walkthrough, dev mode, notebook, URI handler, state manager, terminal integration, code actions, enhanced autocomplete
What's New in v0.1.10
Enhanced Token Usage Display
- Top-Position Token Usage: Token usage now appears prominently at the top of the chat interface
- Model-Specific Styling: Local models show green styling, cloud models show blue styling with gradient backgrounds
- Visual Progress Bars: Real-time token usage tracking with color-coded progress bars (green/yellow/red)
- Monthly Usage Tracking: Monitor your Ollama Cloud API usage with detailed breakdowns
Advanced File Editing Features
- Model-Specific Content Fixes: Automatically fixes common issues with different AI models:
  - Removes escape characters for Gemini, Llama, Mistral models
  - Strips markdown codeblock markers for DeepSeek, Llama, Mistral models
  - Converts HTML entities for DeepSeek models
  - Cleans up whitespace for models like "minsteral"/"minstral"
  - Handles JSON and YAML file formatting
- Enhanced File Path Validation: Improved security with better pattern matching and validation
- Better Error Handling: More robust error messages and validation
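Two of the content fixes above (stripping markdown code-block markers and converting HTML entities) might look something like this. The function names, the exact regular expression, and the short entity list are assumptions made for illustration; the extension's real rules are likely more extensive.

```typescript
/** Removes a single wrapping markdown fence (three backticks) if present. */
function stripCodeFence(content: string): string {
  const match = content.match(/^\s*`{3}[^\n]*\n([\s\S]*?)\n?`{3}\s*$/);
  return match ? match[1] : content;
}

/** Decodes a handful of common HTML entities. */
function decodeHtmlEntities(content: string): string {
  return content
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" becomes "&lt;" rather than "<"
}

/** Applies fixes based on the model name, following the list above. */
function applyModelFixes(model: string, content: string): string {
  const name = model.toLowerCase();
  let fixed = content;
  if (/deepseek|llama|mistral/.test(name)) {
    fixed = stripCodeFence(fixed);
  }
  if (/deepseek/.test(name)) {
    fixed = decodeHtmlEntities(fixed);
  }
  return fixed;
}
```

Matching on a substring of the model name keeps the rules tolerant of tags such as `llama3.1` or `deepseek-coder`.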
Improved User Interface
- Action Approval Cards: New styled cards for file edits and command execution with gradient backgrounds
- File Read Notifications: Visual indicators when files are being read
- Enhanced Environment Info: Better workspace information display with colored chips
- Improved Markdown Rendering: Better table support and formatting
Advanced Context Management
- File Context Tracking: Tracks files that are read, edited, or mentioned
- Task Context Awareness: Maintains context about files created/modified during tasks
- Session Restoration: Improved session management and restoration
Code Quality Improvements
- TypeScript Best Practices: Better type definitions and error handling
- Code Organization: Cleaner separation of concerns
- Performance Optimizations: More efficient file operations and UI updates
What's New in v0.1.9
Bug Fixes
- Session Restore Now Works: Restoring a previous session now properly displays all chat messages in the chat window
- Default Model Settings: The ollamaCloud.chatModel setting now properly sets the selected model in the dropdown
- Premium Models Indicator: Premium models (70B+, Mixtral, Claude, GPT-4, etc.) are clearly marked with a 💎 icon
What's New in v0.1.8
Action Approval UI (Cline-Style)
- "Ollama wants to edit {filename}": Beautiful purple/indigo gradient cards appear when AI suggests file edits
- "Ollama wants to run command": Amber/yellow gradient cards for command execution requests
- "Ollama is reading file": Blue notification cards when AI reads files
- Apply/Skip Buttons: User-friendly buttons to approve or skip each action
- Task Completion Banner: Green gradient banner shows summary when all actions complete
Model Dropdown Enhancements
- Source Indicators: 💻 for local models, ☁️ for cloud models
- Premium Model Indicator: 💎 badge for large/expensive models (70B+, Mixtral, Claude, GPT-4)
- Smart Sorting: Local models appear first in the dropdown
Custom System Prompt
- New Setting: ollamaCloud.customSystemPrompt with dynamic placeholders
- Placeholders: {{OS}}, {{SHELL}}, {{WORKSPACE}}, {{WORKSPACE_PATH}}, {{FILE_TREE}}, {{NPM_SCRIPTS}}, {{MODE}}
- Full Control: Replace the entire system prompt or leave empty for default
Session Restore Improvements
- No Auto-Restore: Sessions no longer automatically restore on startup
- User Choice: Prompt with "Restore Session" and "Start Fresh" buttons
Other Improvements
- Increased Timeout: Default timeout increased from 30s to 120s for large models
- No Placeholder Code: AI now provides complete, copy-paste ready code (no more "// ... rest of code")
What's New in v0.1.4
- Full Cross-Platform Support: Commands now work seamlessly on Windows, macOS, and Linux
- Platform-Aware Shell Selection: Automatically uses the appropriate shell (PowerShell on Windows, zsh on macOS, bash on Linux)
- Smart Command Chaining: Uses correct command separators for each platform (; for PowerShell, && for Unix shells)
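The platform-aware behavior above can be pictured with the following sketch. The `Platform` values mirror Node's `process.platform` strings; the function names and the shell executable names are assumptions, not the extension's actual implementation.

```typescript
type Platform = "win32" | "darwin" | "linux";

/** Picks a shell per platform: PowerShell on Windows, zsh on macOS, bash on Linux. */
function shellFor(platform: Platform): string {
  if (platform === "win32") {
    return "powershell.exe"; // assumed executable name
  }
  return platform === "darwin" ? "zsh" : "bash";
}

/** Joins commands with ";" for PowerShell and "&&" for Unix shells. */
function chainCommands(platform: Platform, commands: string[]): string {
  const separator = platform === "win32" ? "; " : " && ";
  return commands.join(separator);
}
```

One difference worth keeping in mind: `&&` stops the chain when a command fails, whereas `;` in PowerShell runs the next command regardless.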
What's New in v0.1.3
Improved AI Response Reliability
- Official Ollama SDK: Now uses the official ollama npm package for better compatibility and reliability
- Streaming Responses: See AI output in real-time as it's generated (configurable)
- Context Window Management: Proper num_ctx parameter support (default 32768) for longer conversations
- Retry Logic: Automatic retry with exponential backoff for rate-limited requests
- Request Cancellation: Working Cancel button to abort long-running requests
- Actual Token Counts: Real token usage from API instead of estimates
New Configuration Options
- enableStreaming - Toggle streaming responses on/off
- contextWindow - Set the context window size (2048-131072)
- requestTimeout - Set request timeout in milliseconds (5000-600000)
Installation
From VSIX (Local Installation)
- Download the .vsix file
- Open VSCode
- Go to Extensions (Ctrl+Shift+X)
- Click the "..." menu at the top
- Select "Install from VSIX..."
- Choose the downloaded file
From Source
- Clone this repository
- Run npm install
- Run npm run compile
- Press F5 to open a new VSCode window with the extension loaded
Setup
Option 1: Ollama Cloud (Recommended for beginners)
Get Ollama Cloud API Key
- Sign up at ollama.com
- Go to your account settings and generate an API key
Configure the Extension
- Open VSCode Settings (Ctrl+,)
- Search for "Ollama Cloud"
- Enter your API key in ollamaCloud.cloudApiKey
Option 2: Local Ollama
Install Ollama
- Download from ollama.com
- Run ollama serve to start the local server
Pull a Model
ollama pull llama3.1
Configure the Extension
- The extension will automatically detect local models
- No API key needed for local models
Option 3: Both (Hybrid)
Use both local and cloud models! The extension automatically routes requests to the appropriate endpoint based on where each model is available.
Option 4: Grok (xAI)
- Get your API key from console.x.ai.
- Set ollamaCloud.apiProvider to grok.
- Add your key to ollamaCloud.grokApiKey.
- Optional: set ollamaCloud.grokBaseUrl if you use a proxy/gateway.
Provider-Specific Setup (ChatGPT, Claude, Grok)
- Open VS Code settings and search for ollamaCloud.apiProvider.
- Choose one provider:
  - openai or chatgpt for OpenAI models
  - anthropic or claude for Claude models
  - grok for xAI Grok models
- Set the matching API key:
  - ollamaCloud.openAiApiKey
  - ollamaCloud.anthropicApiKey
  - ollamaCloud.grokApiKey
- Optional for enterprise/proxy setups:
  - ollamaCloud.openAiBaseUrl
  - ollamaCloud.anthropicBaseUrl
  - ollamaCloud.grokBaseUrl
- Set ollamaCloud.providerFallbackOrder so requests can fail over if your primary provider is unavailable.
Production-Ready Setup Checklist
- Set ollamaCloud.autoApprove to false unless you explicitly want unattended edits/commands.
- Keep ollamaCloud.allowUnsafeCommands at false for default command safety guardrails.
- Set ollamaCloud.requestTimeout high enough for your largest model (for example 120000 or 180000).
- Configure ollamaCloud.providerFallbackOrder with at least one backup provider.
- Cap prompt spend with ollamaCloud.requestTokenBudget and ollamaCloud.maxConversationMessages.
- For faster/cheaper suggestions, use a smaller ollamaCloud.autocompleteModel.
- Leave ollamaCloud.enableAdaptivePerformance enabled so debounce/cache settings can adapt over time.
Token Efficiency Recommendations
- Lower ollamaCloud.requestTokenBudget if responses are too expensive.
- Lower ollamaCloud.maxConversationMessages to keep only the most recent context.
- Disable ollamaCloud.includeFileContext when not needed for simple questions.
- Use smaller models for autocomplete and larger models only for complex generation/review tasks.
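To make the effect of requestTokenBudget and maxConversationMessages more concrete, here is a rough sketch of request compaction. The `ChatMessage` shape, the function names, and especially the "four characters per token" estimate are assumptions; the extension may count tokens and choose which messages to drop differently.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

/** Very rough token estimate (assumption: about 4 characters per token). */
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

/**
 * Keeps at most `maxMessages` recent messages, then drops the oldest of those
 * until the estimated total fits `tokenBudget`. The newest message is always kept.
 */
function compactMessages(
  messages: ChatMessage[],
  maxMessages = 16,
  tokenBudget = 16000
): ChatMessage[] {
  const recent = messages.slice(-Math.max(1, maxMessages));
  const kept: ChatMessage[] = [];
  let used = 0;
  // Walk from newest to oldest so recent context wins when the budget is tight.
  for (let i = recent.length - 1; i >= 0; i--) {
    const cost = estimateTokens(recent[i].content);
    if (kept.length > 0 && used + cost > tokenBudget) {
      break;
    }
    kept.unshift(recent[i]);
    used += cost;
  }
  return kept;
}
```

Lowering either setting shrinks what is sent with each request, trading some conversational memory for lower cost.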
Cost-Aware Routing
- Enable ollamaCloud.enableCostAwareRouting.
- Set ollamaCloud.routingPreference:
  - balanced for mixed cost/latency optimization
  - lowCost to prioritize cheaper requests
  - lowLatency to prioritize faster providers
- Optional: set ollamaCloud.maxEstimatedRequestCostUsd as a soft per-request cap.
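One way to picture how the three routing preferences could affect ranking is the sketch below. The weights, the normalization, and the penalty used for the soft cost cap are all invented for illustration, and the token-budget-fit factor mentioned earlier is left out; the extension's real scoring is not documented here.

```typescript
type RoutingPreference = "balanced" | "lowCost" | "lowLatency";

interface Candidate {
  provider: string;
  model: string;
  estimatedCostUsd: number;
  latencyMs: number;
}

// Hypothetical weights per preference.
const ROUTING_WEIGHTS: Record<RoutingPreference, { cost: number; latency: number }> = {
  balanced: { cost: 0.5, latency: 0.5 },
  lowCost: { cost: 0.9, latency: 0.1 },
  lowLatency: { cost: 0.1, latency: 0.9 },
};

/**
 * Sorts candidates from best to worst (lowest score first). Candidates above the
 * soft cap are pushed behind the rest instead of being removed; a cap of 0 disables it.
 */
function rankCandidates(
  candidates: Candidate[],
  preference: RoutingPreference,
  maxCostUsd = 0
): Candidate[] {
  const weights = ROUTING_WEIGHTS[preference];
  let maxCost = 0;
  let maxLatency = 0;
  for (const c of candidates) {
    maxCost = Math.max(maxCost, c.estimatedCostUsd);
    maxLatency = Math.max(maxLatency, c.latencyMs);
  }
  const score = (c: Candidate): number => {
    const cost = maxCost > 0 ? c.estimatedCostUsd / maxCost : 0;
    const latency = maxLatency > 0 ? c.latencyMs / maxLatency : 0;
    const overCap = maxCostUsd > 0 && c.estimatedCostUsd > maxCostUsd;
    return weights.cost * cost + weights.latency * latency + (overCap ? 2 : 0);
  };
  return candidates.slice().sort((a, b) => score(a) - score(b));
}
```

Treating the cap as a penalty rather than a filter matches the "soft" wording: an over-cap provider is still available if nothing cheaper exists.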
Usage
Opening the Chat
- Click the Ollama Cloud icon in the Activity Bar (left sidebar)
- Or use the keyboard shortcut: Ctrl+Shift+O (Windows/Linux) or Cmd+Shift+O (Mac)
- Or open Command Palette (Ctrl+Shift+P) and run "Ollama Cloud: Open Chat"
Validate Your Setup
Run Ollama Cloud: Validate Provider Configuration from the Command Palette to quickly verify:
- primary provider API key availability
- fallback provider key coverage
- fallback order duplicate/alias cleanup
- configured chat model compatibility with selected provider
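The "duplicate/alias cleanup" check can be illustrated with a short sketch. The alias pairs (chatgpt for openai, claude for anthropic) come from the provider setup section; the function names and the choice of canonical names are assumptions.

```typescript
/** Maps provider aliases to one canonical name (assumed: openai, anthropic). */
function canonicalProvider(name: string): string {
  const key = name.trim().toLowerCase();
  if (key === "chatgpt") {
    return "openai";
  }
  if (key === "claude") {
    return "anthropic";
  }
  return key;
}

/** Resolves aliases, then drops empty entries and duplicates while keeping order. */
function normalizeFallbackOrder(order: string[]): string[] {
  const result: string[] = [];
  for (const raw of order) {
    const provider = canonicalProvider(raw);
    if (provider !== "" && result.indexOf(provider) === -1) {
      result.push(provider);
    }
  }
  return result;
}
```

For example, an order of chatgpt, openai, claude, grok would reduce to openai, anthropic, grok.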
Chatting with the AI
Simply type your question or request in the chat input and press Enter. The AI can help with:
- Writing new code
- Explaining existing code
- Debugging errors
- Refactoring code
- Creating new files
- Running commands
- And much more!
Operating Modes
ACT Mode (Default)
- AI can suggest file edits and commands.
- You approve each suggested action before it runs (unless auto-approve is enabled).
- Best for getting things done
PLAN Mode
- AI explains what should be done without executing
- Great for understanding complex tasks
- Use for learning and planning
File Editing
When the AI suggests file changes, it will format them like this:
// File: src/example.js
function hello() {
  console.log("Hello, World!");
}
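As an illustration of how a block in this format could be split into a target path and file content, consider the sketch below. It handles only the `// File:` header shown above; the function name is hypothetical, and the extension itself is described as accepting several other formats that this sketch does not cover.

```typescript
interface FileSuggestion {
  path: string;
  content: string;
}

/** Extracts the path from a leading "// File: <path>" line; returns undefined otherwise. */
function parseFileBlock(block: string): FileSuggestion | undefined {
  const newline = block.indexOf("\n");
  const firstLine = newline === -1 ? block : block.slice(0, newline);
  // trim() also removes a trailing "\r" from Windows line endings.
  const match = firstLine.trim().match(/^\/\/\s*File:\s*(\S.*)$/);
  if (!match) {
    return undefined;
  }
  return {
    path: match[1].trim(),
    content: newline === -1 ? "" : block.slice(newline + 1),
  };
}
```

Returning `undefined` for anything without the header lets a caller fall through to other patterns instead of guessing at a path.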
Approval Workflow
By default, you'll be prompted to approve changes before they're applied. Suggestions appear as clickable cards in chat with options to Apply or Skip.
If you'd prefer to skip manual approval, enable Auto-Approve (ollamaCloud.autoApprove). Warning: with auto-approve enabled, AI can modify files and execute commands without per-action confirmation.
Command Execution
When the AI suggests commands, they'll be formatted like:
npm install axios
Approval Workflow
Command suggestions appear as clickable cards. By default, you explicitly approve each command before execution.
If you'd prefer to skip manual approval, enable Auto-Approve (ollamaCloud.autoApprove). Warning: commands run immediately in this mode.
Configuration
| Setting | Description | Default |
|---|---|---|
| ollamaCloud.cloudApiKey | Your Ollama Cloud API key | (empty) |
| ollamaCloud.anthropicApiKey | Anthropic API key for Claude | (empty) |
| ollamaCloud.openAiApiKey | OpenAI API key for ChatGPT/GPT models | (empty) |
| ollamaCloud.grokApiKey | xAI Grok API key | (empty) |
| ollamaCloud.grokBaseUrl | Grok API base URL | https://api.x.ai/v1 |
| ollamaCloud.localEndpoint | Local Ollama server URL | http://localhost:11434 |
| ollamaCloud.apiProvider | Primary provider for chat/completions | ollama |
| ollamaCloud.providerFallbackOrder | Ordered fallback providers for transient failures | ["ollama","openai","anthropic","grok"] |
| ollamaCloud.enableCostAwareRouting | Rank providers/models using cost + latency telemetry | false |
| ollamaCloud.routingPreference | Cost-aware routing strategy (balanced, lowCost, lowLatency) | balanced |
| ollamaCloud.maxEstimatedRequestCostUsd | Soft per-request cost cap in USD (0 disables) | 0 |
| ollamaCloud.chatModel | AI model for chat | llama3.1 |
| ollamaCloud.autocompleteModel | AI model for autocomplete | ministral-3:3b |
| ollamaCloud.defaultMode | Default operating mode | act |
| ollamaCloud.customSystemPrompt | Custom system prompt with placeholders | (empty) |
| ollamaCloud.temperature | Response creativity (0-2) | 0.7 |
| ollamaCloud.maxTokens | Maximum response length | 4096 |
| ollamaCloud.requestTokenBudget | Approximate max prompt/context tokens per request | 16000 |
| ollamaCloud.maxConversationMessages | Max recent messages kept before compaction | 16 |
| ollamaCloud.contextWindow | Context window size | 32768 |
| ollamaCloud.requestTimeout | Request timeout (ms) | 120000 |
| ollamaCloud.enableStreaming | Enable streaming responses | true |
| ollamaCloud.autoApprove | Auto-approve AI actions | false |
| ollamaCloud.allowUnsafeCommands | Allow high-risk shell commands (not recommended) | false |
| ollamaCloud.showDiff | Show diff before applying changes | true |
| ollamaCloud.enableAutocomplete | Enable AI code completion | true |
| ollamaCloud.includeFileContext | Automatically include context from open files in chat messages | true |
| ollamaCloud.enableCodeLens | Show AI action buttons in code | true |
| ollamaCloud.enableStreamingForAutocomplete | Enable streaming for autocomplete requests | false |
| ollamaCloud.autocompleteDebounceDelay | Delay before sending autocomplete requests (ms) | 250 |
| ollamaCloud.autocompletePreviewDelay | Delay before showing autocomplete preview (ms) | 1000 |
| ollamaCloud.enableAdaptivePerformance | Enable adaptive performance tuning | true |
Custom System Prompt Placeholders
When using customSystemPrompt, you can use these placeholders, which are replaced with actual values:

| Placeholder | Description |
|---|---|
| {{OS}} | Operating system (Windows, macOS, Linux) |
| {{SHELL}} | Shell type (PowerShell, Bash, etc.) |
| {{WORKSPACE}} | Current workspace name |
| {{WORKSPACE_PATH}} | Full path to workspace |
| {{FILE_TREE}} | Project file structure |
| {{NPM_SCRIPTS}} | Available npm scripts |
| {{MODE}} | Current mode (ACT or PLAN) |
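Placeholder substitution of this kind can be sketched in a few lines. The function name `renderSystemPrompt` and the decision to leave unknown placeholders untouched are assumptions; the extension's own handling may differ.

```typescript
/**
 * Replaces {{NAME}} placeholders with values from `values`.
 * Placeholders without a matching value are left as-is.
 */
function renderSystemPrompt(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (whole: string, name: string) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : whole
  );
}

// Example with made-up values for three of the placeholders.
const renderedPrompt = renderSystemPrompt(
  "You are a coding assistant on {{OS}} using {{SHELL}}. Current mode: {{MODE}}.",
  { OS: "Linux", SHELL: "Bash", MODE: "ACT" }
);
```

Here `renderedPrompt` reads "You are a coding assistant on Linux using Bash. Current mode: ACT."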
Project Context Files
The AI can automatically read project-specific context from special markdown files in your workspace. Create one of these files to provide the AI with project-specific information. The extension supports multiple popular AI context formats.
Supported File Names (Ordered by Priority)
.ollamacloud.md (Primary - Highest priority)
.ollamacloud-context.md
.github/copilot-instructions.md (GitHub standard location)
.github/copilot_instructions.md
.copilot-instructions.md
.copilot_instructions.md
copilot-instructions.md
.claude/context.md
.claude/instructions.md
claude-context.md
.project-context.md
.context.md
PROJECT.md
README.context.md
PROJECT_CONTEXT.md
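The priority order above amounts to "use the first file in the list that exists". A minimal sketch, assuming the caller has already collected the workspace-relative paths that exist (the function name and that input shape are hypothetical):

```typescript
// File names in the priority order listed above.
const CONTEXT_FILE_PRIORITY: string[] = [
  ".ollamacloud.md",
  ".ollamacloud-context.md",
  ".github/copilot-instructions.md",
  ".github/copilot_instructions.md",
  ".copilot-instructions.md",
  ".copilot_instructions.md",
  "copilot-instructions.md",
  ".claude/context.md",
  ".claude/instructions.md",
  "claude-context.md",
  ".project-context.md",
  ".context.md",
  "PROJECT.md",
  "README.context.md",
  "PROJECT_CONTEXT.md",
];

/** Returns the highest-priority context file among `existingPaths`, if any. */
function pickContextFile(existingPaths: string[]): string | undefined {
  for (const name of CONTEXT_FILE_PRIORITY) {
    if (existingPaths.indexOf(name) !== -1) {
      return name;
    }
  }
  return undefined;
}
```

So a workspace containing both PROJECT.md and .claude/context.md would resolve to .claude/context.md.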
This extension automatically reads context files used by other AI assistants:
- GitHub Copilot: Supports .github/copilot-instructions.md format
- Anthropic Claude: Supports .claude/context.md format
- Generic AI tools: Supports common context file names
You can use existing context files from other AI tools, or create a new one tailored for Ollama Cloud.
Example Project Context Files
Create .ollamacloud.md in your project root:
# Project Information
- **Project Name**: MyApp
- **Framework**: React with TypeScript
- **Build Tool**: Vite
- **CSS Framework**: Tailwind CSS
- **Backend API**: REST API at /api/v1
# Important Guidelines
- Always use functional components with React hooks
- Follow the existing folder structure pattern
- Use Tailwind classes for styling instead of CSS files
- Maintain consistent error handling with try/catch blocks
- Keep components small and focused (max 150 lines)
# Architecture Notes
- Authentication is handled via JWT tokens
- State management uses React Context + useReducer
- All API calls go through the `/src/api/client.ts` wrapper
- Environment variables are prefixed with `VITE_`
# Common Patterns to Avoid
- ❌ Don't use Redux (we migrated away from it)
- ❌ Don't create new CSS files (use Tailwind only)
- ❌ Don't use class components (legacy only)
GitHub Copilot Format (.github/copilot-instructions.md)
# Copilot Instructions for This Project
## Coding Standards
- Follow the existing code style
- Use TypeScript with strict typing
- Include JSDoc comments for exported functions
- Handle errors gracefully with try/catch
## Project Structure
- `/src` - Source code
- `/test` - Unit tests
- `/dist` - Built output
- Configuration files in root
## Key Dependencies
- vscode - VS Code Extension API
- ollama - Ollama client library
- axios - HTTP client for API calls
Claude Format (.claude/context.md)
# Claude Context
This project is a VS Code extension for AI-assisted coding.
## Core Principles
- User privacy is paramount
- Transparency in all AI interactions
- Security through explicit approval workflows
- Performance through caching and debouncing
## Technical Constraints
- Must work with both local and cloud AI models
- All file operations require user confirmation
- Network calls must handle timeouts gracefully
- Extension must work offline when possible
Best Practices
- Single Context File: Use only one context file to avoid conflicts
- Keep it Concise: Focus on the most important information
- Regular Updates: Keep context files current with project changes
- Team Consensus: Ensure team agrees on context guidelines
- Security Review: Review context files for sensitive information
The AI will automatically include this context information in all conversations to provide more accurate and project-specific responses.
Available Models
Cloud Models (Ollama Cloud)
- llama3.1 - Latest Llama model (recommended)
- llama3.2 - Llama 3.2
- ministral-3:3b - Fast, efficient for autocomplete
- gemma3:4b - Google's Gemma 3
Local Models (requires local Ollama)
- codellama - Specialized for coding
- mistral - Fast and efficient
- mixtral - Mixture of experts model
- qwen2.5-coder - Specialized coding model
- phi3 - Microsoft's Phi-3
Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| Ctrl+Shift+O | Open Chat |
| Ctrl+Shift+N | New Task |
| Ctrl+K | Inline Chat (with selection) |
| Ctrl+Shift+E | Explain Code |
| Ctrl+Shift+T | Generate Tests |
| Ctrl+Shift+F | Fix Code |
| Ctrl+Shift+R | Review Code |
| Ctrl+Shift+M | Modernize Code |
| Enter | Send message |
| Shift+Enter | New line in message |
Commands
Chat & Core
- Ollama Cloud: Open Chat - Open the chat interface
- Ollama Cloud: New Task - Start a new conversation
- Ollama Cloud: Clear History - Clear chat history
- Ollama Cloud: Select Model - Choose a different AI model
Code Actions
- Ollama Cloud: Explain Code - Explain selected code
- Ollama Cloud: Fix Code - Fix issues in selected code
- Ollama Cloud: Improve Code - Suggest code improvements
- Ollama Cloud: Add to Chat - Add selected code to chat
- Ollama Cloud: Generate Tests - Generate tests for selected code
- Ollama Cloud: Review Code - Get a code review
- Ollama Cloud: Modernize Code - Update code to modern standards
Terminal Integration
- Ollama Cloud: Add Terminal Output - Add selected terminal output to chat
- Ollama Cloud: Explain Terminal Error - Explain terminal errors
- Ollama Cloud: Suggest Terminal Command - Get command suggestions
Web Research
- Ollama Cloud: Search Web - Search the internet and add results to chat
- Ollama Cloud: Research Topic - Deep research with content fetching
- Ollama Cloud: Fetch URL - Fetch and analyze URL content
Jupyter Notebooks
- Ollama Cloud: Explain Notebook Cell - Explain current notebook cell
- Ollama Cloud: Fix Notebook Cell - Fix errors in notebook cell
- Ollama Cloud: Optimize Notebook Cell - Optimize cell for performance
- Ollama Cloud: Generate Notebook Cell - Generate cell from description
Walkthrough & Help
- Ollama Cloud: Show Welcome - Show welcome tour
- Ollama Cloud: Show Tips - Show tips and tricks
- Ollama Cloud: Show Shortcuts - Show keyboard shortcuts
Development
- Ollama Cloud: Toggle Dev Mode - Enable/disable development mode
- Ollama Cloud: Show Dev Stats - Show development statistics
Tips
- Be Specific: The more specific your request, the better the AI can help
- Provide Context: Mention file names, error messages, or relevant code
- Review Changes: Always review AI-suggested changes before applying
- Use New Task: Start a new task for unrelated questions so earlier context does not carry over into the new conversation
- Experiment with Models: Different models excel at different tasks
- Adjust Timeout: Increase requestTimeout for larger models or slower connections
- Use Streaming: Keep streaming enabled to see responses as they generate
Troubleshooting
"Invalid API key" Error
- Verify your API key in settings
- Make sure you have an active Ollama Cloud account
"Request timed out" Error
- Increase requestTimeout in settings (default 120000 ms)
- Try a smaller/faster model
- Check your internet connection
Extension Not Loading
- Check the Output panel (View → Output → Ollama Cloud)
- Try reloading VSCode (Ctrl+Shift+P → "Reload Window")
Slow Responses
- Try a smaller model (e.g., ministral-3:3b instead of llama3.1)
- Enable streaming to see partial responses
- Check your internet connection
- Reduce maxTokens in settings
Local Ollama Not Detected
- Make sure Ollama is running (ollama serve)
- Check that the localEndpoint setting matches your Ollama server's address
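Both checks can be combined by requesting the model list from the configured endpoint. The sketch below assumes Ollama's default address `http://localhost:11434` and its `GET /api/tags` model-list route; the helper names `tagsUrl` and `isOllamaReachable` are hypothetical and not part of the extension. The HTTP function is passed in so the helper works with the global `fetch` in Node 18+ or with any equivalent.

```typescript
// Hypothetical helper: not part of the extension's API.
// Minimal shape of an HTTP GET function; the global fetch (Node 18+) satisfies it.
type HttpGet = (url: string) => Promise<{ ok: boolean }>;

// Build the URL of Ollama's model-list endpoint from a base endpoint,
// tolerating a trailing slash in the configured value.
function tagsUrl(localEndpoint: string): string {
  return `${localEndpoint.replace(/\/+$/, "")}/api/tags`;
}

// Resolve to true if the server answers the model-list request successfully.
async function isOllamaReachable(localEndpoint: string, get: HttpGet): Promise<boolean> {
  try {
    const response = await get(tagsUrl(localEndpoint));
    return response.ok;
  } catch {
    return false; // connection refused, DNS failure, and similar errors
  }
}

// Usage (Node 18+):
// isOllamaReachable("http://localhost:11434", fetch).then(console.log);
```

If this resolves to false while `ollama serve` is running, the endpoint value (host or port) is the more likely culprit.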
Tests Fail With "code: bad option" or Node CLI Help
- Ensure ELECTRON_RUN_AS_NODE is not exported in your shell when running extension tests.
- The repository test runner (src/test/runTest.ts) now unsets it automatically before launching VS Code.
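When ELECTRON_RUN_AS_NODE is set, Electron starts as a plain Node.js process, which is why VS Code's own flags get rejected. The cleanup step can be sketched as follows; the helper name `withoutElectronRunAsNode` is hypothetical, and the real src/test/runTest.ts may handle this differently.

```typescript
// Hypothetical sketch of the cleanup step; the real src/test/runTest.ts may differ.
type Env = Record<string, string | undefined>;

// Return a copy of the environment without ELECTRON_RUN_AS_NODE,
// so a launched VS Code starts as an editor rather than as plain Node.
function withoutElectronRunAsNode(env: Env): Env {
  const cleaned: Env = { ...env };
  delete cleaned["ELECTRON_RUN_AS_NODE"];
  return cleaned;
}

const cleanedEnv = withoutElectronRunAsNode({ ELECTRON_RUN_AS_NODE: "1", PATH: "/usr/bin" });

// In a test runner, the same idea would be applied to process.env
// before launching VS Code.
```

Working on a copy leaves the caller's environment untouched, which keeps the helper safe to use in scripts that spawn other tools afterwards.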
Privacy & Security
- Your code is sent to the provider you have configured (Ollama Cloud, your local Ollama, or another supported provider) for processing
- API keys are stored locally in VSCode settings
- No data is stored by this extension beyond session persistence
- Review Ollama's privacy policy for cloud usage details
Development
Building from Source
# Install dependencies
npm install
# Compile TypeScript
npm run compile
# Watch for changes
npm run watch
# Package extension
npm run package
Build, Test, Package, And Bump Version Automatically
Use the project release script:
# Patch release (default)
./build.sh
# Explicit bump type
./build.sh patch
./build.sh minor
./build.sh major
# Build/test/package without version bump
./build.sh --no-bump
What build.sh does:
- Installs dependencies when missing.
- Runs reliability checks (npm run validate:reliability) to verify chat view wiring, required settings, and command contributions.
- Runs lint and compile.
- Optionally runs compile-tests + tests in strict mode (./build.sh --strict).
- Bumps package.json version (unless --no-bump).
- Builds extension assets and creates a .vsix package named ollama-cloud-<version>.vsix.
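To make the three bump types concrete, here is a minimal sketch of how patch, minor, and major change a MAJOR.MINOR.PATCH version string under semantic versioning, and how the package name follows from it. The functions `bumpVersion` and `vsixName` are illustrative only; build.sh may compute the version by other means.

```typescript
// Illustrative only: build.sh may compute the new version differently.
type BumpType = "patch" | "minor" | "major";

// Apply a semantic-versioning bump to a plain MAJOR.MINOR.PATCH string.
function bumpVersion(version: string, bump: BumpType = "patch"): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);
  switch (bump) {
    case "major":
      return `${major + 1}.0.0`; // reset minor and patch
    case "minor":
      return `${major}.${minor + 1}.0`; // reset patch
    default:
      return `${major}.${minor}.${patch + 1}`;
  }
}

// Package name pattern taken from the build description above.
function vsixName(version: string): string {
  return `ollama-cloud-${version}.vsix`;
}

const nextPatch = bumpVersion("0.1.28");          // "0.1.29"
const nextMinor = bumpVersion("0.1.28", "minor"); // "0.2.0"
const nextMajor = bumpVersion("0.1.28", "major"); // "1.0.0"
```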
Reliability Gates
# Validate manifest and extension wiring contracts
npm run validate:reliability
# Individual checks
npm run validate:manifest
npm run validate:contracts
The reliability checks fail fast if critical UX contracts regress, including:
- Missing or mismatched chat view contribution IDs
- Missing required settings (chat/autocomplete/provider)
- Wrong default provider (ollamaCloud.apiProvider must default to ollama)
- Missing core commands (open chat, autocomplete model selection, configuration validation)
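A check of this kind might look like the sketch below, which inspects a parsed package.json for the default provider and a list of required commands. The `Manifest` shape follows the standard VS Code `contributes.configuration` and `contributes.commands` layout; the function name and the command ID `ollamaCloud.openChat` are assumptions for illustration, and the project's actual validate:manifest script may be structured differently.

```typescript
// Hypothetical sketch of a manifest check; the real validate:manifest script may differ.
interface Manifest {
  contributes?: {
    configuration?: { properties?: Record<string, { default?: unknown }> };
    commands?: { command: string }[];
  };
}

// Return a list of contract violations; an empty list means the manifest passes.
function validateManifest(manifest: Manifest, requiredCommands: string[]): string[] {
  const errors: string[] = [];

  const properties = manifest.contributes?.configuration?.properties ?? {};
  const provider = properties["ollamaCloud.apiProvider"];
  if (!provider) {
    errors.push("Missing setting: ollamaCloud.apiProvider");
  } else if (provider.default !== "ollama") {
    errors.push('ollamaCloud.apiProvider must default to "ollama"');
  }

  const contributed = new Set((manifest.contributes?.commands ?? []).map((c) => c.command));
  for (const id of requiredCommands) {
    if (!contributed.has(id)) {
      errors.push(`Missing command: ${id}`);
    }
  }
  return errors;
}

// "ollamaCloud.openChat" is an assumed command ID used only for this example.
const manifestErrors = validateManifest(
  {
    contributes: {
      configuration: { properties: { "ollamaCloud.apiProvider": { default: "openai" } } },
      commands: [],
    },
  },
  ["ollamaCloud.openChat"],
);
```

Collecting every violation before failing, rather than stopping at the first one, lets a single run report all regressions at once; a script would then exit non-zero if the list is not empty.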
Running Tests
npm test
Recommended Developer Workflow
- Run npm run validate:reliability before each commit.
- Use ./build.sh --no-bump for local packaging checks.
- Use ./build.sh --strict --no-bump when validating test compilation/execution.
- Use ./build.sh patch|minor|major only when preparing a release artifact.
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
License
MIT License - See LICENSE file for details
Credits
Inspired by the Cline VSCode extension. Built with ❤️ for the developer community.
Support
Note: This extension works with both Ollama Cloud (requires API key) and local Ollama (free, requires local installation). Visit ollama.com to get started.