Ollama Cloud BETA


JKagiDesigns LLC

AI-powered coding assistant using Ollama Cloud - similar to Cline, Copilot, etc. BETA

Ollama Cloud VSCode Extension

An AI-powered coding assistant for Visual Studio Code, powered by Ollama Cloud and local Ollama. Similar to Cline, this extension provides intelligent code assistance, file editing, and command execution capabilities.

Features

  • 🤖 AI Chat Interface: Interactive chat with Ollama Cloud and local Ollama models
  • 📝 Smart Code Editing: AI can read, write, and modify files with your approval
  • 🔧 Command Execution: Execute terminal commands suggested by the AI
  • 🔄 Diff View: Review changes before applying them
  • 🎯 Multiple Models: Support for various Ollama models (Llama, CodeLlama, Mistral, etc.)
  • ⚡ Real-time Streaming: See AI responses as they're generated
  • 🎨 VSCode Integration: Native VSCode UI with dark/light theme support
  • 🧠 Enhanced Context Awareness: Tracks tasks, files, and commands for better continuity
  • 💾 Session Persistence: Auto-saves and restores conversations across VSCode restarts
  • 📊 Token Usage Tracking: Monitor your Ollama Cloud API usage with visual indicators
  • 🗂️ Workspace Indexing: Automatically understands your project structure
  • 🔁 Automatic Retry: Smart retry logic with exponential backoff for rate limits
  • ⏱️ Configurable Timeouts: Adjust request timeouts for slower connections

What's New in v0.1.11

🔍 Web Research Capabilities

  • Search the Web: AI can now search the internet using DuckDuckGo (no API key required)
  • Deep Research: Automatically fetches and analyzes content from top search results
  • URL Fetching: Extract and analyze text from any webpage
  • File Downloads: Download files from URLs for analysis
  • Commands: searchWeb, researchTopic, fetchUrl

👋 Interactive Walkthrough & Onboarding

  • Welcome Tour: Step-by-step introduction to all features on first use
  • Tips & Tricks: Comprehensive tips panel with categorized advice
  • Keyboard Shortcuts: Quick reference guide for all shortcuts
  • Feature Discovery: Interactive tour of chat, code actions, autocomplete, terminal, research, and notebooks

🔧 Development Mode

  • Hot Reload: Automatic file watching with reload prompts
  • Dev Panel: Quick actions for reload, clear cache, and logs
  • Output Channel: Detailed development logs
  • Statistics: Track active watchers and extension state

💾 State Manager with Debounced Persistence

  • Fast In-Memory Reads: Instant access to state data
  • Debounced Writes: 500ms batched writes for performance
  • Batch Operations: Efficient bulk state updates
  • Secrets Support: Secure credential storage
  • Statistics Tracking: Monitor cache sizes and pending changes
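
The combination of in-memory reads and debounced writes can be sketched roughly as follows. This is a minimal illustration, not the extension's actual state manager: the class name `DebouncedStore` and the `persist` callback are hypothetical, and only the 500ms delay comes from the list above.

```typescript
// Sketch only: `DebouncedStore` and `persist` are hypothetical names.
class DebouncedStore {
  private cache = new Map<string, unknown>();
  private dirty = new Set<string>();
  private timer: ReturnType<typeof setTimeout> | undefined;
  private persist: (changes: Map<string, unknown>) => void;
  private delayMs: number;

  constructor(persist: (changes: Map<string, unknown>) => void, delayMs = 500) {
    this.persist = persist;
    this.delayMs = delayMs;
  }

  // Reads never touch storage; they come straight from memory.
  get(key: string): unknown {
    return this.cache.get(key);
  }

  // Each write restarts the timer, so a burst of writes becomes one batch.
  set(key: string, value: unknown): void {
    this.cache.set(key, value);
    this.dirty.add(key);
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  // Writes all pending changes at once and cancels any scheduled write.
  flush(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
    if (this.dirty.size === 0) return;
    const changes = new Map<string, unknown>();
    this.dirty.forEach((key) => changes.set(key, this.cache.get(key)));
    this.dirty.clear();
    this.persist(changes);
  }
}
```

Calling `flush()` explicitly (for example on shutdown) avoids losing changes that are still waiting for the timer.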

📓 Jupyter Notebook Support

  • Cell Operations: Explain, fix, optimize, and generate notebook cells
  • Output Integration: Analyzes cell outputs for better context
  • Full Context: Extracts entire notebook structure
  • Commands: explainNotebookCell, fixNotebookCell, optimizeNotebookCell, generateNotebookCell

🔗 URI Handler for Deep Linking

  • Deep Links: Open extension features via URLs
  • Supported URIs:
    • vscode://ollama-cloud/chat?message=Hello&autoSend=true
    • vscode://ollama-cloud/explain?file=/path/to/file.ts&line=10
    • vscode://ollama-cloud/fix?file=/path/to/file.ts&line=10
    • vscode://ollama-cloud/generate?type=test&file=/path/to/file.ts
    • vscode://ollama-cloud/model?select=chat
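
Links of this shape can be generated programmatically. The helper below is a hypothetical sketch (`buildDeepLink` is not part of the extension); it percent-encodes every parameter value, which is an assumption about what the extension accepts, so paths come out as `%2Fpath%2F...` rather than with raw slashes as in the examples above.

```typescript
// `buildDeepLink` is a hypothetical helper, not an API exposed by the extension.
function buildDeepLink(
  action: string,
  params: Record<string, string | number | boolean>,
): string {
  const query = Object.keys(params)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(String(params[key]))}`)
    .join("&");
  const base = `vscode://ollama-cloud/${action}`;
  return query ? `${base}?${query}` : base;
}

// Example: a link that opens the chat and sends a message immediately.
const chatLink = buildDeepLink("chat", { message: "Hello", autoSend: true });
```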

📋 Enhanced Terminal Integration

  • Clipboard-Based Capture: Preserves your clipboard while capturing terminal output
  • Error Analysis: Explain terminal errors with full context
  • Command Suggestions: AI suggests commands based on your description
  • OS-Aware: Adapts to Windows, macOS, and Linux

🎯 Code Action Provider

  • Right-Click Menu: Access AI features directly from code
  • Actions: Add to Chat, Explain Code, Improve Code, Fix with Ollama
  • Smart Context: Auto-expands 3 lines above/below for better understanding
  • Diagnostic Integration: Fixes errors based on VS Code diagnostics
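
The "Smart Context" expansion amounts to padding the selected line range and clamping it to the file. A minimal sketch, assuming 0-based line numbers and a hypothetical function name `expandRange`:

```typescript
// Hypothetical helper: widens a 0-based, inclusive line range by `padding` lines
// on each side without running past the start or end of the file.
function expandRange(
  startLine: number,
  endLine: number,
  lineCount: number,
  padding = 3,
): { start: number; end: number } {
  return {
    start: Math.max(0, startLine - padding),
    end: Math.min(lineCount - 1, endLine + padding),
  };
}
```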

⚡ Enhanced Autocomplete

  • LRU Cache: Smart caching with max 100 entries
  • Performance Tracking: Monitor cache hit rates and request counts
  • Automatic Cleanup: Periodic maintenance every 60 seconds
  • Optimized Debounce: 250ms for faster responses
  • Statistics: getStats() method for debugging
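
For readers unfamiliar with the technique, an LRU cache with a size limit and hit-rate statistics can be sketched as below. This is an illustration built on `Map` insertion order, not the extension's code; the class name `LruCache` and the fields returned by `getStats()` are assumptions, and only the 100-entry limit comes from the list above.

```typescript
// Sketch of an LRU cache; names and the stats shape are hypothetical.
class LruCache<V> {
  private entries = new Map<string, V>();
  private hits = 0;
  private requests = 0;
  private maxEntries: number;

  constructor(maxEntries = 100) {
    this.maxEntries = maxEntries;
  }

  get(key: string): V | undefined {
    this.requests++;
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    this.hits++;
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  getStats(): { size: number; hits: number; requests: number; hitRate: number } {
    const hitRate = this.requests === 0 ? 0 : this.hits / this.requests;
    return { size: this.entries.size, hits: this.hits, requests: this.requests, hitRate };
  }
}
```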

What's New in v0.1.10

Enhanced Token Usage Display

  • Top-Position Token Usage: Token usage now appears prominently at the top of the chat interface
  • Model-Specific Styling: Local models show green styling, cloud models show blue styling with gradient backgrounds
  • Visual Progress Bars: Real-time token usage tracking with color-coded progress bars (green/yellow/red)
  • Monthly Usage Tracking: Monitor your Ollama Cloud API usage with detailed breakdowns

Advanced File Editing Features

  • Model-Specific Content Fixes: Automatically fixes common issues with different AI models:
    • Removes escape characters for Gemini, Llama, Mistral models
    • Strips markdown codeblock markers for DeepSeek, Llama, Mistral models
    • Converts HTML entities for DeepSeek models
    • Cleans up whitespace for models like "minsteral"/"minstral"
    • Handles JSON and YAML file formatting
  • Enhanced File Path Validation: Improved security with better pattern matching and validation
  • Better Error Handling: More robust error messages and validation
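
Two of the content fixes listed above (stripping code block markers and converting HTML entities) could look roughly like this. The function names are hypothetical, the entity list is deliberately small, and the extension's real per-model rules are not documented here.

```typescript
// Hypothetical helpers illustrating two of the clean-up steps.

// Removes a wrapping Markdown code fence (three backticks, optional language tag)
// when the whole response is a single fenced block; otherwise returns it unchanged.
function stripCodeFence(content: string): string {
  const match = /^`{3}[\w-]*[ \t]*\n([\s\S]*?)\n`{3}[ \t]*$/.exec(content.trim());
  return match?.[1] ?? content;
}

// Decodes a few common HTML entities. "&amp;" is handled last so that
// "&amp;lt;" becomes "&lt;" rather than being decoded twice into "<".
function decodeHtmlEntities(content: string): string {
  return content
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
}
```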

Improved User Interface

  • Action Approval Cards: New styled cards for file edits and command execution with gradient backgrounds
  • File Read Notifications: Visual indicators when files are being read
  • Enhanced Environment Info: Better workspace information display with colored chips
  • Improved Markdown Rendering: Better table support and formatting

Advanced Context Management

  • File Context Tracking: Tracks files that are read, edited, or mentioned
  • Task Context Awareness: Maintains context about files created/modified during tasks
  • Session Restoration: Improved session management and restoration

Code Quality Improvements

  • TypeScript Best Practices: Better type definitions and error handling
  • Code Organization: Cleaner separation of concerns
  • Performance Optimizations: More efficient file operations and UI updates

What's New in v0.1.9

Bug Fixes

  • Session Restore Now Works: Restoring a previous session now properly displays all chat messages in the chat window
  • Default Model Settings: The ollamaCloud.chatModel setting now properly sets the selected model in the dropdown
  • Premium Models Indicator: Premium models (70B+, Mixtral, Claude, GPT-4, etc.) are clearly marked with a 💎 icon

What's New in v0.1.8

Action Approval UI (Cline-Style)

  • "Ollama wants to edit {filename}": Beautiful purple/indigo gradient cards appear when AI suggests file edits
  • "Ollama wants to run command": Amber/yellow gradient cards for command execution requests
  • "Ollama is reading file": Blue notification cards when AI reads files
  • Apply/Skip Buttons: User-friendly buttons to approve or skip each action
  • Task Completion Banner: Green gradient banner shows summary when all actions complete

Model Dropdown Enhancements

  • Source Indicators: 💻 for local models, ☁️ for cloud models
  • Premium Model Indicator: 💎 badge for large/expensive models (70B+, Mixtral, Claude, GPT-4)
  • Smart Sorting: Local models appear first in the dropdown

Custom System Prompt

  • New Setting: ollamaCloud.customSystemPrompt with dynamic placeholders
  • Placeholders: {{OS}}, {{SHELL}}, {{WORKSPACE}}, {{WORKSPACE_PATH}}, {{FILE_TREE}}, {{NPM_SCRIPTS}}, {{MODE}}
  • Full Control: Replace the entire system prompt or leave empty for default

Session Restore Improvements

  • No Auto-Restore: Sessions no longer automatically restore on startup
  • User Choice: Prompt with "Restore Session" and "Start Fresh" buttons

Other Improvements

  • Increased Timeout: Default timeout increased from 30s to 120s for large models
  • No Placeholder Code: AI now provides complete, copy-paste ready code (no more "// ... rest of code")

What's New in v0.1.4

Cross-Platform Command Execution

  • Full Cross-Platform Support: Commands now work seamlessly on Windows, macOS, and Linux
  • Platform-Aware Shell Selection: Automatically uses the appropriate shell (PowerShell on Windows, zsh on macOS, bash on Linux)
  • Smart Command Chaining: Uses correct command separators for each platform (; for PowerShell, && for Unix shells)
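
The shell and separator choices described above can be expressed as a small lookup. This is a sketch under stated assumptions: the function names are hypothetical, the platform strings follow Node's `process.platform` values, and the shell names are illustrative labels rather than the exact executables the extension launches.

```typescript
// Platform identifiers as reported by Node's process.platform.
type SupportedPlatform = "win32" | "darwin" | "linux";

// Hypothetical helper: which shell is used on each platform.
function shellFor(platform: SupportedPlatform): string {
  switch (platform) {
    case "win32":
      return "powershell";
    case "darwin":
      return "zsh";
    case "linux":
      return "bash";
  }
}

// Hypothetical helper: joins commands with the separator listed for each platform.
function chainCommands(commands: string[], platform: SupportedPlatform): string {
  const separator = platform === "win32" ? "; " : " && ";
  return commands.join(separator);
}
```

Note that the two separators are not equivalent: `&&` stops at the first failing command, while `;` in PowerShell runs every command regardless of earlier failures.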

What's New in v0.1.3

Improved AI Response Reliability

  • Official Ollama SDK: Now uses the official ollama npm package for better compatibility and reliability
  • Streaming Responses: See AI output in real-time as it's generated (configurable)
  • Context Window Management: Proper num_ctx parameter support (default 32768) for longer conversations
  • Retry Logic: Automatic retry with exponential backoff for rate-limited requests
  • Request Cancellation: Working Cancel button to abort long-running requests
  • Actual Token Counts: Real token usage from API instead of estimates
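
Retry with exponential backoff generally follows the pattern below. This is a generic sketch, not the extension's implementation: `withRetry`, `backoffDelay`, the 1 second base delay, the 30 second cap, and the retry count are all assumptions chosen for illustration.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelay(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

const sleepMs = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `fn`, retrying only when `isRateLimited(error)` says the failure is retryable.
// Any other error, or running out of retries, rethrows the last error.
async function withRetry<T>(
  fn: () => Promise<T>,
  isRateLimited: (error: unknown) => boolean,
  maxRetries = 3,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxRetries || !isRateLimited(error)) throw error;
      await sleepMs(backoffDelay(attempt));
    }
  }
}
```

With these defaults the waits are 1s, 2s, and 4s before the fourth and final attempt.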

New Configuration Options

  • enableStreaming - Toggle streaming responses on/off
  • contextWindow - Set the context window size (2048-131072)
  • requestTimeout - Set request timeout in milliseconds (5000-600000)
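
The documented ranges can be enforced in code along these lines. The sketch is hypothetical: the interface and function names are invented, the defaults are taken from the Configuration table, and clamping out-of-range values (rather than rejecting them) is an assumption about how such values might be handled.

```typescript
// Hypothetical shape for the three options above.
interface OllamaRequestSettings {
  enableStreaming: boolean;
  contextWindow: number;
  requestTimeout: number; // milliseconds
}

const DEFAULT_REQUEST_SETTINGS: OllamaRequestSettings = {
  enableStreaming: true,
  contextWindow: 32768,
  requestTimeout: 120000,
};

function clampToRange(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Fills in defaults and clamps numeric values to the documented ranges.
function normalizeSettings(partial: Partial<OllamaRequestSettings>): OllamaRequestSettings {
  return {
    enableStreaming: partial.enableStreaming ?? DEFAULT_REQUEST_SETTINGS.enableStreaming,
    contextWindow: clampToRange(
      partial.contextWindow ?? DEFAULT_REQUEST_SETTINGS.contextWindow, 2048, 131072),
    requestTimeout: clampToRange(
      partial.requestTimeout ?? DEFAULT_REQUEST_SETTINGS.requestTimeout, 5000, 600000),
  };
}
```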

Installation

From VSIX (Local Installation)

  1. Download the .vsix file
  2. Open VSCode
  3. Go to Extensions (Ctrl+Shift+X)
  4. Click the "..." menu at the top
  5. Select "Install from VSIX..."
  6. Choose the downloaded file

From Source

  1. Clone this repository
  2. Run npm install
  3. Run npm run compile
  4. Press F5 to open a new VSCode window with the extension loaded

Setup

Option 1: Ollama Cloud (Recommended for beginners)

  1. Get Ollama Cloud API Key

    • Sign up at ollama.com
    • Go to your account settings and generate an API key
  2. Configure the Extension

    • Open VSCode Settings (Ctrl+,)
    • Search for "Ollama Cloud"
    • Enter your API key in ollamaCloud.cloudApiKey

Option 2: Local Ollama

  1. Install Ollama

    • Download from ollama.com
    • Run ollama serve to start the local server
  2. Pull a Model

    ollama pull llama3.1
    
  3. Configure the Extension

    • The extension will automatically detect local models
    • No API key needed for local models

Option 3: Both (Hybrid)

Use both local and cloud models! The extension automatically routes requests to the appropriate endpoint based on where each model is available.
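
Conceptually, the routing decision is a lookup against the two model lists. The sketch below is an illustration only: `routeModel` is a hypothetical name, and preferring the local copy when a model exists in both places is an assumption, not documented behavior.

```typescript
type ModelRoute = "local" | "cloud";

// Hypothetical helper: decides where a request for `model` should go.
// Assumption: a model available in both places is served locally.
function routeModel(
  model: string,
  localModels: ReadonlySet<string>,
  cloudModels: ReadonlySet<string>,
): ModelRoute | undefined {
  if (localModels.has(model)) return "local";
  if (cloudModels.has(model)) return "cloud";
  return undefined; // model is not available anywhere
}
```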

Usage

Opening the Chat

  • Click the Ollama Cloud icon in the Activity Bar (left sidebar)
  • Or use the keyboard shortcut: Ctrl+Shift+O (Windows/Linux) or Cmd+Shift+O (Mac)
  • Or open Command Palette (Ctrl+Shift+P) and run "Ollama Cloud: Open Chat"

Chatting with the AI

Simply type your question or request in the chat input and press Enter. The AI can help with:

  • Writing new code
  • Explaining existing code
  • Debugging errors
  • Refactoring code
  • Creating new files
  • Running commands
  • And much more!

Operating Modes

ACT Mode (Default)

  • AI can suggest file edits and commands
  • You approve each action before it's executed
  • Best for getting things done

PLAN Mode

  • AI explains what should be done without executing
  • Great for understanding complex tasks
  • Use for learning and planning

File Editing

When the AI suggests file changes, it will format them like this:

// File: src/example.js
function hello() {
  console.log("Hello, World!");
}

You'll be prompted to approve the changes before they're applied. If "Show Diff" is enabled in settings, you'll see a side-by-side comparison.
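
A block in this format can be split into a target path and file content by reading its first line. The parser below is a hypothetical sketch (`parseFileEdit` is not the extension's function) and assumes the header is always a `// File: ` comment on the first line.

```typescript
// Hypothetical parser for the "// File: <path>" convention.
function parseFileEdit(block: string): { path: string; content: string } | undefined {
  const newline = block.indexOf("\n");
  const header = newline === -1 ? block : block.slice(0, newline);
  const match = /^\/\/ File: (.+)$/.exec(header.trim());
  if (!match) return undefined; // not a file edit block
  return {
    path: (match[1] ?? "").trim(),
    content: newline === -1 ? "" : block.slice(newline + 1),
  };
}
```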

Command Execution

When the AI suggests commands, they'll be formatted like:

npm install axios

You'll be asked to confirm before the command is executed.

Configuration

| Setting | Description | Default |
| --- | --- | --- |
| ollamaCloud.cloudApiKey | Your Ollama Cloud API key | (empty) |
| ollamaCloud.localEndpoint | Local Ollama server URL | http://localhost:11434 |
| ollamaCloud.chatModel | AI model for chat | llama3.1 |
| ollamaCloud.autocompleteModel | AI model for autocomplete | ministral-3:3b |
| ollamaCloud.defaultMode | Default operating mode | act |
| ollamaCloud.customSystemPrompt | Custom system prompt with placeholders | (empty) |
| ollamaCloud.temperature | Response creativity (0-2) | 0.7 |
| ollamaCloud.maxTokens | Maximum response length | 4096 |
| ollamaCloud.contextWindow | Context window size | 32768 |
| ollamaCloud.requestTimeout | Request timeout (ms) | 120000 |
| ollamaCloud.enableStreaming | Enable streaming responses | true |
| ollamaCloud.autoApprove | Auto-approve AI actions | false |
| ollamaCloud.showDiff | Show diff before applying changes | true |
| ollamaCloud.enableAutocomplete | Enable AI code completion | true |
| ollamaCloud.enableCodeLens | Show AI action buttons in code | true |

Custom System Prompt Placeholders

When using customSystemPrompt, you can use these placeholders that will be replaced with actual values:

| Placeholder | Description |
| --- | --- |
| {{OS}} | Operating system (Windows, macOS, Linux) |
| {{SHELL}} | Shell type (PowerShell, Bash, etc.) |
| {{WORKSPACE}} | Current workspace name |
| {{WORKSPACE_PATH}} | Full path to workspace |
| {{FILE_TREE}} | Project file structure |
| {{NPM_SCRIPTS}} | Available npm scripts |
| {{MODE}} | Current mode (ACT or PLAN) |
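
Placeholder substitution of this kind is typically a single regex replace. The sketch below is illustrative: `renderSystemPrompt` is a hypothetical name, and leaving unknown placeholders untouched is an assumption about the desired behavior.

```typescript
// Hypothetical helper: replaces {{NAME}} tokens with values from `values`.
// Tokens without a matching value are left as-is.
function renderSystemPrompt(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (token: string, key: string) => {
    return values[key] ?? token;
  });
}

// Example custom prompt using three of the placeholders listed above.
const examplePrompt = renderSystemPrompt(
  "You are running on {{OS}} with {{SHELL}} in {{MODE}} mode.",
  { OS: "Linux", SHELL: "Bash", MODE: "ACT" },
);
```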

Available Models

Cloud Models (Ollama Cloud)

  • llama3.1 - Latest Llama model (recommended)
  • llama3.2 - Llama 3.2
  • ministral-3:3b - Fast, efficient for autocomplete
  • gemma3:4b - Google's Gemma 3

Local Models (requires local Ollama)

  • codellama - Specialized for coding
  • mistral - Fast and efficient
  • mixtral - Mixture of experts model
  • qwen2.5-coder - Specialized coding model
  • phi3 - Microsoft's Phi-3

Keyboard Shortcuts

| Shortcut | Action |
| --- | --- |
| Ctrl+Shift+O | Open Chat |
| Ctrl+Shift+N | New Task |
| Ctrl+K | Inline Chat (with selection) |
| Ctrl+Shift+E | Explain Code |
| Ctrl+Shift+T | Generate Tests |
| Ctrl+Shift+F | Fix Code |
| Ctrl+Shift+R | Review Code |
| Ctrl+Shift+M | Modernize Code |
| Enter | Send message |
| Shift+Enter | New line in message |

Commands

Chat & Core

  • Ollama Cloud: Open Chat - Open the chat interface
  • Ollama Cloud: New Task - Start a new conversation
  • Ollama Cloud: Clear History - Clear chat history
  • Ollama Cloud: Select Model - Choose a different AI model

Code Actions

  • Ollama Cloud: Explain Code - Explain selected code
  • Ollama Cloud: Fix Code - Fix issues in selected code
  • Ollama Cloud: Improve Code - Suggest code improvements
  • Ollama Cloud: Add to Chat - Add selected code to chat
  • Ollama Cloud: Generate Tests - Generate tests for selected code
  • Ollama Cloud: Review Code - Get a code review
  • Ollama Cloud: Modernize Code - Update code to modern standards

Terminal Integration

  • Ollama Cloud: Add Terminal Output - Add selected terminal output to chat
  • Ollama Cloud: Explain Terminal Error - Explain terminal errors
  • Ollama Cloud: Suggest Terminal Command - Get command suggestions

Web Research

  • Ollama Cloud: Search Web - Search the internet and add results to chat
  • Ollama Cloud: Research Topic - Deep research with content fetching
  • Ollama Cloud: Fetch URL - Fetch and analyze URL content

Jupyter Notebooks

  • Ollama Cloud: Explain Notebook Cell - Explain current notebook cell
  • Ollama Cloud: Fix Notebook Cell - Fix errors in notebook cell
  • Ollama Cloud: Optimize Notebook Cell - Optimize cell for performance
  • Ollama Cloud: Generate Notebook Cell - Generate cell from description

Walkthrough & Help

  • Ollama Cloud: Show Welcome - Show welcome tour
  • Ollama Cloud: Show Tips - Show tips and tricks
  • Ollama Cloud: Show Shortcuts - Show keyboard shortcuts

Development

  • Ollama Cloud: Toggle Dev Mode - Enable/disable development mode
  • Ollama Cloud: Show Dev Stats - Show development statistics

Tips

  1. Be Specific: The more specific your request, the better the AI can help
  2. Provide Context: Mention file names, error messages, or relevant code
  3. Review Changes: Always review AI-suggested changes before applying
  4. Use New Task: Start a new task for unrelated questions so that earlier context doesn't carry over
  5. Experiment with Models: Different models excel at different tasks
  6. Adjust Timeout: Increase requestTimeout for larger models or slower connections
  7. Use Streaming: Keep streaming enabled to see responses as they generate

Troubleshooting

"Invalid API key" Error

  • Verify your API key in settings
  • Make sure you have an active Ollama Cloud account

"Request timed out" Error

  • Increase requestTimeout in settings (default 120000ms)
  • Try a smaller/faster model
  • Check your internet connection

Extension Not Loading

  • Check the Output panel (View → Output → Ollama Cloud)
  • Try reloading VSCode (Ctrl+Shift+P → "Reload Window")

Slow Responses

  • Try a smaller model (e.g., ministral-3:3b instead of llama3.1)
  • Enable streaming to see partial responses
  • Check your internet connection
  • Reduce maxTokens in settings

Local Ollama Not Detected

  • Make sure Ollama is running (ollama serve)
  • Check the localEndpoint setting matches your Ollama server

Privacy & Security

  • Your code is sent to Ollama Cloud or your local Ollama for processing
  • API keys are stored locally in VSCode settings
  • No data is stored by this extension beyond session persistence
  • Review Ollama's privacy policy for cloud usage details

Development

Building from Source

# Install dependencies
npm install

# Compile TypeScript
npm run compile

# Watch for changes
npm run watch

# Package extension
npm run package

Running Tests

npm test

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

MIT License - See LICENSE file for details

Credits

Inspired by the Cline VSCode extension. Built with ❤️ for the developer community.

Support

  • Report issues on GitLab
  • Visit jkagidesigns.com for more projects

Note: This extension works with both Ollama Cloud (requires API key) and local Ollama (free, requires local installation). Visit ollama.com to get started.
