Ollama Agentz

An AI coding extension for VS Code powered by a locally running Ollama instance. It provides an assistant that can read, write, and modify files in your workspace, execute shell commands, fetch web content, and integrate with MCP (Model Context Protocol) servers for extended functionality.

What's New

Improved streaming detection and automatic fallback to buffered responses when servers do not stream as expected.
MCP server tooling enhancements: optional /api/ps handling, Playwright MCP example, and better error diagnostics.
The agent now receives live MCP server/tool inventory in its system prompt and can add, update, remove, and reload MCP servers during a session.
Built-in MCP scaffolding workflow: generate a local MCP server starter, optionally install dependencies, register it, and connect it from the extension.
Debugging and logging improvements: output forwarded to Debug Console and key settings logged at startup.

Compatibility

Endpoints supported: Ollama-native (/api/tags, /api/show, /api/chat) and OpenAI-compatible (/v1/models, /v1/chat/completions).
Discovery order: The extension attempts Ollama-native discovery first, then falls back to OpenAI-compatible discovery.
Manual override: If discovery fails, set ollamaAgent.endpoint and ollamaAgent.model in settings to point at your server and model.
Streaming behavior: Streaming is detected automatically; when a server does not stream as expected the extension falls back to buffered responses and caches that behavior for the current session.
Examples: See the "Native Ollama-compatible server example" and "OpenAI-compatible local server example" sections below for configuration snippets.

Features

🤖 AI-Powered Coding Assistant: Leverages local LLMs via Ollama for code generation, refactoring, and analysis
💬 Interactive Chat Interface: Sidebar chat panel with conversation history and context management
🛠️ File Operations: Read, create, edit, and delete files in your workspace
⚡ Command Execution: Run shell commands directly from the agent (PowerShell on Windows, bash/zsh on macOS/Linux)
🌐 Web Fetching: Retrieve and analyze web content
🔌 MCP Server Support: Extend capabilities with Model Context Protocol servers
🔄 Streaming Responses: Real-time response streaming with stop capability
📊 Context Window Tracking: Monitor token usage and remaining context
🧭 MCP Server Management & Debugging: Add and manage MCP servers (including Playwright), attempt connections on add, and inspect MCP tools
🧠 MCP-Aware Agent Planning: The LLM is told which MCP servers and tools are currently available so it can use them for browsing, automation, and external integrations
🏗️ Local MCP Server Scaffolding: Create a ready-to-edit Node-based MCP server starter directly in your workspace
🛟 Improved Logging & Debug Forwarding: OutputChannel messages are forwarded to the Debug Console and configuration values are logged at startup for visibility
🧰 Enhanced Chat UX: Resizable message area and last-prompt recall for faster iterative prompts

Requirements

VS Code 1.85.0 or higher
Ollama installed and running locally
At least one model pulled in Ollama (e.g., qwen2.5-coder:32b)

Installation

Clone this repository:

git clone <repository-url>
cd ollama-agent

Install dependencies:
```
npm install
```
Compile the extension:
```
npm run compile
```
Press F5 to open a new Extension Development Host window

Configuration

Configure the extension through VS Code settings (Ctrl+, or Cmd+,):

| Setting | Default | Description |
| --- | --- | --- |
| `ollamaAgent.endpoint` | `http://localhost:11434/api/chat` | Chat endpoint (native Ollama or OpenAI-compatible) |
| `ollamaAgent.model` | `""` | Optional model override when discovery is unavailable |
| `ollamaAgent.chatFormat` | `"auto"` | Optional Ollama `/api/chat` format hint (`peg-native` for custom servers) |
| `ollamaAgent.temperature` | `0.2` | Sampling temperature (0-2) |
| `ollamaAgent.useStreaming` | `true` | Use streaming chat responses when the server supports them |
| `ollamaAgent.streamingStallTimeoutMs` | `60000` | Cancel stalled streaming requests after this many ms and retry non-streaming |
| `ollamaAgent.maxIterations` | `8` | Maximum number of think-act iterations the agent can perform per request |
| `ollamaAgent.mcpServers` | `[]` | MCP servers configuration |

The extension first tries Ollama-native model discovery via /api/tags, then falls back to OpenAI-compatible discovery via /v1/models. If your server only exposes chat completions, set ollamaAgent.model manually.
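
The discovery order above can be sketched as follows. This is a minimal illustration, not the extension's actual code: `discoveryUrls` and `parseModelNames` are hypothetical helper names, and the sketch assumes the usual response shapes (`{ models: [{ name }] }` from `/api/tags` and `{ data: [{ id }] }` from `/v1/models`).

```typescript
// Hypothetical helpers illustrating the discovery order; not the extension's real API.

/** Derive the discovery URLs, native first, from a configured chat endpoint. */
function discoveryUrls(chatEndpoint: string): string[] {
  const match = chatEndpoint.match(/^(https?:\/\/[^/]+)/);
  if (match === null) {
    throw new Error(`Not an http(s) endpoint: ${chatEndpoint}`);
  }
  const origin = match[1];
  return [`${origin}/api/tags`, `${origin}/v1/models`];
}

/** Extract model names from an Ollama-native or OpenAI-compatible listing. */
function parseModelNames(body: unknown): string[] {
  if (typeof body !== "object" || body === null) return [];
  const record = body as { models?: unknown; data?: unknown };
  if (Array.isArray(record.models)) {
    // Assumed Ollama-native shape: { models: [{ name: "..." }] }
    return record.models
      .map((m) => (m as { name?: unknown } | null)?.name)
      .filter((n): n is string => typeof n === "string");
  }
  if (Array.isArray(record.data)) {
    // Assumed OpenAI-compatible shape: { data: [{ id: "..." }] }
    return record.data
      .map((m) => (m as { id?: unknown } | null)?.id)
      .filter((n): n is string => typeof n === "string");
  }
  return [];
}
```

A caller would request each URL in order and stop at the first one for which `parseModelNames` returns a non-empty list. If both come back empty, that is the case where `ollamaAgent.model` has to be set by hand.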

If your server does not implement /api/ps, that is now treated as optional. The extension will use /api/show metadata to determine context size and continue normally.

If your server accepts /api/chat but returns a normal application/json body even when stream is set to true, the extension now detects that and reads the response as a buffered reply automatically.
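
One way to detect a buffered reply is to inspect the response `Content-Type` header. The sketch below is an assumption about how such a check could look, not the extension's implementation: `classifyReply` and `parseChunks` are hypothetical names, and it assumes streamed replies arrive as `application/x-ndjson` (Ollama-style) or `text/event-stream` (OpenAI-style SSE).

```typescript
type ReplyMode = "stream" | "buffered";

/** Classify a reply by its Content-Type header (hypothetical helper). */
function classifyReply(contentType: string | null): ReplyMode {
  const [mediaType = ""] = (contentType ?? "").split(";");
  const normalized = mediaType.trim().toLowerCase();
  // Anything that is not a known streaming media type is read as one buffered body.
  return normalized === "application/x-ndjson" || normalized === "text/event-stream"
    ? "stream"
    : "buffered";
}

/** Parse a complete reply body into JSON chunks according to its mode. */
function parseChunks(text: string, mode: ReplyMode): unknown[] {
  if (mode === "buffered") return [JSON.parse(text)];
  return text
    .split("\n")
    .map((line) => line.trim().replace(/^data:\s*/, "")) // strip the SSE "data:" prefix if present
    .filter((line) => line !== "" && line !== "[DONE]") // skip blank lines and the SSE end marker
    .map((line) => JSON.parse(line));
}
```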

If your server accepts /api/chat but does not finish streamed responses until the client disconnects, set ollamaAgent.useStreaming to false. The extension also retries non-streaming automatically when a stream produces no activity for ollamaAgent.streamingStallTimeoutMs.
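
The stall check can be pictured as a small inactivity timer. This is only a sketch under assumptions: `StallDetector` is a hypothetical class, and the clock is injected so the logic can be exercised without real timers.

```typescript
/** Tracks stream activity and reports when nothing has arrived for timeoutMs. */
class StallDetector {
  private readonly timeoutMs: number;
  private readonly now: () => number;
  private lastActivity: number;

  constructor(timeoutMs: number, now: () => number = () => Date.now()) {
    this.timeoutMs = timeoutMs;
    this.now = now;
    this.lastActivity = now();
  }

  /** Call whenever a chunk arrives from the stream. */
  touch(): void {
    this.lastActivity = this.now();
  }

  /** True once the stream has been silent for at least timeoutMs. */
  isStalled(): boolean {
    return this.now() - this.lastActivity >= this.timeoutMs;
  }
}
```

A streaming reader would call `touch()` for each chunk and check `isStalled()` periodically. On a stall it would cancel the request and retry with streaming disabled.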

When the extension detects that a specific endpoint is not actually streaming, it now caches that result for the current VS Code session and stops sending stream: true to that endpoint on later requests.
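
A per-session cache of this kind can be as simple as an in-memory set keyed by endpoint URL. The class below is a hypothetical sketch of that idea, not the extension's code.

```typescript
/** Remembers, for the lifetime of the process, which endpoints failed to stream. */
class StreamingSupportCache {
  private readonly nonStreaming = new Set<string>();

  /** Record that an endpoint ignored stream: true or stalled. */
  markNonStreaming(endpoint: string): void {
    this.nonStreaming.add(endpoint);
  }

  /** Decide whether to send stream: true, honoring the user's useStreaming setting. */
  shouldStream(endpoint: string, useStreaming: boolean): boolean {
    return useStreaming && !this.nonStreaming.has(endpoint);
  }
}
```

Because the set lives only in memory, restarting VS Code clears it and streaming is attempted again.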

Native Ollama-compatible server example

For a server that exposes /api/tags, /api/show, and /api/chat but no /v1 routes, configure:

{
	"ollamaAgent.endpoint": "http://localhost:8080/api/chat",
	"ollamaAgent.chatFormat": "peg-native"
}

If your server uses a custom chat-format selector and logs Chat format: peg-native, set ollamaAgent.chatFormat to peg-native so the extension sends that format hint explicitly on /api/chat requests.

OpenAI-compatible local server example

For an OpenAI-compatible server such as turbo-server running on port 8080, configure:

{
	"ollamaAgent.endpoint": "http://localhost:8080/v1/chat/completions",
	"ollamaAgent.model": "Qwen3 Coder 30B A3B Instruct"
}

If /v1/models returns a different model ID, use that exact ID instead of the display name above.
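
The two endpoint styles expect slightly different request bodies. The sketch below shows one plausible way to build them. `buildChatBody` is a hypothetical helper, and it assumes the commonly documented layouts: sampling parameters under `options` for Ollama's `/api/chat`, and a top-level `temperature` for `/v1/chat/completions`.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

/** Build a chat request body matching the style of the configured endpoint. */
function buildChatBody(
  endpoint: string,
  model: string,
  messages: ChatMessage[],
  temperature: number,
  stream: boolean,
): Record<string, unknown> {
  // Assumption: the URL path is enough to tell the two API styles apart.
  const isOpenAiStyle = endpoint.indexOf("/v1/chat/completions") !== -1;
  return isOpenAiStyle
    ? { model, messages, temperature, stream }
    : { model, messages, stream, options: { temperature } };
}
```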

Example MCP Server Configuration

{
	"ollamaAgent.mcpServers": [
		{
			"name": "filesystem",
			"command": "npx",
			"args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/dir"],
			"enabled": true
		}
	]
}
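
For reference, an entry in `ollamaAgent.mcpServers` can be modeled with a small type. The field names come from the examples in this README. Which fields are optional, and how a missing `enabled` flag is treated, are assumptions made for this sketch, and `validateMcpServer` and `activeServers` are hypothetical helpers.

```typescript
interface McpServerConfig {
  name: string;
  command: string;
  args?: string[];
  enabled?: boolean;
  timeoutMs?: number;
}

/** Return a list of human-readable problems; an empty list means the entry looks usable. */
function validateMcpServer(entry: McpServerConfig): string[] {
  const problems: string[] = [];
  if (entry.name.trim() === "") problems.push("name must not be empty");
  if (entry.command.trim() === "") problems.push("command must not be empty");
  if (entry.timeoutMs !== undefined && !(entry.timeoutMs > 0)) {
    problems.push("timeoutMs must be a positive number");
  }
  return problems;
}

/** Keep only valid entries that are not explicitly disabled. */
function activeServers(entries: McpServerConfig[]): McpServerConfig[] {
  // Assumption: a missing "enabled" flag is treated as enabled.
  return entries.filter((e) => e.enabled !== false && validateMcpServer(e).length === 0);
}
```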

Playwright MCP server example

You can run a Playwright-based MCP server to enable browser automation tools. Example configuration (uses the hypothetical @modelcontextprotocol/server-playwright package):

{
	"ollamaAgent.mcpServers": [
		{
			"name": "playwright",
			"command": "npx",
			"args": ["-y", "@modelcontextprotocol/server-playwright"],
			"enabled": true,
			"timeoutMs": 120000
		}
	]
}

Notes:

timeoutMs increases the MCP request timeout for long-running browser actions.
The extension writes binary or file outputs from MCP tools to .ollama-agent/mcp_outputs in the workspace.

MCP Server Management & Debugging

Add MCP server: The extension includes a command to add a Playwright (or other) MCP server configuration and will attempt to connect after adding it. If the connection fails, the extension surfaces diagnostic information in the Output panel and the Debug Console.
Debug MCP tools: Use the Ollama Agent: Debug MCP Tools command to fetch registered tool details, endpoints, and capabilities from a connected MCP server.
UI improvements: The MCP server management view now uses collapsible sections with improved styling for easier navigation and state visibility.

Logging & Debug Forwarding

Debug Console forwarding: Messages written to the extension OutputChannel are forwarded to the Debug Console to make interactive debugging and breakpoints easier to correlate with agent output.
Startup configuration logging: Key configuration values (endpoint, enabled MCP servers, timeouts) are logged at startup to help with troubleshooting and reproducibility.

Chat Improvements

Resizable message area: The chat message input supports resizing so you can compose longer prompts comfortably.
Last-prompt recall: The chat will remember and offer the last prompt for quick re-use or iteration when composing follow-ups.

Usage

Opening the Chat

Click the Ollama icon in the Activity Bar
Use the status bar button: $(comment-discussion) Ollama Chat
Run the command: Ollama Agent: Chat

Running a Task

Open the chat panel
Select a model discovered from Ollama in the dropdown
Type your request (e.g., "Create a React component for a todo list")
The agent will:
- Analyze your request
- Read existing files if needed
- Create or modify files
- Execute commands (build, test, etc.)
- Provide a summary of actions taken
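
The steps above amount to a bounded think-act loop, capped by `ollamaAgent.maxIterations`. The following is a simplified, synchronous sketch of that control flow. The names `runAgent`, `nextStep`, and `execute` are hypothetical, and a real implementation would be asynchronous because it calls the model and the tools.

```typescript
type Step =
  | { done: true; summary: string }
  | { done: false; action: string };

interface AgentResult {
  summary: string;
  iterations: number;
  hitLimit: boolean;
}

/** Alternate between planning (nextStep) and acting (execute) until done or capped. */
function runAgent(
  nextStep: (observations: string[]) => Step,
  execute: (action: string) => string,
  maxIterations: number = 8,
): AgentResult {
  const observations: string[] = [];
  for (let i = 1; i <= maxIterations; i++) {
    const step = nextStep(observations);
    if (step.done) {
      return { summary: step.summary, iterations: i, hitLimit: false };
    }
    // Feed each tool result back so the next planning step can see it.
    observations.push(execute(step.action));
  }
  return {
    summary: `Stopped after ${maxIterations} iterations`,
    iterations: maxIterations,
    hitLimit: true,
  };
}
```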

Available Commands

| Command | Description |
| --- | --- |
| `Ollama Agent: Run` | Execute a one-off task via input box |
| `Ollama Agent: Chat` | Open the sidebar chat panel |
| `Ollama Agent: Scaffold MCP Server` | Generate a local MCP server starter and optionally connect it |

Agent Capabilities

The agent can perform the following actions:

| Action | Description |
| --- | --- |
| `read_file` | Read any file in the workspace |
| `create_file` | Create new files |
| `edit_file` | Modify existing files (full overwrite) |
| `delete_file` | Delete files (moves to trash) |
| `run_command` | Execute shell commands in workspace root |
| `fetch_url` | Fetch web content as readable text |
| `list_mcp_servers` | Inspect configured MCP servers and connection state |
| `list_mcp_tools` | Inspect available MCP tools and argument schemas |
| `scaffold_mcp_server` | Create a local MCP server starter and optionally install/register/connect it |
| `upsert_mcp_server` | Add or update an MCP server configuration and reconnect |
| `remove_mcp_server` | Remove an MCP server configuration |
| `reload_mcp_servers` | Reload MCP settings and reconnect configured servers |
| `mcp_tool` | Invoke tools from connected MCP servers |

This lets the agent perform two MCP-specific workflows that were previously unreliable:

Use connected browser-capable MCP servers for navigation and online tasks instead of assuming browsing is unavailable.
Create a new MCP server in the workspace, install/build it, register it in settings, and connect it without leaving the chat.
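
When handling model output, it can help to check a requested action against the names in the action table before dispatching it. The sketch below shows one way to do that in TypeScript. The action names are taken from the table, while `isActionName` and `isMcpAction` are hypothetical helpers that are not part of the extension.

```typescript
const ACTION_NAMES = [
  "read_file", "create_file", "edit_file", "delete_file", "run_command", "fetch_url",
  "list_mcp_servers", "list_mcp_tools", "scaffold_mcp_server", "upsert_mcp_server",
  "remove_mcp_server", "reload_mcp_servers", "mcp_tool",
] as const;

type ActionName = (typeof ACTION_NAMES)[number];

/** Narrow an arbitrary string (e.g. from model output) to a known action name. */
function isActionName(value: string): value is ActionName {
  return (ACTION_NAMES as readonly string[]).indexOf(value) !== -1;
}

/** True for actions that manage or invoke MCP servers. */
function isMcpAction(name: ActionName): boolean {
  return name.indexOf("mcp") !== -1;
}
```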

Scaffold A Local MCP Server

Use the command Ollama Agent: Scaffold MCP Server to generate a local MCP server starter under mcp-servers/<name>.

The workflow can:

scaffold a basic template with echo and get_time tools,
scaffold a web template with fetch_url and search_web tools,
run npm install in the generated folder,
add the generated server to ollamaAgent.mcpServers, and
attempt to connect it immediately.

Supported Models

The extension works with any Ollama model that supports chat completions. Recommended models:

qwen2.5-coder:32b - Excellent for coding tasks
codellama:34b - Good for code generation
deepseek-coder:33b - Strong coding capabilities
llama3.1:70b - General purpose with good reasoning

Development

Project Structure

ollama-agent/
├── src/
│   ├── extension.ts          # Main extension entry point
│   ├── agent/
│   │   ├── agent.ts          # Agent orchestration
│   │   ├── executor.ts       # Tool execution logic
│   │   ├── llm.ts            # LLM provider (Ollama)
│   │   ├── mcp.ts            # MCP server management
│   │   └── tools.ts          # Tool definitions and prompts
│   ├── utils/
│   │   └── workspace.ts      # Workspace utilities
│   └── test/
│       └── mockProvider.ts   # Test utilities
├── package.json              # Extension manifest
└── tsconfig.json             # TypeScript configuration

Building

# Compile TypeScript
npm run compile

# Watch for changes
npm run watch

# Package for distribution
npm run package

Debugging

Open the project in VS Code
Set breakpoints in the source code
Press F5 to launch the Extension Development Host
Use the extension in the new window
Debugging output appears in the Debug Console

Platform Support

The extension automatically adapts to your operating system:

| Platform | Shell | Notes |
| --- | --- | --- |
| Windows | PowerShell | Uses PowerShell cmdlets |
| macOS | zsh/bash | Standard Unix commands |
| Linux | bash | GNU coreutils |
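
The platform mapping in the table can be sketched as a small lookup on Node's `process.platform` values (`win32`, `darwin`, `linux`). `shellFor` is a hypothetical helper. Choosing zsh over bash on macOS and the exact executable names are assumptions of this sketch.

```typescript
interface ShellInvocation {
  shell: string;
  args: string[];
}

/** Pick a shell and argument list for running a command string on a platform. */
function shellFor(platform: string, command: string): ShellInvocation {
  switch (platform) {
    case "win32":
      return { shell: "powershell.exe", args: ["-NoProfile", "-Command", command] };
    case "darwin":
      // Assumption: prefer zsh on macOS; bash would work the same way with -c.
      return { shell: "zsh", args: ["-c", command] };
    default:
      return { shell: "bash", args: ["-c", command] };
  }
}
```

In a Node process this could be called as `shellFor(process.platform, "npm test")`, with the result passed to a process-spawning API.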

Troubleshooting

Ollama Connection Issues

Ensure Ollama is running: ollama serve
Verify the endpoint URL in settings
Check that the model is downloaded: ollama list

OpenAI-Compatible Server Issues

Verify the server responds on /v1/chat/completions
If the model dropdown is empty, set ollamaAgent.model manually
If available, check whether /v1/models returns the model ID the server expects

Model Not Responding

Check the Output panel (View → Output → Ollama Agent)
Verify the model appears in the Ollama model list
Try a different model if the current one hangs

MCP Server Errors

Check the server command is installed and in PATH
Verify the arguments are correct
Check the Output panel for error messages

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments

Ollama for making local LLMs accessible
Model Context Protocol for extending AI capabilities
