Cloud LM Provider — AWS Bedrock & Azure OpenAI for GitHub Copilot

IntelliDev Tools


🚀 The #1 extension for enterprise AI in VS Code! Bring Claude 4.5, GPT-4o, Nova, DeepSeek & 50+ models into Copilot Chat. Features Headroom AI compression that saves 30-45% on API costs. Secure, fast, with full streaming, tool calling & vision support.


Cloud LM Provider

🚀 The #1 Extension for AWS Bedrock & Azure OpenAI in VS Code

Bring Claude 4.5, GPT-4o, Nova, DeepSeek, Llama, and 50+ enterprise AI models directly into GitHub Copilot Chat — with intelligent token compression that saves you up to 40% on API costs.

Quick Start • Features • Headroom AI • Models • Configuration • FAQ

🎯 Why Cloud LM Provider?

| Challenge | Solution |
| --- | --- |
| 🔒 Enterprise Compliance | Use your own AWS/Azure credentials — data never leaves your cloud |
| 💰 Expensive API Costs | Headroom AI compresses context by 30-45%, saving thousands monthly |
| 🐌 Slow Model Switching | Instant access to 50+ models in one dropdown |
| 🔧 Complex Setup | One-click configuration wizard with auto-discovery |
| 📊 No Cost Visibility | Real-time token tracking & savings dashboard |

⚡ Quick Start

Installation

Install from VS Code Marketplace
```
ext install suddhu-iith2004.cloud-lm-provider
```
Or search "Cloud LM Provider" in the Extensions sidebar.

Run the Configuration Wizard

Ctrl+Shift+P → "Cloud LM: Manage Provider Configuration"

Choose Your Provider
- AWS Bedrock: Enter credentials or use AWS CLI profile
- Azure OpenAI: Enter endpoint URL and API key

Start Chatting
- Open GitHub Copilot Chat (Ctrl+Alt+I)
- Select your preferred model from the dropdown
- Experience enterprise AI in your IDE!


✨ Features

🌐 Multi-Cloud AI Access

Access 50+ enterprise AI models from a single extension:

AWS Bedrock Models

Anthropic Claude — 4.5 Opus, 4.5 Sonnet, 3.7, 3.5, Haiku
Amazon Nova — Premier, Pro, Lite, Micro, Sonic
Meta Llama — 3.3 70B, 3.2, 3.1 variants
Mistral AI — Large, Small, 7B
Cohere — Command R, Command R+
DeepSeek — R1 Reasoning Model
AI21 Jamba — 1.5 Large, Mini

Azure OpenAI Models

GPT-4o — Latest multimodal flagship
GPT-4 Turbo — 128K context window
GPT-4 — Original reasoning model
GPT-3.5 Turbo — Fast & cost-effective
o1 & o1-mini — Advanced reasoning
Custom fine-tuned deployments

🎉 What's New in v1.2.2

Major Headroom AI Enhancements with adaptive context-aware compression and improved observability:

🧠 Adaptive Tool Compression — Dynamic 4-tier compression (Conservative, Balanced, Aggressive, Critical) automatically adjusts based on token budget pressure
🖼️ Intelligent Image Compression — Multimodal models now benefit from automatic image format optimization and quality scaling
📋 Enhanced Logging — Detailed breakdown of compression metrics, per-content-type savings, and transform tracking with structured [Headroom] format
⚙️ Smart Context Caching — Frequently-used context patterns cached and referenced to avoid re-transmission in long conversations
💾 Memory Store Optimization — Improved context deduplication across conversation history for better efficiency
🔄 Better Pricing — Fixed cost calculation accuracy with proper compression-induced savings tracking

🧠 Headroom AI — Intelligent Token Compression

Save 25-40% on every API call with our multi-strategy compression engine that never corrupts source code the AI needs to understand:

┌─────────────────────────────────────────────────────────────┐
│                    BEFORE HEADROOM                         │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  21,000 tokens    │
│  (Messages + System Prompts + Tool Schemas)                │
├─────────────────────────────────────────────────────────────┤
│                    AFTER HEADROOM                          │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━  14,700 tokens (-30%)          │
│  (Safe-optimised code + deduplicated + compressed)         │
└─────────────────────────────────────────────────────────────┘
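The percentage in the diagram is simple arithmetic on the two token counts. A minimal sketch of that calculation follows; the function name `savingsPercent` is illustrative and not part of the extension's API.

```typescript
// Percentage of tokens removed by compression, rounded to the nearest integer.
// Illustrative helper only; not an API exposed by the extension.
function savingsPercent(tokensBefore: number, tokensAfter: number): number {
  if (tokensBefore <= 0) {
    throw new RangeError("tokensBefore must be positive");
  }
  // Multiply before dividing to keep the arithmetic exact for whole token counts.
  return Math.round(((tokensBefore - tokensAfter) * 100) / tokensBefore);
}

// Figures from the diagram above: 21,000 tokens before, 14,700 after.
const saved = savingsPercent(21_000, 14_700);
console.log(`${saved}% fewer tokens`); // prints "30% fewer tokens"
```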

How It Works

- Safe Source Code Optimisation — Strips comments and normalises whitespace but never modifies logic, variable names, or strings. Preserves TODO/FIXME/HACK and JSDoc @param/@returns. Expected savings: 15-25%.
- Large File Summarisation — Files exceeding your configured threshold (default 500 lines) are replaced by a structural summary (exports, classes, functions, interfaces, types) plus the first 100 and last 50 lines verbatim. Request specific sections if you need them. Expected savings: 60-80%.
- Incremental Context Deduplication — When Copilot re-reads the same file in the same conversation, Headroom detects the duplicate and sends a compact reference stub instead of the full content. Expected savings: 40-60% on repeated reads.
- Adaptive Tool Schema Compression — VS Code injects tool definitions on every request. Headroom intelligently compresses tool descriptions based on available context budget:
  - Conservative mode (< 50% budget): Full descriptions preserved (200-100 char limits)
  - Balanced mode (50-75% budget): Moderate truncation for performance (150-80 char limits)
  - Aggressive mode (75-90% budget): Heavy truncation for cost savings (80-40 char limits)
  - Critical mode (> 90% budget): Maximum compression to stay within limits (50-25 char limits)

  Expected savings: 40-60% depending on aggressiveness level and context pressure.
- Intelligent Image Compression — For multimodal models, Headroom optimizes image transmission:
  - Automatic format optimization (JPEG, PNG, WebP support)
  - Adaptive image quality scaling based on context pressure
  - Base64 encoding optimization for inline image transmission
  - Fallback image handling for models without vision capabilities

  Expected savings: 20-35% on image token usage.
- Conversation History Management — Older messages are intelligently summarised while preserving key context and decision points.
- Smart Context Caching — Frequently-used context patterns are cached and referenced via compact identifiers to avoid re-transmission in long conversations.
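To make the four compression tiers concrete, here is a rough sketch of how budget pressure could map to a tier and a truncation limit. It is not the extension's code. It assumes the percentages refer to the share of the context budget already used, that the two numbers per tier are limits for tool descriptions and parameter descriptions respectively, and that boundary values fall into the lower tier; all names are hypothetical.

```typescript
type Tier = "conservative" | "balanced" | "aggressive" | "critical";

// Character limits per tier, copied from the list above. Reading each pair as
// (tool description, parameter description) is an assumption.
const LIMITS: Record<Tier, { tool: number; param: number }> = {
  conservative: { tool: 200, param: 100 },
  balanced: { tool: 150, param: 80 },
  aggressive: { tool: 80, param: 40 },
  critical: { tool: 50, param: 25 },
};

// Map the used share of the context budget (0..1) to a tier.
// How exact boundary values (50%, 75%, 90%) are treated is an assumption.
function pickTier(budgetUsed: number): Tier {
  if (budgetUsed < 0.5) return "conservative";
  if (budgetUsed < 0.75) return "balanced";
  if (budgetUsed <= 0.9) return "aggressive";
  return "critical";
}

// Cut text to a character limit, marking the cut with an ellipsis.
function truncate(text: string, limit: number): string {
  if (text.length <= limit) return text;
  return text.slice(0, limit - 1).trimEnd() + "…";
}

// Hypothetical entry point: shorten one tool description for the current pressure.
function compressToolDescription(description: string, budgetUsed: number): string {
  return truncate(description, LIMITS[pickTier(budgetUsed)].tool);
}
```

With these assumptions, a 300-character description sent at 95% budget usage is cut to 50 characters, while the same description at 30% usage is cut to 200.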

Configuration

{
  "cloudLmProvider.headroom.enabled": true,
  "cloudLmProvider.headroom.aggressiveness": "conservative",
  "cloudLmProvider.headroom.showSavingsNotifications": true,
  "cloudLmProvider.headroom.largeFileThreshold": 500
}

| Setting | Values | Default | Description |
| --- | --- | --- | --- |
| `enabled` | `true`/`false` | `true` | Master on/off switch |
| `aggressiveness` | `conservative` \| `balanced` \| `aggressive` \| `critical` | `conservative` | How much to compress; conservative for safer defaults |
| `showSavingsNotifications` | `true`/`false` | `true` | Milestone toasts (1M tokens / $5 saved) |
| `largeFileThreshold` | 100–5000 | `500` | Lines before large-file summary kicks in |
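The effect of `largeFileThreshold` can be sketched as follows, based on the Large File Summarisation behaviour described under How It Works (structural summary plus the first 100 and last 50 lines). This is an illustration only: the function name, the regex used to find declarations, and the exact summary format are assumptions, not the extension's implementation.

```typescript
interface LargeFileOptions {
  threshold: number; // mirrors cloudLmProvider.headroom.largeFileThreshold
  headLines: number; // lines kept verbatim from the start of the file
  tailLines: number; // lines kept verbatim from the end of the file
}

const DEFAULTS: LargeFileOptions = { threshold: 500, headLines: 100, tailLines: 50 };

// Hypothetical pattern for top-level declarations; a real implementation
// would likely be language-aware.
const DECLARATION =
  /^\s*(?:export\s+)?(?:default\s+)?(?:abstract\s+)?(?:async\s+)?(class|function|interface|type)\s+([A-Za-z_$][\w$]*)/;

function summariseLargeFile(source: string, options: LargeFileOptions = DEFAULTS): string {
  const lines = source.split("\n");
  // Files at or below the threshold (or too short to trim) pass through untouched.
  const minimum = Math.max(options.threshold, options.headLines + options.tailLines);
  if (lines.length <= minimum) return source;

  // Collect a one-line outline of declarations found anywhere in the file.
  const outline: string[] = [];
  for (const line of lines) {
    const match = DECLARATION.exec(line);
    if (match) outline.push(`${match[1]} ${match[2]}`);
  }

  const omitted = lines.length - options.headLines - options.tailLines;
  return [
    `[summary: ${lines.length} lines; declarations: ${outline.join(", ") || "none found"}]`,
    ...lines.slice(0, options.headLines),
    `[... ${omitted} lines omitted ...]`,
    ...lines.slice(lines.length - options.tailLines),
  ].join("\n");
}
```

Under these assumptions a 600-line file shrinks to 152 lines: one summary line, 100 head lines, one marker, and 50 tail lines.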

Safety Guarantees

Source code is never lossy-compressed. Headroom uses pattern-based stripping only — no LLM rewriting of your code.
Compression failures fall back to uncompressed. A broken compression result never reaches the API.
Tool call/result pairs are validated before and after compression. Orphaned blocks are auto-removed to prevent ValidationException.
Detailed logging for debugging. Enable debug mode to see compression breakdown, token savings per content type, context pressure metrics, and transform tracking.
Context-aware adaptive compression. Automatically adjusts compression aggressiveness based on available token budget to prevent exceeding model limits.
Image integrity preserved. Vision model images are optimized, never corrupted or misrepresented.
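The tool call/result validation above can be pictured with a small sketch. The message and part types below are simplified and hypothetical (the extension's real types are not shown in this README), and the sketch only checks that every call has a result and vice versa, without checking ordering.

```typescript
// Hypothetical, simplified message parts for illustration.
type Part =
  | { kind: "text"; text: string }
  | { kind: "toolCall"; callId: string; name: string }
  | { kind: "toolResult"; callId: string; content: string };

interface Message {
  role: "user" | "assistant";
  parts: Part[];
}

// Drop tool calls without a matching result and results without a matching call,
// then drop any message that ends up empty.
function removeOrphanedToolParts(messages: Message[]): Message[] {
  const callIds = new Set<string>();
  const resultIds = new Set<string>();
  for (const message of messages) {
    for (const part of message.parts) {
      if (part.kind === "toolCall") callIds.add(part.callId);
      if (part.kind === "toolResult") resultIds.add(part.callId);
    }
  }
  return messages
    .map((message) => ({
      ...message,
      parts: message.parts.filter((part) => {
        if (part.kind === "toolCall") return resultIds.has(part.callId);
        if (part.kind === "toolResult") return callIds.has(part.callId);
        return true; // plain text is always kept
      }),
    }))
    .filter((message) => message.parts.length > 0);
}
```

Running a check like this both before and after compression means a transform that drops one half of a pair cannot leave the other half behind in the request.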

Real Savings Dashboard


Track your savings in real-time:

💰 Cost Avoided — Exact dollar amounts saved
📊 Token Reduction — Session, daily, and lifetime metrics
📈 Compression History — Visual trends over time
🎯 Accuracy Index — Model attention improvement score

"Headroom saved us $2,400/month across our 50-person engineering team." — Senior Platform Engineer, Fortune 500 Company

Milestone Notifications

Headroom shows a brief toast when you hit a savings milestone:

🎉 Headroom: 1,000,000 tokens saved ($3.00 avoided)    [Open Dashboard]

Milestones fire at every 1M tokens or $5 saved (overall), then auto-dismiss after 5 seconds. Disable via cloudLmProvider.headroom.showSavingsNotifications.
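One way to read the milestone rule is "notify whenever a running total crosses a multiple of 1M tokens or $5". The sketch below illustrates that reading; the function and type names are hypothetical and the extension may track milestones differently.

```typescript
const TOKEN_STEP = 1_000_000; // tokens saved per milestone
const DOLLAR_STEP = 5; // dollars saved per milestone

// Number of step boundaries crossed when a running total moves from previous to current.
function milestonesCrossed(previous: number, current: number, step: number): number {
  return Math.max(0, Math.floor(current / step) - Math.floor(previous / step));
}

interface Totals {
  tokensSaved: number;
  dollarsSaved: number;
}

// Decide whether a toast is due after a request updated the lifetime totals.
function shouldNotify(before: Totals, after: Totals, enabled: boolean): boolean {
  if (!enabled) return false; // cloudLmProvider.headroom.showSavingsNotifications = false
  return (
    milestonesCrossed(before.tokensSaved, after.tokensSaved, TOKEN_STEP) > 0 ||
    milestonesCrossed(before.dollarsSaved, after.dollarsSaved, DOLLAR_STEP) > 0
  );
}
```

For example, moving from 999,000 to 1,001,000 tokens saved crosses the first token milestone, which matches the toast shown above.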

📊 Real-Time Status Bar Telemetry

Always know exactly what you're spending:

$(graph) Tokens: 8,542 In / 1,247 Out | ⚡↓32% | $0.45

Live token counts from actual AWS/Azure API responses
Per-request cost calculation using real-time pricing
Cumulative session tracking for budget management
One-click dashboard access for detailed analytics
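The status bar text shown above could be assembled from per-request statistics roughly as follows. This is a sketch under assumptions: the `RequestStats` shape and `formatStatusBar` name are hypothetical, and it is not specified here whether the dollar figure is per request or per session. VS Code renders `$(graph)` in status bar text as an icon.

```typescript
interface RequestStats {
  inputTokens: number;
  outputTokens: number;
  compressionRatio: number; // fraction of tokens removed, e.g. 0.32
  costUsd: number; // cost figure to display
}

// Build the status bar label from raw statistics.
function formatStatusBar(stats: RequestStats): string {
  const fmt = (n: number): string => n.toLocaleString("en-US");
  const pct = Math.round(stats.compressionRatio * 100);
  return (
    `$(graph) Tokens: ${fmt(stats.inputTokens)} In / ${fmt(stats.outputTokens)} Out` +
    ` | ⚡↓${pct}% | $${stats.costUsd.toFixed(2)}`
  );
}

const label = formatStatusBar({
  inputTokens: 8542,
  outputTokens: 1247,
  compressionRatio: 0.32,
  costUsd: 0.45,
});
console.log(label); // $(graph) Tokens: 8,542 In / 1,247 Out | ⚡↓32% | $0.45
```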

🎯 Smart Notifications & UI

Receive intelligent notifications about your usage and costs:

💰 Milestone Notifications — Toast alerts when reaching 1M token or $5 saved milestones
⚠️ Cost Warnings — Automatic alerts for expensive model usage
📈 Usage Trends — Visual indicators showing compression effectiveness
🔔 Status Updates — Real-time feedback on model availability and connection status
🎨 Visual Feedback — Compression indicator in status bar showing savings percentage
📊 Dashboard Analytics — Comprehensive savings dashboard with daily/lifetime metrics

🔧 Advanced Capabilities

| Feature | Description |
| --- | --- |
| 🔄 Full Streaming | Real-time token-by-token response rendering with smooth output |
| 🛠️ Tool Calling | Function calling with automatic schema translation and validation |
| 🖼️ Vision Support | Send images to multimodal models (Claude, GPT-4o) with intelligent compression |
| 🌍 Cross-Region Routing | Automatic failover across AWS regions for model availability |
| 🔐 Secure Credentials | Stored in VS Code's encrypted secret storage; no plaintext storage |
| ⚙️ Inference Profiles | Support for AWS Bedrock inference profiles and model routing |
| 📝 Request Logging | Detailed debug logs with compression breakdowns and performance metrics |
| 🧮 Token Counting | Real-time token consumption tracking with cost breakdown |
| 💾 Context Caching | Intelligent memory store for reducing redundant API calls |
| 🎯 Auto Discovery | Automatic detection of AWS and Azure credentials and available models |

🎛️ Configuration

AWS Bedrock Setup

Option 1: AWS CLI Profile (Recommended)

{
  "cloudLmProvider.aws.defaultRegion": "us-east-1",
  "cloudLmProvider.aws.modelRouting": "auto"
}

The extension automatically uses your configured AWS CLI profile.

Option 2: Access Keys

Run the configuration wizard and enter:

AWS Access Key ID
AWS Secret Access Key
(Optional) Session Token for temporary credentials

Option 3: IAM Role / Instance Profile

For EC2 or ECS environments, credentials are automatically discovered.

Azure OpenAI Setup

{
  "cloudLmProvider.azure.defaultDeployment": "gpt-4o",
  "cloudLmProvider.azure.apiVersion": "2025-01-01-preview"
}

Run the wizard and enter:

Azure OpenAI Endpoint URL
API key (or use Azure AD authentication instead)

All Settings

| Setting | Default | Description |
| --- | --- | --- |
| `cloudLmProvider.aws.defaultRegion` | `us-east-1` | Primary AWS region |
| `cloudLmProvider.aws.modelRouting` | `auto` | Cross-region inference routing |
| `cloudLmProvider.aws.showAllRegions` | `false` | Show models from all regions |
| `cloudLmProvider.aws.enabledModelFamilies` | `[]` | Filter to specific model families |
| `cloudLmProvider.aws.minContextWindow` | `0` | Minimum context size filter |
| `cloudLmProvider.aws.hideExpensiveModels` | `false` | Hide high-cost models |
| `cloudLmProvider.enableCostWarnings` | `true` | Show cost alerts for expensive models |
| `cloudLmProvider.requestTimeoutMs` | `120000` | Request timeout in milliseconds (allowed range 5,000-600,000, i.e. 5s-600s) |
| `cloudLmProvider.logLevel` | `info` | Output verbosity |
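Because `requestTimeoutMs` is given in milliseconds while its range is quoted in seconds, it is easy to enter a value such as `120` by mistake. The sketch below shows one way such a setting could be normalised. Whether the extension clamps, rejects, or ignores out-of-range values is not stated here, so the clamping behaviour and the function name are assumptions.

```typescript
const MIN_TIMEOUT_MS = 5_000; // 5s lower bound from the table above
const MAX_TIMEOUT_MS = 600_000; // 600s upper bound from the table above
const DEFAULT_TIMEOUT_MS = 120_000; // documented default

// Fall back to the default for non-numeric input and clamp numbers into range.
function normaliseTimeout(value: unknown): number {
  if (typeof value !== "number" || !Number.isFinite(value)) {
    return DEFAULT_TIMEOUT_MS;
  }
  return Math.min(MAX_TIMEOUT_MS, Math.max(MIN_TIMEOUT_MS, Math.round(value)));
}
```

Under this assumption, a value of `120` (intended as seconds) would be raised to 5,000 ms instead of producing a 120 ms timeout.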

🤖 Supported Models

AWS Bedrock

| Model | Context | Best For | Cost Tier |
| --- | --- | --- | --- |
| Claude 4.5 Opus | 200K | Complex reasoning, code generation | 💎💎💎 |
| Claude 4.5 Sonnet | 200K | Balanced performance & cost | 💎💎 |
| Claude 3.7 Sonnet | 200K | Previous gen, battle-tested | 💎💎 |
| Claude 3.5 Haiku | 200K | Fast, cost-effective | 💎 |
| Amazon Nova Pro | 300K | AWS-native, large context | 💎💎 |
| Amazon Nova Lite | 300K | Budget-friendly AWS model | 💎 |
| DeepSeek R1 | 64K | Advanced reasoning | 💎💎 |
| Llama 3.3 70B | 128K | Open-source powerhouse | 💎 |
| Mistral Large | 128K | European AI excellence | 💎💎 |

Azure OpenAI

| Model | Context | Best For | Cost Tier |
| --- | --- | --- | --- |
| GPT-4o | 128K | Multimodal, fast | 💎💎 |
| GPT-4 Turbo | 128K | Large context tasks | 💎💎💎 |
| o1 | 128K | Advanced reasoning | 💎💎💎 |
| GPT-3.5 Turbo | 16K | Quick tasks, low cost | 💎 |

🔒 Security & Compliance

Cloud LM Provider is built for enterprise environments:

✅ No Data Collection — We don't collect, store, or transmit your conversations
✅ Local Credential Storage — All secrets stored in VS Code's encrypted keychain
✅ Your Cloud, Your Data — Direct API calls to your AWS/Azure accounts
✅ SOC 2 / HIPAA Compatible — Works within your existing compliance framework
✅ Open Source — Audit the code yourself on GitHub

📈 Performance Benchmarks

Tested on a MacBook Pro M3 with VS Code 1.104:

| Metric | Cloud LM Provider | Alternative Extensions |
| --- | --- | --- |
| Cold Start | 1.2s | 3-5s |
| Model Switch | <100ms | 500ms-2s |
| First Token | Network latency only | +200-500ms overhead |
| Memory Usage | ~45MB | 80-150MB |
| Token Compression | 30-45% savings | N/A |

❓ FAQ

Q: Do I need a GitHub Copilot subscription?

Yes, you need an active GitHub Copilot subscription to use GitHub Copilot Chat. This extension adds additional AI models to the existing Copilot Chat interface.

Q: Why are my AWS models not showing up?

Ensure your AWS credentials have bedrock:InvokeModel and bedrock:ListFoundationModels permissions
Check that the models are available in your selected region
Run "Cloud LM: Recheck Cloud Connection" to refresh
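For reference, the two permissions named above can be expressed as an IAM policy document. The sketch below builds one as a plain object and prints it as JSON. The `Sid` is a made-up label, `Resource: "*"` is deliberately broad and should be narrowed to specific model ARNs where possible, and the inclusion of `bedrock:InvokeModelWithResponseStream` is an assumption about how streaming responses are requested, not something this README states.

```typescript
// Minimal IAM policy sketch for Bedrock model discovery and invocation.
const bedrockPolicy = {
  Version: "2012-10-17",
  Statement: [
    {
      Sid: "CloudLmProviderBedrockAccess", // hypothetical label
      Effect: "Allow",
      Action: [
        "bedrock:InvokeModel",
        "bedrock:ListFoundationModels",
        // Assumption: streaming calls usually need this action as well.
        "bedrock:InvokeModelWithResponseStream",
      ],
      Resource: "*", // narrow to specific model ARNs in production
    },
  ],
};

// Print the policy so it can be pasted into the IAM console or a policy file.
console.log(JSON.stringify(bedrockPolicy, null, 2));
```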

Q: How does Headroom compression work?

Headroom analyzes your conversation context and:

Deduplicates repeated tool schemas
Compresses older conversation history
Optimizes code blocks with pattern-based comment stripping and whitespace normalisation (logic, names, and strings are left untouched)
Caches frequently-used context patterns

This reduces token count by 30-45% without losing important context.
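As an illustration of the deduplication idea, the sketch below remembers the last content sent for each file and returns a short stub when the same content is read again in one conversation. The class name, the stub wording, and the choice to key on file path are assumptions made for this example; Headroom's actual bookkeeping is not documented here.

```typescript
// Tracks file contents already sent in the current conversation.
class FileReadDeduplicator {
  private readonly lastSent = new Map<string, string>(); // path -> content last sent

  // Returns the text to send for this read: the full content the first time
  // (or after a change), a compact stub when nothing changed.
  render(path: string, content: string): string {
    if (this.lastSent.get(path) === content) {
      const lineCount = content.split("\n").length;
      return `[unchanged: ${path}, ${lineCount} lines, sent earlier in this conversation]`;
    }
    this.lastSent.set(path, content);
    return content;
  }
}

const dedup = new FileReadDeduplicator();
const firstRead = dedup.render("src/app.ts", "const a = 1;\nconst b = 2;");
const secondRead = dedup.render("src/app.ts", "const a = 1;\nconst b = 2;");
console.log(firstRead.length > secondRead.length ? "stub is shorter" : "stub is longer");
```

Note that for very small files a stub like this can be longer than the content itself, so a real implementation would presumably apply it only above some size.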

Q: Is my data secure?

Absolutely. The extension makes direct API calls from your machine to your cloud provider. We never proxy, store, or access your data. Credentials are stored in VS Code's encrypted secret storage.

Q: Can I use this with multiple AWS accounts?

Yes! Use AWS CLI profiles or switch credentials via the configuration wizard. The extension supports multiple credential sets.

Q: Why is Claude/GPT not responding?

Check your API quota limits in AWS/Azure console
Verify credentials haven't expired
Check the output log: "Cloud LM: Show Output Log"
Ensure the model is available in your region

🛠️ Commands

| Command | Description |
| --- | --- |
| `Cloud LM: Manage Provider Configuration` | Open the interactive setup wizard for AWS Bedrock and Azure OpenAI |
| `Cloud LM: Recheck Cloud Connection` | Refresh model discovery and verify credentials |
| `Cloud LM: Clear Stored Credentials` | Remove all saved credentials from secure storage |
| `Cloud LM: Show Output Log` | View detailed debug logs including compression analytics |
| `Cloud LM: Toggle Headroom Context Compression` | Enable/disable Headroom compression in real-time |
| `Cloud LM: Show Headroom Savings Dashboard` | View comprehensive savings analytics and compression history |
| `Cloud LM: Manage Accounts` | Switch between multiple AWS/Azure accounts |
| `Cloud LM: Verify Model Access` | Test connectivity to specific models |
| `Cloud LM: Export Compression Report` | Generate CSV/JSON report of compression metrics |

🗺️ Roadmap

[ ] Prompt Library — Save and reuse effective prompts
[ ] Team Sharing — Share configurations across your organization
[ ] Cost Alerts — Configurable spending notifications
[ ] Google Vertex AI — Support for Gemini models
[ ] Local Models — Ollama and LM Studio integration
[ ] Custom Endpoints — OpenAI-compatible API support

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

# Clone the repository and enter it
git clone https://github.com/suddhu-iith2004/cloud-lm-provider.git
cd cloud-lm-provider

# Install dependencies
npm install

# Compile and watch
npm run watch

# Launch the Extension Development Host: press F5 in VS Code

📄 License

This project is licensed under the MIT License — see the LICENSE file for details.

🙏 Acknowledgments

AWS SDK for JavaScript — AWS Bedrock integration
Azure SDK for JavaScript — Azure OpenAI integration
Headroom AI — Token compression engine
The VS Code team for the excellent Language Model API

⭐ If Cloud LM Provider saves you time and money, please star this repo! ⭐

Made with ❤️ by @suddhu-iith2004
