# ⚡ Ayu-Gravity — Token Compressor

Automatically compresses your prompts before sending, cutting token costs by 30–50% without losing meaning.

## 🧠 The Problem
Every token you send to an AI model costs money. Long, verbose prompts with filler words, redundancy, and conversational fluff can double your API bill without adding any value to the response.
Most developers don't realize that up to 40% of their prompt tokens are wasted on words the model doesn't need to understand your intent.
Ayu-Gravity solves this. It sits between you and the AI — compressing your prompts on-the-fly using a fast, cheap model before forwarding them to your primary expensive model. You type naturally. The AI receives a lean, optimized prompt. You save tokens. You save money.
## ✨ Features

| Feature | Description |
| --- | --- |
| 🗜️ Auto Compression | Prompts are compressed before sending — completely transparent to you |
| 🤖 5 AI Providers | Anthropic, OpenAI, Gemini, Ollama, and GitHub Copilot built-in |
| 🎛️ Model Selector | Choose your exact model per provider — from the cheapest to the most powerful |
| ⚡ Streaming Responses | Real-time token-by-token streaming with Markdown rendering |
| 🔒 Encrypted API Keys | All keys stored in VS Code's native SecretStorage — never in plaintext |
| 💾 Export Chat | Save any conversation as a Markdown file |
| 🔄 Auto-Retry | Exponential backoff on 429 rate limits — no crashes |
| 🎨 Native VS Code UI | Matches your theme perfectly — dark mode, light mode, everything |
| 📜 Chat History | Conversations persist across panel open/close cycles |
| 🆓 Free Options | Use GitHub Copilot (zero config) or Ollama (local, free) |
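The auto-retry behavior from the table can be sketched as follows. This is a minimal illustration of exponential backoff on 429 responses; `withRetry` and its parameters are illustrative names, not the extension's actual API:

```typescript
// Retry a call when the provider rate-limits (HTTP 429), doubling the
// wait between attempts. Illustrative sketch, not the extension's code.
async function withRetry<T>(
  call: () => Promise<T>,
  isRateLimit: (err: unknown) => boolean,
  maxAttempts = 4,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up on non-rate-limit errors or once attempts are exhausted.
      if (!isRateLimit(err) || attempt + 1 >= maxAttempts) throw err;
      // Backoff schedule: 500 ms, 1 s, 2 s, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Because failures surface only after the retry budget is spent, a temporary rate limit never crashes the chat; the request simply completes a little later.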
## 🔌 Supported Providers

| Provider | Setup Required | Cost |
| --- | --- | --- |
| GitHub Copilot | None — auto-detected if installed | Included with Copilot subscription |
| Ollama | Run Ollama locally on port 11434 | Free (local inference) |
| Anthropic | Set API key via command palette | Pay per token |
| OpenAI | Set API key via command palette | Pay per token |
| Google Gemini | Set API key via command palette | Pay per token / free tier available |
## 🎯 Supported Models

### Anthropic

| Model | Best For |
| --- | --- |
| `claude-opus-4-20250514` | Complex reasoning, long-form analysis |
| `claude-sonnet-4-20250514` | Balanced speed and quality (default) |
| `claude-3-5-haiku-20241022` | Fast responses, cost-efficient |
### OpenAI

| Model | Best For |
| --- | --- |
| `gpt-4o` | Most capable, multimodal (default) |
| `gpt-4o-mini` | Fast and affordable |
| `o1-mini` | Reasoning-focused tasks |
### Google Gemini

| Model | Best For |
| --- | --- |
| `gemini-2.0-flash` | Ultra-fast, latest generation (default) |
| `gemini-1.5-pro` | Long context, complex tasks |
| `gemini-1.5-flash` | Speed-optimized |
### Ollama

Models are fetched dynamically from your local Ollama installation. Whatever you have pulled locally — Llama, Mistral, Phi, Qwen — appears automatically in the dropdown.
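Dynamic model discovery against a local Ollama server can be sketched like this. The `/api/tags` route is Ollama's real model-listing endpoint; `listOllamaModels` and the injectable `fetchFn` parameter are illustrative, not the extension's internals:

```typescript
// Shape of the relevant part of Ollama's GET /api/tags response.
interface OllamaTagsResponse {
  models: { name: string }[];
}

// List locally pulled model names from an Ollama server.
// fetchFn is injectable so the logic can be exercised without a live server.
async function listOllamaModels(
  baseUrl = "http://localhost:11434",
  fetchFn: typeof fetch = fetch,
): Promise<string[]> {
  const res = await fetchFn(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama server returned ${res.status}`);
  const body = (await res.json()) as OllamaTagsResponse;
  return body.models.map((m) => m.name);
}
```

Calling this at panel-open time is enough to keep the dropdown in sync with whatever you have pulled via `ollama pull`.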
### GitHub Copilot

Model selection is automatic via the VS Code Language Model API. No configuration needed.
## 🔧 How It Works

```
┌──────────────┐     ┌──────────────────┐     ┌──────────────┐
│  You type a  │────▶│   Ayu-Gravity    │────▶│   AI Model   │
│    prompt    │     │  compresses it   │     │   responds   │
│ (100 tokens) │     │   (65 tokens)    │     │  (streamed)  │
└──────────────┘     └──────────────────┘     └──────────────┘
```

1. **You type** your prompt naturally in the sidebar chat
2. **Token estimation** — if your prompt is ≥ 20 estimated tokens, compression kicks in
3. **Compression** — a fast, cheap model (Haiku / gpt-4o-mini / Flash) strips filler words, redundancy, and verbose phrasing in under 10 seconds
4. **Original + compressed** — you see both in the UI with a toggle
5. **Forwarding** — the compressed prompt is sent to your selected model
6. **Streaming** — the response streams back token by token with live Markdown rendering
7. **Savings** — you save 30–50% of input tokens on most prompts
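The gating step above can be sketched with a rough character-based token estimate (about 4 characters per token is a common heuristic for English text). The names here are illustrative, not the extension's internals:

```typescript
// Prompts below this estimated size are sent as-is (see the gating step above).
const MIN_TOKENS_TO_COMPRESS = 20;

// Rough token estimate: ~4 characters per token is a common heuristic.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Decide whether a prompt is long enough to be worth compressing.
function shouldCompress(prompt: string): boolean {
  return estimateTokens(prompt) >= MIN_TOKENS_TO_COMPRESS;
}
```

The estimate only needs to be accurate enough for gating; the providers' own tokenizers determine actual billing.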
## 📦 Installation

### From VS Code Marketplace

1. Open VS Code
2. Go to Extensions (`Ctrl+Shift+X`)
3. Search for "Ayu-Gravity"
4. Click Install

### From VSIX File

```shell
code --install-extension ayu-gravity-2.1.0.vsix
```
## 🔑 Setting Up API Keys

1. Open the Command Palette (`Ctrl+Shift+P`)
2. Run "Antigravity: Set API Key"
3. Select your provider (Anthropic / OpenAI / Gemini)
4. Paste your API key (input is hidden)
5. Done! The provider now appears in your dropdown

> **Note:** API keys are stored in VS Code's native SecretStorage — encrypted at rest and never written to disk in plaintext.

### No API Key Needed For:

- GitHub Copilot — works automatically if you have a Copilot subscription
- Ollama — runs locally; just have the Ollama server running on `localhost:11434`
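For the key-based providers, storage might look like the sketch below. `vscode.SecretStorage` is the real VS Code API behind this; the minimal `SecretStore` interface here mirrors its `store`/`get` methods so the logic can be shown outside the extension host, and the secret names are assumptions:

```typescript
// Minimal mirror of the vscode.SecretStorage methods used here,
// so the logic is runnable outside an extension host.
interface SecretStore {
  store(key: string, value: string): Promise<void>;
  get(key: string): Promise<string | undefined>;
}

// One namespaced secret per provider, e.g. "ayuGravity.apiKey.anthropic"
// (the naming scheme is illustrative, not the extension's actual keys).
async function saveApiKey(secrets: SecretStore, provider: string, apiKey: string): Promise<void> {
  await secrets.store(`ayuGravity.apiKey.${provider}`, apiKey);
}

async function loadApiKey(secrets: SecretStore, provider: string): Promise<string | undefined> {
  return secrets.get(`ayuGravity.apiKey.${provider}`);
}
```

In a real extension, `context.secrets` (provided on activation) would be passed where `SecretStore` is expected, and the OS keychain handles encryption at rest.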
## 🗜️ Compression Modes

| Mode | Reduction Target | Best For |
| --- | --- | --- |
| ⚡ Balanced | ~35% | Everyday use — good savings, full meaning preserved |
| 🔥 Aggressive | ~50% | Maximum savings — strips everything non-essential |
| 💡 Light | ~20% | Minimal compression — when precision matters |

> **Tip:** Compression is only applied to prompts with 20+ estimated tokens. Short prompts are sent as-is.
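One plausible way a mode could translate into the instruction given to the compressor model is sketched below. The reduction targets come from the table above; the type name, mapping, and prompt wording are assumptions, not the extension's actual prompt:

```typescript
// Modes from the table above (names lowercased for use as identifiers).
type CompressionMode = "light" | "balanced" | "aggressive";

// Approximate reduction targets per mode, as fractions.
const REDUCTION_TARGET: Record<CompressionMode, number> = {
  light: 0.2,
  balanced: 0.35,
  aggressive: 0.5,
};

// Build the instruction sent to the cheap compressor model.
// The wording here is a hypothetical example, not the shipped prompt.
function compressionInstruction(mode: CompressionMode): string {
  const pct = Math.round(REDUCTION_TARGET[mode] * 100);
  return `Rewrite the user's prompt about ${pct}% shorter. ` +
    `Remove filler and redundancy; preserve all meaning, names, and constraints.`;
}
```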
## 📸 Screenshots

Coming soon — screenshots of the sidebar panel, model selector, and compression in action.
## ❓ FAQ

**Q: Does compression affect response quality?**
A: No. Compression removes filler words and redundancy — not meaning. LLMs generally perform as well or better with concise, clear prompts.

**Q: What model is used for compression?**
A: The cheapest, fastest model for each provider: `claude-3-5-haiku` for Anthropic, `gpt-4o-mini` for OpenAI, `gemini-1.5-flash` for Gemini. Compression cost is negligible.

**Q: Can I see what was compressed?**
A: Yes! Every compressed message has a "View compressed prompt" toggle in the UI that shows you exactly what was sent.

**Q: Does it work offline?**
A: Only with Ollama (local models). All other providers require internet access.

**Q: Are my API keys safe?**
A: Yes. All keys are stored in VS Code's native encrypted SecretStorage. They are never logged, never written to plaintext files, and never sent anywhere except the provider's own API.

**Q: What happens if compression fails?**
A: The original prompt is sent as-is. Compression has a 10-second timeout and a graceful fallback — it never blocks your message.

**Q: Can I choose which model to chat with?**
A: Yes! The model selector dropdown lets you pick any available model for your selected provider.
## 📄 License

MIT License — free for personal and commercial use.