tokencut

tokencut is a local-first caching layer for GitHub Copilot in VS Code. It detects when you (or an agent) ask the same or a semantically similar question and serves the cached answer instantly — saving tokens, reducing latency, and keeping responses consistent.

All data stays on your machine. Nothing is sent to any external service.

How it works

1. You ask @tokencut a question in Copilot Chat.
2. tokencut checks its local cache:
   - Exact match: identical (normalised) prompt → instant answer.
   - Semantic match: similar meaning detected via a local all-MiniLM-L6-v2 embedding model → cached answer returned with a confidence score.
   - Miss: falls through to the live Copilot model and stores the answer for next time.
3. Every cache hit is logged so you can track token savings over time.
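
The lookup order above can be sketched roughly as follows. This is a minimal illustration, not tokencut's actual code: the type names, the normalisation rule (trim, lowercase, collapse whitespace), the use of cosine similarity as the confidence score, and the 0.85 threshold are all assumptions. The embeddings are passed in as plain number arrays so the sketch does not depend on any particular embedding library.

```typescript
// Hypothetical sketch: none of these names come from the tokencut source.
interface CacheEntry {
  prompt: string;      // normalised prompt text
  embedding: number[]; // vector produced by a local embedding model
  answer: string;
}

type LookupResult =
  | { kind: "exact"; answer: string }
  | { kind: "semantic"; answer: string; confidence: number }
  | { kind: "miss" };

// Assumed normalisation: trim, lowercase, collapse runs of whitespace.
function normalisePrompt(prompt: string): string {
  return prompt.trim().toLowerCase().replace(/\s+/g, " ");
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function lookup(
  prompt: string,
  promptEmbedding: number[],
  entries: CacheEntry[],
  threshold: number = 0.85 // assumed value, not tokencut's real cut-off
): LookupResult {
  const key = normalisePrompt(prompt);

  // 1. Exact match on the normalised prompt.
  for (const entry of entries) {
    if (entry.prompt === key) return { kind: "exact", answer: entry.answer };
  }

  // 2. Semantic match: take the most similar entry if it clears the threshold.
  let best: CacheEntry | undefined;
  let bestScore = -1;
  for (const entry of entries) {
    const score = cosineSimilarity(promptEmbedding, entry.embedding);
    if (score > bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  if (best !== undefined && bestScore >= threshold) {
    return { kind: "semantic", answer: best.answer, confidence: bestScore };
  }

  // 3. Miss: the caller would ask the live model and store the new answer.
  return { kind: "miss" };
}
```

Checking for an exact match first means the embedding comparison only matters for prompts that differ in wording, which is where the confidence score is useful.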

Requirements

- VS Code 1.90 or later
- GitHub Copilot extension installed and signed in
- Node.js 22 or later (needed to run the bundled service)

Installation

From the Marketplace (once published)

Search for tokencut in the VS Code Extensions panel and click Install. The bundled service starts automatically.

From source

git clone https://github.com/algorythmik/tokencut
cd tokencut/vscode-extension
npm install
npm run vscode:prepublish   # compiles TS + bundles the service
vsce package                # produces tokencut-x.y.z.vsix
code --install-extension tokencut-*.vsix

Usage

Chat participant

Open Copilot Chat and type:

@tokencut how do I run the tests?

After you invoke @tokencut once in a conversation, it stays selected, so you don't need to type @tokencut again for follow-up messages.

Press Cmd+Shift+/ (Mac) or Ctrl+Shift+/ (Windows/Linux) to open the chat panel with @tokencut pre-filled.

Editor commands

Command	What it does
`tokencut: Explain Selection`	Explains the currently selected code, with semantic reuse across identical or near-identical selections
`tokencut: How Do I Build / Test / Run This Repo?`	Answers common repo-specific questions, cached per workspace

Access them via the Command Palette (Cmd+Shift+P / Ctrl+Shift+P).

Settings

Setting	Default	Description
`tokencut.serviceUrl`	`http://127.0.0.1:8787`	URL of the local tokencut service. Change only if you run the service manually on a different port.
`tokencut.forceFresh`	`false`	When `true`, always calls the live model and skips the cache. Useful for debugging.

Token savings

tokencut estimates tokens saved using the rule of thumb 1 token ≈ 4 characters. For each cached answer, the estimated token count of the answer is multiplied by the number of times it was reused; the sum over all cached answers gives the estimate.
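
As a worked example of the rule of thumb, a 400-character answer counts as about 100 tokens, so reusing it 3 times is estimated at about 300 tokens saved. The sketch below expresses the same arithmetic; the `CachedAnswer` shape, the function names, and rounding up with `Math.ceil` are assumptions made for illustration, not details taken from tokencut.

```typescript
// Hypothetical record shape for one cached answer.
interface CachedAnswer {
  answer: string;
  reuseCount: number; // how many times the answer was served from the cache
}

// Rule of thumb from the text: 1 token is roughly 4 characters.
const CHARS_PER_TOKEN = 4;

// Rounding up is an assumption; the real service may round differently.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Sum of (estimated tokens per answer) x (number of reuses).
function estimateTokensSaved(answers: CachedAnswer[]): number {
  let total = 0;
  for (const item of answers) {
    total += estimateTokens(item.answer) * item.reuseCount;
  }
  return total;
}
```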

Check your live savings in the status bar at the bottom-right of VS Code:

⊙  1,234 tokens saved

Click it to see the full breakdown and write a timestamped snapshot to ~/.tokencut/stats.jsonl.

Analyse snapshots over time:

jq -r '[.timestamp, .estimatedTokensSaved, .totalHits] | @csv' ~/.tokencut/stats.jsonl
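
If jq is not available, the same rows can be produced in TypeScript. The sketch below works on the text of the snapshot file (one JSON object per line) and uses the three field names from the jq command above. Treating `timestamp` as a string and the other two fields as numbers is an assumption, as are the type and function names.

```typescript
// Field names come from the jq command; the field types are assumptions.
interface StatsSnapshot {
  timestamp: string;
  estimatedTokensSaved: number;
  totalHits: number;
}

// Parse JSON Lines text: one JSON object per non-empty line.
function parseSnapshots(jsonl: string): StatsSnapshot[] {
  const snapshots: StatsSnapshot[] = [];
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines, e.g. the trailing newline
    snapshots.push(JSON.parse(trimmed) as StatsSnapshot);
  }
  return snapshots;
}

// Format rows the way jq's @csv does: strings quoted, numbers left bare.
function toCsvRows(snapshots: StatsSnapshot[]): string[] {
  return snapshots.map((s) => {
    const quoted = '"' + s.timestamp.replace(/"/g, '""') + '"';
    return quoted + "," + s.estimatedTokensSaved + "," + s.totalHits;
  });
}
```

In practice the input string would be the contents of ~/.tokencut/stats.jsonl, read with whatever file API is at hand.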

Cache details

Request type	Cache TTL	Notes
Repo question	7 days	Scoped to workspace
Explain selection	30 days	Keyed to selected code content
Summarize file	1 day	Short TTL — file content changes often

The cache is a local SQLite database at vscode-extension/service/data/tokencut.db. Delete it at any time to start fresh.
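
The TTL values in the table can be read as a simple expiry rule: an entry is stale once its age reaches the TTL for its request type. The following sketch shows that rule under stated assumptions. The request-type identifiers, the millisecond timestamps, and the choice to treat an entry as expired exactly at the TTL boundary are illustrative, not taken from the tokencut source.

```typescript
// Hypothetical identifiers for the three request types in the table.
type RequestType = "repoQuestion" | "explainSelection" | "summarizeFile";

const DAY_MS = 24 * 60 * 60 * 1000;

// TTLs from the table above, converted to milliseconds.
const TTL_MS: Record<RequestType, number> = {
  repoQuestion: 7 * DAY_MS,
  explainSelection: 30 * DAY_MS,
  summarizeFile: 1 * DAY_MS,
};

// An entry is considered expired once its age reaches the TTL (assumed boundary rule).
function isExpired(type: RequestType, storedAtMs: number, nowMs: number): boolean {
  return nowMs - storedAtMs >= TTL_MS[type];
}
```

For example, a file summary stored two days ago would be expired, while a repo answer of the same age would still be served from the cache.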

Running the service manually

The service starts automatically with VS Code. If you want to run it separately (e.g. for development):

cd service
npm start

It listens on 127.0.0.1:8787 and only accepts connections from localhost.

Available endpoints:

Method	Path	Description
`GET`	`/health`	Liveness check
`POST`	`/v1/query`	Look up a cached answer
`POST`	`/v1/store`	Store a new answer
`GET`	`/v1/stats`	Hit/miss counts and token savings
`POST`	`/v1/stats/snapshot`	Write a snapshot to `~/.tokencut/stats.jsonl`

Privacy

- Every prompt and answer is stored only locally, in a SQLite file on your machine.
- The service binds to 127.0.0.1 and rejects all non-localhost requests.
- The embedding model (all-MiniLM-L6-v2, ~23 MB) is downloaded once to ~/.cache/huggingface/ and runs entirely offline after that.
- tokencut sends nothing to any remote server.
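
Binding to 127.0.0.1 already keeps the service unreachable from other hosts; rejecting non-localhost requests adds a second check on top. One way such a check could look is sketched below. This is an assumption about the approach, not tokencut's code: the function name is hypothetical, and the input is the kind of remote address string a Node.js socket reports.

```typescript
// Hypothetical helper: returns true only for loopback peer addresses.
function isLoopbackAddress(remoteAddress: string | undefined): boolean {
  if (remoteAddress === undefined) return false;

  // Dual-stack sockets can report IPv4 peers as "::ffff:127.0.0.1".
  const prefix = "::ffff:";
  const address = remoteAddress.startsWith(prefix)
    ? remoteAddress.slice(prefix.length)
    : remoteAddress;

  // IPv6 loopback.
  if (address === "::1") return true;

  // The whole 127.0.0.0/8 block is IPv4 loopback.
  return /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(address);
}
```

A request handler would call this with the connection's remote address and refuse the request when it returns false.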

License

MIT

tokencut

Mojtaba Tabatabaeipour
