Nesha - AI Tab Completion

MouleeswaranR

AI-powered inline tab completion using local context, cross-file symbols, and fast LLMs via OpenRouter, Groq, or Fireworks.

Tab Complete

🚀 Extension — Coming Soon on VS Code Marketplace

Tab Complete is a VS Code extension that delivers fast, context-aware, AI-powered inline (ghost-text) code completions — completely free, using the best open-weight models available today. It streams suggestions from top free-tier LLM providers (OpenRouter, Groq, Fireworks) and squeezes maximum performance out of them through a multi-stage pipeline: smart context building, AST analysis, LSP integration, multi-layer caching, and edit-intent tracking. Together, these make even a free model feel like a premium coding assistant.

No paid API required. OpenRouter, Groq, and Fireworks all offer generous free tiers. Tab Complete is designed from the ground up to get the most out of free models by sending them only the most relevant context — so you get better completions with fewer tokens.



Table of Contents

  1. How It Works — Overview
  2. Feature 1 — AI-Powered Inline (Ghost-Text) Completions
  3. Feature 2 — Smart Context Builder (Prefix Stage)
  4. Feature 3 — AST-Powered Replacement Region
  5. Feature 4 — Multi-Layer Completion Caching
  6. Feature 5 — Continue-Prediction (Type-Ahead Shortcutting)
  7. Feature 6 — Edit Intent Tracker
  8. Feature 7 — LSP-Aware Local Dependency Resolution
  9. Feature 8 — Multi-Provider LLM Streaming
  10. Feature 9 — Live Configuration with Hot-Reload
  11. Feature 10 — Completion De-Duplication
  12. Feature 11 — Cross-File Symbol Context
  13. Extension Settings Reference
  14. Supported Languages
  15. Getting Started
  16. Architecture Diagram

How It Works — Overview

When you stop typing for ~300 ms, Tab Complete's provider pipeline kicks in:

User stops typing
       ↓
[Debounce 300ms]
       ↓
[Stage 1] Re-use existing pending suggestion? → YES → return it immediately
       ↓ NO
[Stage 2] Cache hit? → YES → return cached suggestion instantly
       ↓ NO
[Stage 3] User typed part of last prediction? → YES → return remaining suffix instantly
       ↓ NO
[Stage 4] Build smart context prefix (PrefixStage + AST + LSP)
       ↓
[Stage 5] Stream completion from LLM (OpenRouter / Groq / Fireworks)
       ↓
[Stage 6] Store result in cache
       ↓
Display ghost-text inline suggestion

This layered pipeline means most suggestions are served from fast local stages (stages 1–3) and only fall through to the LLM when genuinely needed. This is also what makes Tab Complete excel with free models — by the time the LLM is called, it receives a surgically precise context rather than a raw file dump, so even smaller free models produce highly accurate completions.
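The fall-through behaviour can be sketched as a list of stage functions tried in order, where the first stage to produce a result wins. This is a minimal illustration of the idea, not the extension's actual code; the `Stage` type and `runPipeline` name are invented for the sketch.

```typescript
// Hypothetical sketch of the staged fall-through: each stage either produces
// a completion or returns null, and the first non-null result is used.
type Stage = () => string | null;

function runPipeline(stages: Stage[]): string | null {
  for (const stage of stages) {
    const result = stage();             // cheap local stages run first
    if (result !== null) return result; // fall through only on a miss
  }
  return null; // no stage produced a suggestion (nothing to show)
}
```

Because the cheap stages (pending suggestion, cache, continue-prediction) sit before the expensive LLM stage, most calls never reach the network.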


Feature 1 — AI-Powered Inline (Ghost-Text) Completions

What it does

Tab Complete registers a native VS Code InlineCompletionItemProvider for every file type (pattern **). As you edit, it shows grey ghost-text suggestions that you can accept by pressing Tab.

How it works internally

The provider (InlineCompletionProvider) implements VS Code's provideInlineCompletionItems API. It:

  1. Receives the current document and cursor position from VS Code.
  2. Runs through the full pipeline (debounce → cache → prediction → context → LLM).
  3. Returns a vscode.InlineCompletionList containing a single InlineCompletionItem with the suggested text and an explicit insert range (the replacement region).

Example

You type:

const result = users.filter(u => u.age >

Tab Complete sends your file context to the LLM and returns:

 30).map(u => u.name);

The suggestion appears as grey ghost-text. Press Tab to accept.

Debounce

The provider waits 300 ms after your last keystroke before doing anything expensive. If you type again before 300 ms, the previous request is discarded and a new one starts. This prevents API spam while you are still typing.
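A minimal sketch of that debounce behaviour (the helper below is illustrative, not the extension's API): each new call cancels the previous pending timer, so only the last keystroke within the quiet window triggers the expensive work.

```typescript
// Illustrative debounce: cancel the pending timer on every call so only the
// final call within the window actually fires.
function debounce<T extends unknown[]>(fn: (...args: T) => void, ms: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    if (timer !== undefined) clearTimeout(timer); // discard the previous request
    timer = setTimeout(() => fn(...args), ms);
  };
}
```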


Feature 2 — Smart Context Builder (Prefix Stage)

What it does

Instead of blindly sending the entire file to the LLM (which wastes tokens and adds noise), Tab Complete builds a tailored prefix that includes exactly the code most relevant to what you are writing.

Three strategies based on cursor position

Strategy A — Verbatim prefix (cursor line < 150)

For small files or when you are near the top, the full document text up to the cursor is sent. This is always accurate and fast.

[Full file top] → ... → [cursor position]

Strategy B — Simplified prefix (cursor line ≥ 150, no enclosing function)

When the cursor is deep in a file but not inside a function, the extension:

  • Takes the last 150 lines before the cursor (the most recent context).
  • Extracts which identifiers are used in those lines.
  • Filters the import section to only include imports that are actually referenced.

This prevents sending hundreds of stale lines at the top of the file.

Strategy C — Scoped prefix (cursor inside a function)

This is the most powerful strategy. It assembles:

| Part | What it contains |
| --- | --- |
| Used imports | Only the import statements whose bindings appear in the current scope |
| Same-file dependencies | Full source of functions/classes used by the current function (resolved via LSP) |
| Class header | The class name, extends, implements, field declarations (up to the opening {) |
| Function lines | All lines from the function start to the cursor position |

Small function (< 150 lines): all lines from the function start to cursor are included.

Large function (≥ 150 lines): only the first 30 lines (setup/signature) and the last 100 lines before the cursor are included, with a // ... [truncated] ... marker between them. This keeps the LLM focused on what it needs without exceeding token limits.
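The truncation rule above can be sketched as follows; the function name and parameters are invented for illustration, with the constants (30 head lines, 100 tail lines, 150-line threshold) taken from the description.

```typescript
// Sketch of the large-function truncation: keep the signature/setup and the
// most recent lines before the cursor, with a marker in between.
function truncateFunctionLines(
  lines: string[],
  head = 30,
  tail = 100,
  limit = 150,
): string[] {
  if (lines.length < limit) return lines; // small function: keep everything
  return [
    ...lines.slice(0, head),              // setup / signature
    "// ... [truncated] ...",             // marker for the omitted middle
    ...lines.slice(lines.length - tail),  // most recent context before cursor
  ];
}
```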

Example — Scoped prefix output

// Used imports
import { UserService } from './userService';
import { formatName } from '../utils/format';

// Same-file dependency (resolved automatically)
function validateAge(age: number): boolean {
    return age >= 0 && age <= 150;
}

// Class header
class ProfileController extends BaseController {
    private readonly userService: UserService;

    // Function lines up to cursor
    async updateProfile(userId: string, data: ProfileData): Promise<void> {
        const user = await this.userService.findById(userId);
        if (!validateAge(data.age)) {
            throw new Error('Invalid age');
        }
        // ← cursor is here

The LLM receives exactly this — no noise, no stale code, no unrelated imports.


Feature 3 — AST-Powered Replacement Region

What it does

Tab Complete doesn't just complete forward — it computes a replacement region: the span of existing text that the new completion should replace. This is especially important for mid-line completions where you are rewriting part of an existing expression.

How it works

The ReplacementRegionStage uses Tree-sitter to parse the code and find the end of the current statement, enabling accurate replacements even across multiple lines.

Step 1 — Find text after cursor
It reads everything on the current line after the cursor position.

Step 2 — Decide if extension is needed
Extension is triggered when the text after the cursor has:

  • Unbalanced open brackets/braces (e.g. func(a, b — an unclosed "(")
  • A continuation operator at the end (",", "+", "&&", "|>", "?", ".", "=>", etc.)
  • No statement terminator at the end ("}", ")", "{", ":")
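A sketch of that decision, assuming the operator and terminator sets listed above (the function name and exact heuristics are illustrative, not the extension's code):

```typescript
// Illustrative "does the text after the cursor need AST extension?" check.
function needsExtension(afterCursor: string): boolean {
  const trimmed = afterCursor.trim();
  if (trimmed === "") return false;
  // Unbalanced open brackets/braces?
  let depth = 0;
  for (const ch of trimmed) {
    if ("([{".includes(ch)) depth++;
    else if (")]}".includes(ch)) depth--;
  }
  if (depth > 0) return true;
  // Continuation operator at the end?
  const continuations = [",", "+", "&&", "|>", "?", ".", "=>"];
  if (continuations.some((op) => trimmed.endsWith(op))) return true;
  // No statement terminator at the end?
  const terminators = ["}", ")", "{", ":"];
  return !terminators.some((t) => trimmed.endsWith(t));
}
```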

Step 3 — Extend to statement end using AST
If extension is needed and the text after cursor is short (< 20 chars), the stage:

  1. Reads up to 3 more lines.
  2. Parses them with Tree-sitter.
  3. Walks the AST to find the smallest node containing the cursor.
  4. Walks up the tree to the nearest statement boundary (expression_statement, return_statement, variable_declaration, if_statement, etc.).
  5. Returns the end position of that statement node.

Example

Cursor is in the middle of:

result = compute_value(x,
    y + z)

Without AST, the replacement region would be just compute_value(x, — cutting off mid-expression. With AST, the replacement region correctly spans both lines up to ).

Supported languages for AST

TypeScript, TSX, JavaScript, JSX, Python, Rust, Go, Java, C, C++


Feature 4 — Multi-Layer Completion Caching

What it does

Tab Complete caches completion results so that when you return to the same position in the same state, the suggestion appears instantly without hitting the LLM again.

Cache key

Each cached entry is keyed on 5 dimensions:

| Key part | Why it matters |
| --- | --- |
| Document URI | Different files have different completions |
| Document content hash (MD5) | Detects if the file has been edited since the last cache entry |
| Cursor line | Position matters — different lines need different completions |
| Cursor character | Sub-line position affects the completion |
| Edit history hash | Your recent edits change what the LLM should suggest |

If any of these change, the cache misses and a fresh LLM call is made.
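The five-part key can be illustrated like this; the interface and function names are invented for the sketch, and only the keying idea (URI + content hash + position + intent hash) comes from the description.

```typescript
import { createHash } from "node:crypto";

// Hypothetical five-part completion cache key.
interface CacheKeyParts {
  uri: string;
  content: string;    // hashed, so any edit invalidates the entry
  line: number;
  character: number;
  intentHash: string; // from the edit-history tracker
}

function completionCacheKey(p: CacheKeyParts): string {
  const contentHash = createHash("md5").update(p.content).digest("hex");
  return [p.uri, contentHash, p.line, p.character, p.intentHash].join("|");
}
```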

Underlying cache — BoundedCache (LRU + LFU)

The BoundedCache<V> is a generic, fixed-capacity cache that combines LRU (Least Recently Used) and LFU (Least Frequently Used) eviction:

  • Each entry tracks lastAccessed timestamp and accessCount.
  • Eviction score = accessCount / ageInSeconds — entries with low use frequency and old access time are evicted first.
  • TTL support — each entry can have an expiry time (configurable via completionCacheTtlMs).
  • Group invalidation — all entries for a closed document are removed at once when VS Code fires onDidCloseTextDocument.
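The combined LRU+LFU eviction score can be sketched as below; the helper names are invented, but the score formula (accessCount / ageInSeconds, lowest score evicted first) is taken directly from the description.

```typescript
// Sketch of the LRU+LFU eviction score: rarely used and long-untouched
// entries score lowest and are evicted first.
interface EntryStats { accessCount: number; lastAccessed: number }

function evictionScore(e: EntryStats, now: number): number {
  const ageSeconds = Math.max((now - e.lastAccessed) / 1000, 0.001); // avoid /0
  return e.accessCount / ageSeconds;
}

function pickEvictee(entries: Map<string, EntryStats>, now: number): string {
  let worstKey = "";
  let worstScore = Infinity;
  for (const [key, e] of entries) {
    const s = evictionScore(e, now);
    if (s < worstScore) { worstScore = s; worstKey = key; }
  }
  return worstKey;
}
```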

Live config update

If you change completionCacheMaxEntries or completionCacheTtlMs in settings, the cache rebuilds itself immediately without requiring a reload.


Feature 5 — Continue-Prediction (Type-Ahead Shortcutting)

What it does

When the LLM returns a multi-token suggestion but you start typing characters that match the beginning of that suggestion, Tab Complete skips the LLM entirely and just shows you the remaining suffix of the already-predicted text.

How it works

After a successful LLM completion, the extension remembers:

  • lastCompletionText — the full predicted string
  • lastCompletionPosition — where the prediction started
  • lastCompletionUri — which file it was in

On the next provider call (before any cache or LLM lookup), it computes what text you have typed since the last prediction position, then checks if that typed text is a prefix of lastCompletionText.

| Situation | What happens |
| --- | --- |
| Typed text is a prefix of prediction | Return the remaining suffix as the new suggestion |
| Typed text fully matches prediction | Clear prediction state, return null (accept silently) |
| Typed text diverged | Clear prediction state, fall through to full re-prediction |

Example

LLM suggests: users.filter(u => u.isActive).length

You type users.filter( — Tab Complete immediately shows u => u.isActive).length without any API call.
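The three cases above reduce to a simple prefix comparison. This sketch uses invented names (`ContinueResult`, `continuePrediction`) to show the logic:

```typescript
// Sketch of the type-ahead shortcut: compare the text typed since the last
// prediction against the stored prediction.
type ContinueResult =
  | { kind: "suffix"; text: string } // typed a prefix → show the rest
  | { kind: "accepted" }             // typed the whole prediction
  | { kind: "diverged" };            // typed something else → re-predict

function continuePrediction(lastPrediction: string, typedSince: string): ContinueResult {
  if (typedSince === lastPrediction) return { kind: "accepted" };
  if (lastPrediction.startsWith(typedSince)) {
    return { kind: "suffix", text: lastPrediction.slice(typedSince.length) };
  }
  return { kind: "diverged" };
}
```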


Feature 6 — Edit Intent Tracker

What it does

The IntentTracker listens to every document change in VS Code and builds a rolling history of what you have been doing. This history is hashed and used as part of the completion cache key, ensuring completions are context-sensitive to your recent activity.

How it works

Listening to changes
The tracker registers two listeners:

  • onDidChangeTextDocument — fires on every keystroke/paste/undo.
  • onDidChangeActiveTextEditor — fires when you switch files.

Grouping edits into intents
Rapid consecutive edits to the same file are grouped into a single PendingIntent. A pending intent is flushed (finalised) 1.5 seconds after the last activity in that group.

Classifying edit type
Each finalised intent is classified:

| Type | Condition |
| --- | --- |
| 'pasted' | A single change inserted more than 50 characters |
| 'added' | New text was inserted |
| 'edited' | Text was replaced or deleted |
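A hypothetical classifier mirroring that table (the 50-character paste threshold comes from the description; the function signature is invented for the sketch):

```typescript
// Illustrative edit-intent classifier.
type EditType = "pasted" | "added" | "edited";

function classifyEdit(insertedText: string, removedLength: number): EditType {
  if (insertedText.length > 50) return "pasted"; // one large insertion
  if (removedLength === 0 && insertedText.length > 0) return "added";
  return "edited"; // text was replaced or deleted
}
```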

Version jump detection
If the document version jumps by more than 1 between events (e.g. undo/redo sequences), the pending intent for that file is discarded to avoid tracking stale context.

Buffer and hash
Finalised IntentEntry objects accumulate in a rolling buffer. At any point, computeHash() produces a short (16-char) MD5 fingerprint of the buffer content (file path + timestamp + type + change text). This hash invalidates the completion cache whenever your recent edit context changes.

Example

You paste a large block of code → intent classified as 'pasted', hash changes → next completion call is a cache miss → LLM is called with fresh context that knows what you just pasted.


Feature 7 — LSP-Aware Local Dependency Resolution

What it does

When building the scoped prefix, Tab Complete uses VS Code's built-in Language Server Protocol (LSP) commands to resolve definitions of identifiers used at the cursor and automatically pulls the full source of same-file symbols into the context.

How it works

Step 1 — Extract identifiers
The extractIdentifiers utility uses regex-based heuristics tuned per language to collect all symbol names used in the current scope (class header + function lines).

Step 2 — Resolve definitions via LSP
For each identifier, LSPService.executeDefinitionProvider() is called (equivalent to pressing F12 in VS Code). This returns the file URI and position of the definition.

Step 3 — Filter to same-file symbols
Only definitions in the same file are included — external libraries are already covered by import statements.

Step 4 — Retrieve full symbol text
For each resolved same-file symbol, the document symbol provider is used to find the full range of the function or class definition, and its complete source text is extracted.

LSP caching
All LSP calls are cached in a BoundedCache keyed by document URI + position. The cache is invalidated when the document changes (onDidChangeTextDocument) or is closed (onDidCloseTextDocument). If you change lspCacheMaxEntries in settings, the cache rebuilds itself live.

Example

Your function calls validateAge and formatDate — both defined elsewhere in the same file. Tab Complete automatically includes their full source in the prefix, so the LLM knows their signatures and behaviour when generating your completion.


Feature 8 — Multi-Provider LLM Streaming

What it does

Tab Complete supports three LLM providers and automatically selects whichever one has an API key configured. All providers use an identical OpenAI-compatible streaming API, and completions are streamed token-by-token for the fastest possible first-token latency.

Providers

All three supported providers offer free tiers — no credit card required to get started:

| Provider | Default model | Free tier | Notes |
| --- | --- | --- | --- |
| OpenRouter | qwen/qwen3-32 | ✅ Free models available | Default and recommended — access to hundreds of open-weight models, many completely free |
| Groq | qwen/qwen3-32 | ✅ Free tier included | Ultra-fast inference on dedicated hardware — often the fastest free option |
| Fireworks | qwen/qwen3-32 | ✅ Free tier included | Fast serverless inference for open-weight models |

Provider selection — The extension checks keys in this order: OpenRouter → Groq → Fireworks. The first provider with a non-empty API key is used.
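The first-configured-key-wins order can be sketched like this (the `ProviderKeys` shape and `selectProvider` name are invented for illustration):

```typescript
// Sketch of provider selection: first non-empty API key in priority order wins.
interface ProviderKeys { openrouter?: string; groq?: string; fireworks?: string }

function selectProvider(keys: ProviderKeys): string | null {
  const order: [string, string | undefined][] = [
    ["openrouter", keys.openrouter],
    ["groq", keys.groq],
    ["fireworks", keys.fireworks],
  ];
  for (const [name, key] of order) {
    if (key && key.trim() !== "") return name; // first non-empty key wins
  }
  return null; // no provider configured
}
```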

Streaming mechanics

  1. A fetch POST is sent with "stream": true to the provider's chat completions endpoint.
  2. The response body is read chunk-by-chunk using the Streams API (ReadableStream.getReader()).
  3. Each chunk is decoded (TextDecoder) and split on newlines into SSE data lines.
  4. Lines starting with data: are parsed as ChatStreamChunk (OpenAI SSE format).
  5. choices[0].delta.content from each chunk is yielded from an AsyncGenerator<string>.
  6. Streaming stops on the [DONE] sentinel.
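Steps 3–6 can be illustrated with a small parser over already-split SSE lines. The function name is invented; the `data:` prefix, `[DONE]` sentinel, and `choices[0].delta.content` shape follow the standard OpenAI SSE format named above.

```typescript
// Sketch: turn OpenAI-style SSE data lines into a list of content deltas.
function parseSseLines(lines: string[]): string[] {
  const deltas: string[] = [];
  for (const line of lines) {
    if (!line.startsWith("data:")) continue;     // ignore comments/keep-alives
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break;             // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    const content = chunk.choices?.[0]?.delta?.content;
    if (typeof content === "string") deltas.push(content);
  }
  return deltas;
}
```

In the real streaming loop each delta would be yielded from an AsyncGenerator as it arrives rather than collected into an array.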

Cancellation

Every in-flight request is tracked with an AbortController. When either of the following happens:

  • VS Code cancels the completion token (the user typed again)
  • A newer request supersedes the current one

AbortController.abort() is called immediately, terminating the fetch and stopping streaming. This ensures no stale completions arrive from old requests.

LLM prompt

The LLM is given a tightly constrained system prompt:

You are a code autocomplete engine.
Rules:
- Complete the current line of code
- Return meaningful continuation (not single characters)
- Do not return partial tokens
- Prefer full expressions like function calls
- No explanations

Temperature is set to 0.1 for deterministic, code-appropriate outputs.


Feature 9 — Live Configuration with Hot-Reload

What it does

All settings are applied immediately when you change them in VS Code's settings UI — no reload required. The ConfigurationService singleton notifies all subscribed components (cache, LSP service, API client) so they adapt in real time.

How it works

  • ConfigurationService uses vscode.workspace.onDidChangeConfiguration to detect changes in the tab-completion namespace.
  • When a change is detected, loadConfig() is called to re-read all values.
  • Each registered listener (subscriber) is called with the new config object.
  • ApiClient, CompletionCache, and LSPService all subscribe and react:
    • CompletionCache rebuilds itself with a new BoundedCache if size or TTL changed.
    • LSPService rebuilds its cache if max entries changed.
    • ApiClient always reads the live config on each request — no state to update.

Feature 10 — Completion De-Duplication

What it does

Before showing any suggestion to the user, the DeDuplicationService runs three independent checks to ensure the completion does not repeat code that already exists in the file — either above or below the cursor. This prevents double-insertions and the jarring experience of the LLM echoing back lines you have already written.

Three-layer duplication checks

Layer 1 — Lookbehind overlap trim

Compares the beginning of the completion against the lines above the cursor (up to 200 lines). If the completion starts with lines that already exist above, those lines are stripped from the completion — only the genuinely new continuation is kept.

Existing code (above cursor):
  4|   const b = 2;  ← cursor at end of this line

Model returns:
  "  const b = 2;\n  const c = 3;"

After trim:
  "  const c = 3;"   ← only the new part is kept

It also handles the prefix-merge special case: if the cursor is mid-line, it checks whether the first line of the completion merges naturally with the text already typed on the cursor line.
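A sketch of the trim, assuming the simple matching semantics shown in the example (find the longest run of leading completion lines that equals the trailing lines above the cursor, compared after trimming; the real implementation may be more elaborate):

```typescript
// Illustrative lookbehind trim: strip leading completion lines that already
// exist immediately above the cursor, keeping only the new continuation.
function trimLookbehindOverlap(completion: string, linesAbove: string[]): string {
  const compLines = completion.split("\n");
  const n = linesAbove.length;
  let best = 0;
  const maxK = Math.min(compLines.length, n);
  for (let k = 1; k <= maxK; k++) {
    // Do the first k completion lines equal the last k lines above the cursor?
    const matches = compLines
      .slice(0, k)
      .every((line, i) => line.trim() === linesAbove[n - k + i].trim());
    if (matches) best = k;
  }
  return compLines.slice(best).join("\n");
}
```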

Layer 2 — Structural overlap check

Compares the completion against the code below the cursor (lookahead). If 2 or more consecutive lines of the completion match 2 or more consecutive lines already in the lookahead (with ≥ 0.85 similarity), the completion is rejected entirely.

Code below cursor (lookahead):
  "  console.log(x);\n  return x;\n}"

Model returns:
  "  console.log(x);\n  return x;\n}\n"

→ Two consecutive matching lines found → reject

This check uses normalised text comparison (whitespace-collapsed) and fuzzy similarity scoring so that minor formatting differences do not cause false negatives.

Layer 3 — Trailing overlap check

Checks whether the end of the completion duplicates the start of the lookahead. This catches the common case where the model correctly completes a function body but also repeats the closing } that already exists on the next line.

Code below cursor:
  10|   return x;
  11| }

Model returns:
  "  return x;\n}\n"

→ Last line "}" matches the next existing line "}" → trailing overlap → reject

The check works by collecting up to 5 trailing non-empty lines from the completion and comparing them against up to 100 leading non-empty lines of the lookahead.

Normalisation

All comparisons use normalizeText — which collapses whitespace and strips comments — and stringSimilarity — a character-level similarity ratio — so the checks are robust to indentation differences and minor reformatting.
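A sketch of those two helpers. `normalizeText` here collapses whitespace and strips // line comments; `stringSimilarity` uses a Levenshtein-based character ratio — a plausible reading of "character-level similarity ratio", though the extension's actual implementations may differ.

```typescript
// Illustrative normalisation: strip // comments, collapse whitespace.
function normalizeText(s: string): string {
  return s.replace(/\/\/.*$/gm, "").replace(/\s+/g, " ").trim();
}

// Illustrative character-level similarity: 1 - editDistance / maxLength.
function stringSimilarity(a: string, b: string): number {
  if (a === b) return 1;
  const m = a.length, n = b.length;
  if (m === 0 || n === 0) return 0;
  // Classic Levenshtein distance table.
  const dp = Array.from({ length: m + 1 }, (_, i) =>
    Array.from({ length: n + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost);
    }
  }
  return 1 - dp[m][n] / Math.max(m, n);
}
```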


Feature 11 — Cross-File Symbol Context

What it does

The CrossFileService enriches completions with type signatures and declarations from other files in your workspace. When the code near your cursor references a class, interface, function, or type from a different file, that symbol's signature is automatically included in the context sent to the LLM — so it can generate completions that correctly use those external types without you having to paste anything manually.

How it works

The pipeline has three stages:

Stage 1 — Reference Extraction (ReferenceExtractor)

Scans the last 15 lines of the prefix (the "nearby text") to find identifiers that are:

  • Used in the nearby code
  • Imported from another file (not declared locally in the current prefix)

It parses import statements to build an alias map (e.g. import { foo as Bar } → original name is foo), then resolves aliases back to their original names. Only identifiers that appear in imports but are not locally declared are treated as cross-file references.
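The alias map step can be sketched as below; the function name and regex-based parsing are illustrative (the real extractor may handle more import forms), but the mapping direction — local alias back to the original exported name — matches the description.

```typescript
// Sketch: parse named-import clauses into a map from local name (possibly an
// alias) to the original exported name.
function buildAliasMap(importLines: string[]): Map<string, string> {
  const aliases = new Map<string, string>();
  for (const line of importLines) {
    const clause = line.match(/import\s*\{([^}]*)\}/);
    if (!clause) continue;
    for (const part of clause[1].split(",")) {
      const m = part.trim().match(/^(\w+)(?:\s+as\s+(\w+))?$/);
      if (!m) continue;
      const [, original, alias] = m;
      aliases.set(alias ?? original, original); // local name → exported name
    }
  }
  return aliases;
}
```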

Stage 2 — Symbol Index (SymbolIndex)

Maintains a workspace-wide index of document symbols, built using VS Code's LSP executeDocumentSymbolProvider command. The index is kept up to date by listening to:

  • onDidSaveTextDocument — re-indexes a file when saved
  • onDidOpenTextDocument — indexes a file when first opened

Each entry is cached by document URI and version, so re-indexing only happens when the file actually changes. Supported symbol kinds: Class, Interface, Enum, Function, Method, Property, Constant, TypeParameter, Struct.

Stage 3 — Signature Extraction (SignatureProvider)

For each cross-file symbol that matches a reference in the current prefix, the provider:

  1. Opens the symbol's source document.
  2. Extracts the text of the symbol's range.
  3. Parses it with Tree-sitter (ASTService) to extract just the signature (not the full body).
  4. Caches the result by symbol URI + location (so the same symbol is never re-parsed).

The extracted signatures are attached to the CompletionContext as crossFileSymbols and included in the LLM prompt.

Example

Your file imports and uses ApiClient from ./api/apiClient.ts:

import { ApiClient } from './api/apiClient';
// ...
const response = await this.apiClient.  // ← cursor here

Cross-file context automatically adds:

// Cross-file symbol: ApiClient
class ApiClient {
    constructor(outputChannel: vscode.OutputChannel)
    async streamCompletion(messages: ChatMessage[], token: vscode.CancellationToken): AsyncGenerator<string>
}

The LLM now knows exactly what methods ApiClient exposes and generates the correct completion.

Caching

Signature results are cached in a BoundedCache<string> (capacity 1000) keyed by symbol URI + kind + name + range. Entries are grouped by file URI so all signatures from a file are invalidated together when that file is saved.


Extension Settings Reference

Configure under Settings → Tab Completion or in settings.json:

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| tab-completion.openrouterApiKey | string | "" | API key for OpenRouter (recommended default provider) |
| tab-completion.groqApiKey | string | "" | API key for Groq |
| tab-completion.fireworksApiKey | string | "" | API key for Fireworks |
| tab-completion.model | string | "qwen/qwen3-32" | LLM model name to use for completions |
| tab-completion.maxTokens | number | 500 | Maximum tokens to generate per completion (50–5000) |
| tab-completion.completionCacheMaxEntries | number | 100 | Maximum number of completions to keep in cache (10–1000) |
| tab-completion.completionCacheTtlMs | number | 30000 | How long (in milliseconds) a cached completion stays valid (5000–120000) |
| tab-completion.lspCacheMaxEntries | number | 100 | Maximum number of LSP results to cache (10–1000) |

Minimal setup — settings.json

{
    "tab-completion.openrouterApiKey": "sk-or-...",
    "tab-completion.model": "qwen/qwen3-32",
    "tab-completion.maxTokens": 500
}

Supported Languages

Full AST-powered replacement regions and scoped prefix support:

| Language | Grammar |
| --- | --- |
| TypeScript | ✅ |
| TypeScript React (TSX) | ✅ |
| JavaScript | ✅ |
| JavaScript React (JSX) | ✅ |
| Python | ✅ |
| Rust | ✅ |
| Go | ✅ |
| Java | ✅ |
| C | ✅ |
| C++ | ✅ |

All other languages receive verbatim or simplified prefix completions (no AST, but still fully functional).


Getting Started

1. Install the extension

🚀 Coming Soon on VS Code Marketplace — stay tuned!

Once released, install directly from the VS Code Extensions panel by searching "Tab Complete".

2. Get a free API key

All providers below have free tiers — pick one and create a free account:

  • OpenRouter (recommended — free models available): openrouter.ai/keys
  • Groq (free tier, very fast): console.groq.com/keys
  • Fireworks (free tier): fireworks.ai

3. Add your API key to VS Code settings

Open Settings (Ctrl+,) → search Tab Completion → paste your key.

Or add to settings.json:

{
    "tab-completion.openrouterApiKey": "your-key-here"
}

4. Start coding

Open any file, start typing, and wait ~300 ms. Ghost-text suggestions will appear automatically. Press Tab to accept.


Architecture Diagram

extension.ts (entry point)
│
└── InlineCompletionProvider          ← registers for all files (**)
        │
        ├── ApiClient                 ← streams completions from OpenRouter / Groq / Fireworks
        │       └── AbortController   ← cancels stale requests instantly
        │
        ├── IntentTracker             ← records edit history, produces cache-invalidation hash
        │
        ├── CompletionCache           ← LRU+LFU bounded cache (keyed by content + position + intent hash)
        │       └── BoundedCache<V>   ← generic fixed-capacity cache with TTL and group invalidation
        │
        ├── DeDuplicationService      ← three-layer overlap filter (lookbehind / structural / trailing)
        │
        └── ContextGatherer           ← coordinates context building
                │
                ├── PrefixStage       ← smart prefix: verbatim / simplified / scoped
                │       ├── Import filter           ← only used imports
                │       ├── LocalDependencyResolver ← same-file symbol resolution via LSP
                │       └── LSPService              ← wraps VS Code LSP commands with caching
                │
                ├── SuffixStage       ← captures closing brackets after the replacement region
                │
                ├── ReplacementRegionStage ← AST-powered replacement range calculation
                │       └── ASTService    ← Tree-sitter parser (TypeScript, Python, Rust, Go, Java, C, C++)
                │
                └── CrossFileService  ← enriches context with symbols from other workspace files
                        ├── ReferenceExtractor ← finds imported identifiers used near cursor
                        ├── SymbolIndex        ← LSP-based workspace symbol index (auto-updated on save/open)
                        └── SignatureProvider  ← AST-extracts signatures + caches by symbol location

ConfigurationService  ← singleton; hot-reloads all settings; notifies all subscribers

Release Notes

0.0.1

Initial release:

  • AI-powered inline completions via OpenRouter, Groq, and Fireworks
  • Smart scoped/simplified/verbatim prefix builder
  • AST-powered replacement region using Tree-sitter
  • Suffix stage — captures closing brackets after the replacement region
  • Multi-layer completion caching (LRU+LFU with TTL)
  • Continue-prediction type-ahead shortcutting
  • Edit intent tracking with cache invalidation
  • LSP-aware local dependency resolution
  • Multi-provider streaming with cancellation
  • Live configuration hot-reload
  • Three-layer de-duplication (lookbehind trim, structural overlap, trailing overlap)
  • Cross-file symbol context (workspace symbol index + AST signature extraction)

License

MIT
