Gneiss — AI Writing Assistant
A VSCode extension where you control the argument's skeleton and delegate low-level tasks to AI agents that 'fill in the blanks'. Targeted at academic, formal, and research writing, Gneiss works with LaTeX, Markdown, and any other file type. Our vision is to recentre humans in the writing process, integrating LLMs as 'blanks' in your text so you can write at the speed of thought without sacrificing control.
For example, as you begin drafting the motivation for a paper, you might write:
The seminal result by ⟨citation⟩ introduced the transformer architecture, which ⟨one-sentence summary of key contribution⟩.
Then press Option+S to fill in the blanks sequentially in a token-efficient manner. Gneiss automatically delegates agents to complete these blanks with smart context construction: agents can browse the web, query files in your repo, and reference BibTeX citations to fill in your blanks (for Gneiss' first-class file access and BibTeX support, see here).
When we ran Gneiss on the above example, it produced:
The seminal result by Vaswani et al. (2017) introduced the transformer architecture, which is a network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
Getting Started
- Install the Gneiss extension.
- Configure your API key (see Configuration below).
- Open a document, press Cmd+[ to create a blank, type an instruction, and click Fill.
Usage
Creating Blanks
Press Cmd+[ to insert a blank at your cursor. A ⟨⟩ pair appears and you can type a natural-language instruction inside it.
For example, you might write:
The seminal result by ⟨citation⟩ introduced the transformer architecture, which ⟨one-sentence summary of key contribution⟩.
You can also select text first, then press Cmd+[ — the selection becomes the instruction.
Blanks are directly editable: click into the ⟨instruction⟩ text to modify it at any time. CodeLens actions (Fill and Delete) appear above lines with pending blanks.
Filling Blanks
Click the Fill CodeLens above any blank, or use one of the batch fill modes:
| Mode | Shortcut | Best for |
|------|----------|----------|
| Single | Click "Fill" CodeLens | One blank at a time |
| Complete | Option+A | All blanks in one API call — most token-efficient |
| Sequential | Option+S | Blanks that depend on earlier ones (fills top-to-bottom in a single conversation) |
| Parallel | Option+D | Independent blanks — fastest wall-clock time |
Each fill produces an inline diff. Press Cmd+Y to accept or Cmd+N to reject.
References
Type @ inside a blank to reference files from your workspace. The agent receives their content as context.
| Syntax | What it does |
|--------|--------------|
| @path/to/file.tex | Include the file's content in the agent's context |
| @papers/ | Browse files in a directory |
| @*.bib | Glob-match files |
| @references.bib:nanda2023 | Include a specific BibTeX entry from a .bib file |
| @cite:nanda2023 | Find a cite key across all .bib files in the workspace |
| @"path with spaces/refs.bib":key | Quoted path syntax for paths containing spaces |
Autocomplete suggestions appear as you type, with icons distinguishing file types. For .bib files, typing : after the filename triggers cite key completions showing key, title, author, and year.
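The reference syntax above can be sketched as a small parser. This is a hypothetical re-implementation for illustration only (the function name `parse_ref` and the exact grammar are assumptions, not the extension's actual code): a quoted path may contain spaces, and an optional `:key` suffix selects a BibTeX entry.

```python
import re

# Hypothetical sketch: '@' followed by either a quoted path (may contain
# spaces) or a bare path (no spaces or colons), plus an optional ':key'.
REF = re.compile(r'@(?:"(?P<qpath>[^"]+)"|(?P<path>[^\s:]+))(?::(?P<key>\S+))?')

def parse_ref(token):
    """Return (path, cite_key) for an @-reference token, or None."""
    m = REF.fullmatch(token)
    if not m:
        return None
    return (m.group("qpath") or m.group("path"), m.group("key"))
```

For example, `parse_ref('@references.bib:nanda2023')` yields the pair `('references.bib', 'nanda2023')`, while a trailing-slash directory reference like `@papers/` parses with no cite key.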
@model: Override
Override the LLM provider and model for individual blanks:
⟨@model:claude:claude-sonnet-4-5 synthesise the argument from sections 2 and 3⟩
⟨@model:ollama:llama3:8b year of publication⟩
This lets you mix models within a document — e.g. a fast/cheap model for simple lookups and a more capable model for complex synthesis. Blanks without @model: use your global default. The directive is stripped before reaching the model.
Autocomplete helps here too: typing @model: suggests providers (claude, ollama), and typing further (e.g. @model:claude:) suggests available models.
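Since the directive is stripped before the instruction reaches the model, its handling can be sketched as below. This is a minimal illustration under stated assumptions (the helper name `parse_model_directive` is hypothetical); note the model segment may itself contain a colon, as in `llama3:8b`, so only the first colon after `@model:` separates provider from model.

```python
def parse_model_directive(instruction):
    """Split '@model:provider:model rest...' into (provider, model, rest).

    Hypothetical sketch: the provider is everything up to the first ':',
    the model runs to the first space (and may contain further colons),
    and the remainder is the instruction passed to the model.
    """
    if not instruction.startswith("@model:"):
        return (None, None, instruction)
    head, _, rest = instruction[len("@model:"):].partition(" ")
    provider, _, model = head.partition(":")
    return (provider, model, rest)
```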
Agent Tools
The filling agent has access to tools it can use autonomously:
- **Web search** — the agent can search the web for facts, citations, and context (Claude uses native web search; Ollama requires an API key from ollama.com/settings/keys)
- **read_file** — read any file in your workspace (text files, .bib, and PDFs)
- **search_files** — grep-style search across workspace files
- **list_directory** — browse directories
You don't need to @-reference everything — the agent can explore your workspace on its own. But explicit @ references are more reliable and avoid extra tool-call latency.
File Access Control
Create a .gneissignore file in your workspace root to exclude files from agent access. It uses regex patterns, one per line:
# Ignore build artifacts
build/.*
dist/.*
# Ignore drafts
drafts/.*\.tex
Files matching .gitignore patterns are also excluded by default (controlled by the gneiss.repo.respectGitignore setting).
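The semantics described above can be sketched in a few lines. This is a hypothetical re-implementation for illustration, not the extension's code, and it assumes each pattern must fully match the workspace-relative path (the actual matching rule may differ):

```python
import re

def load_patterns(text):
    """Parse .gneissignore text: one regex per line; blank lines and
    '#' comment lines are skipped."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        patterns.append(re.compile(line))
    return patterns

def is_excluded(rel_path, patterns):
    """True if any pattern fully matches the workspace-relative path."""
    return any(p.fullmatch(rel_path) for p in patterns)
```

Under this sketch, `build/.*` would exclude `build/out.pdf` but leave a top-level `main.tex` visible to agents.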
Keybindings
All keybindings are remappable via VSCode's standard keybindings.json.
| Action | Binding |
|--------|---------|
| Open blank at cursor / wrap selection | Cmd+[ |
| Cancel blank | Escape |
| Fill all blanks (complete) | Option+A |
| Fill all blanks (sequential) | Option+S |
| Fill all blanks (parallel) | Option+D |
| Accept inline diff | Cmd+Y |
| Reject inline diff | Cmd+N |
| Open chat panel | Cmd+L |
Note: Cmd+[ overrides VSCode's "Outdent Line" — use Shift+Tab for outdenting instead.
Configuration
Claude
Your API key is detected automatically from the ANTHROPIC_API_KEY environment variable, or set it in settings: gneiss.llm.claude.apiKey.
| Setting | Default | Description |
|---------|---------|-------------|
| gneiss.llm.claude.model | claude-haiku-4-5 | Model to use |
| gneiss.llm.claude.maxTokens | 64000 | Max output tokens per fill. When extended thinking is enabled, the 5k thinking budget is carved out of this value. Model limits: haiku/sonnet/opus-4-5 = 64k, opus-4-6 = 128k |
Ollama
| Setting | Default | Description |
|---------|---------|-------------|
| gneiss.llm.ollama.model | qwen3:4b | Model to use (must be pulled locally) |
| gneiss.llm.ollama.baseUrl | http://localhost:11434 | Ollama server URL |
| gneiss.llm.ollama.apiKey | | API key for web search (from ollama.com/settings/keys) — not needed for local inference |
| gneiss.llm.ollama.keepAlive | 60m | How long to keep the model loaded. Use -1 for infinite |
| gneiss.llm.ollama.numCtx | 32768 | Context window size. Must be consistent across requests for KV cache reuse |
| gneiss.llm.ollama.maxTokens | -1 | Max output tokens (-1 = unlimited) |
General
| Setting | Default | Description |
|---------|---------|-------------|
| gneiss.llm.defaultProvider | claude | claude or ollama |
| gneiss.agents.maxConcurrent | 3 | Max parallel fill agents |
| gneiss.agents.groupingDistance | 30 | Max lines apart to group nearby blanks into one API call |
| gneiss.agents.fillOnClose | false | Auto-fill blanks immediately when the instruction is entered |
| gneiss.agents.skipThinking | false | Disable extended thinking to reduce latency |
| gneiss.repo.respectGitignore | true | Exclude .gitignore-matched files from agent access |