Asset-Aware MCP

🏗️ Asset-Aware ETL for AI Agents - Precise PDF decomposition into structured assets (Tables, Figures, Sections)

🆕 What's New in v0.2.10

Modular Architecture: Server refactored from 2122 → 31 lines (thin entry point)
34 MCP Tools across 5 modules (Document, Section, Job, Knowledge, Table)
12 MCP Resources across 2 modules (Document, Table)
Bug Fixes: use_marker async mode, list_documents filtering, image overlap detection

🌟 Core Concept: Asset-Aware ETL

This extension provides an ETL (Extract, Transform, Load) pipeline for AI Agents. Instead of feeding raw text to an LLM, it decomposes each document into a structured "Map" (Manifest), so Agents can retrieve exactly the assets they need.

The Workflow:

📥 Ingest (ETL): Agent provides a local PDF path.
⚙️ Process: MCP Server reads the file using PyMuPDF, separating Text, Tables, and Figures (with page numbers).
🗺️ Manifest: Generates a structured JSON "Map" of all assets.
📤 Fetch: Agent "looks at the map" and fetches specific objects (e.g., "Table 1" or "Figure 2") as clean Markdown or Base64 images.
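
To make the "map" idea concrete, here is a minimal sketch of what a manifest might look like and how an agent could pick assets from it. The field names (`doc_id`, `assets`, `kind`, `label`, `page`) are assumptions chosen for illustration and are not the server's actual schema.

```python
import json

# Hypothetical manifest; field names are illustrative, not the server's real schema.
MANIFEST_JSON = """
{
  "doc_id": "doc_study_01",
  "assets": [
    {"kind": "table",   "label": "Table 1",  "page": 4},
    {"kind": "figure",  "label": "Figure 2", "page": 6},
    {"kind": "section", "label": "Methods",  "page": 3}
  ]
}
"""


def find_assets(manifest: dict, kind: str) -> list:
    """Return every asset of the given kind, in manifest order."""
    return [asset for asset in manifest["assets"] if asset["kind"] == kind]


manifest = json.loads(MANIFEST_JSON)
tables = find_assets(manifest, "table")
print([t["label"] for t in tables])
```

The point of the design is that the agent reads a small index first and only then requests the one table or figure it needs, rather than loading the whole document into context.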

✨ Features

📄 Dual-Engine PDF ETL:
- PyMuPDF (default) - Fast extraction (~50MB dependency)
- Marker (optional, use_marker=True) - High-precision with blocks.json containing bbox coordinates
🧭 Section Navigation: Dynamic, hierarchical section tree with 4 tools for browsing, searching, and block extraction
🔄 Async Jobs: Track progress for large document batches with Job IDs.
🗺️ Document Manifest: A structured index that lets Agents "see" document structure before reading.
🖼️ Visual Assets: Extract figures as Base64 images for Vision-capable Agents.
📊 A2T (Anything to Table): 19 tools for creating, editing, and exporting professional Excel tables
🧠 Knowledge Graph: Cross-document insights powered by LightRAG.
🔌 MCP Native: Seamless integration with VS Code Copilot Chat and Claude.
🏠 Local-First: Optimized for Ollama (local LLM); OpenAI is also supported.

🚀 Quick Start

1. Install Prerequisites

# Install Ollama (for local LLM)
curl -fsSL https://ollama.com/install.sh | sh

# Pull required models
ollama pull qwen2.5:7b
ollama pull nomic-embed-text
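
To confirm that both models were pulled, one option is to query Ollama's `/api/tags` endpoint, which lists locally available models. The sketch below is an assumption about how you might check this yourself; it is not how the extension's own checks work. The helper names `missing_models` and `fetch_tags` are hypothetical.

```python
import json
import urllib.request

REQUIRED_MODELS = ("qwen2.5:7b", "nomic-embed-text")


def missing_models(tags_payload: dict, required=REQUIRED_MODELS) -> list:
    """Return the required models that do not appear in an /api/tags response."""
    installed = {model["name"] for model in tags_payload.get("models", [])}
    # Ollama reports a model pulled without a tag as "<name>:latest".
    return [
        name for name in required
        if name not in installed and f"{name}:latest" not in installed
    ]


def fetch_tags(host: str = "http://localhost:11434", timeout: float = 3.0) -> dict:
    """Fetch the list of local models from a running Ollama server."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


# Offline example with a hand-written payload; call fetch_tags() against a live server.
sample_payload = {"models": [{"name": "qwen2.5:7b"}]}
print(missing_models(sample_payload))
```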

2. Install Extension

Open VS Code
Go to Extensions (Ctrl+Shift+X)
Search for "Asset-Aware MCP"
Click Install

3. Run Setup Wizard

Open Command Palette (Ctrl+Shift+P)
Run Asset-Aware MCP: Setup Wizard
Follow the prompts to configure your .env file.

📖 Usage (Agent Flow)

1. Ingest a Document (ETL)

In Copilot Chat, tell the agent to process a file: @workspace Use ingest_documents to process ./papers/study_01.pdf
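
Under the hood, an MCP client turns a prompt like this into a JSON-RPC 2.0 `tools/call` request. The sketch below shows roughly what that message could look like. The argument name `file_paths` is a hypothetical placeholder, and passing `use_marker` to this tool is an assumption based on the Features list; check the tool's published input schema for the real parameters.

```python
import json
from itertools import count

_request_ids = count(1)  # each JSON-RPC request needs its own id


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# `file_paths` is a hypothetical argument name, used here only for illustration.
message = build_tool_call(
    "ingest_documents",
    {"file_paths": ["./papers/study_01.pdf"], "use_marker": False},
)
print(message)
```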

2. Check Progress

For large files, check the job status: @workspace get_job_status("job_id_here")
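
An agent or script would typically poll the job until it finishes. The following is a minimal polling sketch under stated assumptions: the `status` field and the state names `completed`, `failed`, and `cancelled` are guesses, and `get_status` stands in for whatever function actually invokes `get_job_status`.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[str], dict],
    job_id: str,
    poll_seconds: float = 2.0,
    max_polls: int = 150,
) -> dict:
    """Poll until the job reports a terminal state or the poll budget runs out."""
    terminal_states = {"completed", "failed", "cancelled"}  # assumed state names
    for _ in range(max_polls):
        status = get_status(job_id)
        if status.get("status") in terminal_states:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


# Stand-in for a real get_job_status call, so the sketch runs on its own.
_fake_responses = iter([
    {"status": "processing", "progress": 40},
    {"status": "completed", "progress": 100},
])
result = wait_for_job(lambda _job_id: next(_fake_responses), "job_id_here", poll_seconds=0.0)
print(result["status"])
```

Bounding the loop with `max_polls` keeps a stalled job from blocking the caller indefinitely.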

3. Inspect the Map

The agent will first look at the manifest to see what's inside: @workspace What tables are available in doc_study_01?

4. Fetch Specific Assets

The agent retrieves exactly what it needs:
@workspace Fetch Table 1 from doc_study_01
@workspace Show me Figure 2.1 (the study flow diagram)
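
Figures come back as Base64 images, so a client that wants a file on disk has to decode them. Here is a small sketch of that step. It assumes the payload is a plain Base64 string with no data-URI prefix, and the helper name `save_base64_image` is hypothetical.

```python
import base64
import tempfile
from pathlib import Path


def save_base64_image(b64_data: str, out_path: Path) -> int:
    """Decode Base64 image data, write it to disk, and return the byte count."""
    raw = base64.b64decode(b64_data, validate=True)  # rejects non-Base64 characters
    out_path.write_bytes(raw)
    return len(raw)


# Stand-in payload: only the 8-byte PNG signature, not a real figure.
sample_payload = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
out_file = Path(tempfile.gettempdir()) / "figure_2_1.png"
bytes_written = save_base64_image(sample_payload, out_file)
print(bytes_written)
```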

⚙️ Configuration

Setting	Default	Description
`assetAwareMcp.llmBackend`	`ollama`	LLM backend (ollama/openai)
`assetAwareMcp.ollamaHost`	`http://localhost:11434`	Ollama URL
`assetAwareMcp.dataDir`	`./data`	Storage for processed assets

🔧 Commands

Command	Description
`Setup Wizard`	Initial configuration & dependency check
`Open Settings Panel`	Visual editor for `.env` settings
`Check Ollama Connection`	Test if local LLM is accessible
`Check System Dependencies`	Verify `uv`, `python`, and `pip` are installed
`Refresh Status`	Update the Status and Documents tree views
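
If you want to run a similar dependency check outside VS Code, the sketch below looks up `uv`, `python`, and `pip` on `PATH`. It approximates what such a check could do and is not the extension's actual implementation.

```python
import shutil
from typing import Optional

REQUIRED_TOOLS = ("uv", "python", "pip")


def check_dependencies(tools=REQUIRED_TOOLS) -> dict:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    report: dict = {}
    for tool in tools:
        path: Optional[str] = shutil.which(tool)
        report[tool] = path
    return report


dependency_report = check_dependencies()
for tool, path in dependency_report.items():
    print(f"{tool}: {path or 'NOT FOUND'}")
```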

🛠️ Troubleshooting & Debugging

If the extension fails to start or the MCP server doesn't appear:

Check VS Code Version: Ensure you are using VS Code 1.96.0 or newer.
Check Dependencies: Run Asset-Aware MCP: Check System Dependencies from the command palette.
Inspect Logs:
- Open Output panel (Ctrl+Shift+U).
- Select Asset-Aware MCP from the dropdown to see extension logs.
- Select Asset-Aware MCP Dependencies to see dependency check results.
Development Mode:
- Clone the repo.
- Open vscode-extension folder.
- Run npm install.
- Press F5 to launch the Extension Development Host.

📚 MCP Tools (34 total)

Document ETL (5)

Tool	Description
`ingest_documents`	Process PDF files into structured assets
`list_documents`	List all ingested documents
`inspect_document_manifest`	View document structure (Tables/Figures/Sections)
`fetch_document_asset`	Get specific Table/Figure/Section content
`parse_pdf_structure`	Parse PDF structure without full ingestion

Section Navigation (4)

Tool	Description
`list_section_tree`	Browse document section hierarchy
`get_section_detail`	Get section metadata and stats
`get_section_blocks`	Extract blocks from a section
`search_sections`	Search sections by keyword
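
As a rough picture of what browsing and searching a section hierarchy involves, the sketch below runs a depth-first keyword search over a small tree. The `Section` shape and the `find_sections` helper are hypothetical and say nothing about how the server stores or searches sections internally.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    children: list = field(default_factory=list)


def find_sections(root: Section, keyword: str, path: tuple = ()) -> list:
    """Depth-first search returning the full path of each section whose title matches."""
    here = path + (root.title,)
    hits = [" > ".join(here)] if keyword.lower() in root.title.lower() else []
    for child in root.children:
        hits.extend(find_sections(child, keyword, here))
    return hits


tree = Section("Study 01", [
    Section("Methods", [Section("Statistical Methods"), Section("Participants")]),
    Section("Results"),
])
print(find_sections(tree, "methods"))
# prints ['Study 01 > Methods', 'Study 01 > Methods > Statistical Methods']
```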

Job Management (4)

Tool	Description
`get_job_status`	Track progress of ingestion jobs
`list_jobs`	List all jobs
`cancel_job`	Cancel a running job
`search_source_location`	Find source location in documents

Knowledge Graph (2)

Tool	Description
`consult_knowledge_graph`	Cross-document RAG queries
`export_knowledge_graph`	Export knowledge graph data

A2T - Anything to Table (19)

Tool	Description
`plan_table_schema`	AI-driven schema planning
`create_table_draft`	Start a new draft
`add_rows_to_draft`	Batch add rows to draft
`commit_draft_to_table`	Finalize draft to table
`resume_draft` / `resume_table`	Resume work with minimal context
`create_table` / `add_rows` / `update_row` / `delete_row`	Direct CRUD
`render_table`	Export to Excel with formatting

🔗 Links

📝 License

Apache-2.0

Asset-Aware MCP

Tz Ping Gau
