Parquet Manager

VS Code / Cursor extension to preview, query, and edit Apache Parquet files.

Features

Parquet Preview custom editor — open .parquet files in a paginated table view, with a collapsible schema sidebar and SQL panel
Pagination — browse large files without loading everything into memory (50 / 100 / 250 / 500 rows per page)
SQL filter (DuckDB) — run SELECT / WITH … SELECT queries against the open file through the parquet_data view; results render in the same table; Ctrl+Enter to execute
Column visibility — toggle individual columns on/off from the Schema panel checkboxes (browse mode)
Inline row editing — edit any row in place with type-aware controls (text / number / boolean dropdown / date / datetime / time pickers) and save back into the parquet file
Safe writes — every save goes through a temp file, atomic swap, retry-on-EPERM, and pre-flight CAST validation per column so a bad value never corrupts the original file
Codec preservation — the saved file keeps the original compression codec (SNAPPY, GZIP, ZSTD, BROTLI, LZ4, LZ4_RAW, or uncompressed)
Schema — column names and types in the sidebar plus a Parquet: Show Schema command to dump the schema as text
Export — CSV or JSON via the Explorer context menu, Command Palette, or editor title menu
Theme-aware UI — inputs, selects, and native pickers follow the VS Code / Cursor color theme (dark / light / high-contrast)

Compression codecs (GZip, ZSTD, Brotli, LZ4, etc.) on read are supported via hyparquet-compressors. On write, DuckDB writes the parquet output.

Development

npm install
npm run compile

Press F5 in VS Code or Cursor to launch an Extension Development Host, then open a .parquet file.

Usage

Open any .parquet file — the Parquet Preview editor opens by default.
Use Parquet: Show Schema, Export to CSV, or Export to JSON from the Command Palette or Explorer right-click menu.
In the SQL panel, query the parquet_data view (columns match your file). Press Ctrl+Enter or Run to execute. Only SELECT / WITH … SELECT queries are allowed.
Use the Schema sidebar checkboxes to hide/show columns in browse mode.
Click the pencil icon on any row to edit it; save with the check icon or cancel with the X.

DuckDB query examples

Filter rows with WHERE (replace column names with yours):

-- Equality and simple filters
SELECT * FROM parquet_data
WHERE status = 'active'
LIMIT 100;

SELECT * FROM parquet_data
WHERE country IN ('US', 'CA', 'GB')
  AND created_at >= '2024-01-01';

-- Numeric range
SELECT id, amount, category FROM parquet_data
WHERE amount BETWEEN 10 AND 500
  AND category IS NOT NULL;

-- Text search
SELECT * FROM parquet_data
WHERE name ILIKE '%smith%'
   OR email LIKE '%@example.com';

-- Combine conditions
SELECT * FROM parquet_data
WHERE (region = 'EU' OR region = 'UK')
  AND revenue > 1000
  AND is_deleted = false
ORDER BY revenue DESC
LIMIT 50;

-- Aggregate with a filter
SELECT category, COUNT(*) AS n, AVG(amount) AS avg_amount
FROM parquet_data
WHERE event_date >= '2025-01-01'
GROUP BY category
HAVING COUNT(*) > 10
ORDER BY n DESC;

Editing rows

Click the pencil icon (✎) on a row. Each cell becomes the input control that matches its DuckDB column type:
- BOOLEAN → dropdown with (null) / true / false
- Integer types (TINYINT … HUGEINT signed and unsigned) → <input type="number" step="1">
- Float / DECIMAL types → <input type="number" step="any">
- DATE → date picker
- TIMESTAMP* → datetime-local picker
- TIME / TIME WITH TIME ZONE → time picker
- VARCHAR / TEXT / UUID → text input
- BLOB / LIST / STRUCT / MAP / ARRAY / UNION → read-only (not editable)
Empty input means NULL (and the boolean (null) option works the same).
Click ✓ to save; the value is CAST to the original column type. If the cast fails (e.g., "abc" → BIGINT) you'll see an error and the file is not modified.
Click ✗ to cancel and restore the original cell text.

How writes stay safe

The new row is computed with a DuckDB COPY that reads the original file, replaces only the changed cells using the row index, and casts each new value to the original column type.
The result is written to a temp file (<name>.parquet.parquet-manager.tmp) using the same compression codec as the original.
DuckDB releases the original, then the temp is renamed over the original. On Windows EPERM/EBUSY (antivirus, OneDrive, another viewer), the rename is retried with exponential backoff and falls back to copyFile + unlink.
If any step fails, the temp file is deleted and the original parquet is left untouched.

Tech

hyparquet for parquet reading and metadata
hyparquet-compressors for compression codecs on read
@duckdb/duckdb-wasm (Node blocking build) running in the extension host for SQL queries and writes
VS Code Custom Editor API (works in Cursor with the same extension model)

Parquet Manager

Andrea Dutto

Parquet Manager

Features

Development

Usage

DuckDB query examples

Editing rows

How writes stay safe

Tech