ECIT Fabric Development
VS Code extension for developing Microsoft Fabric notebooks with lakehouse exploration. All execution runs on Fabric via Livy API.
Note: This extension was built for ECIT Data & AI's internal Fabric development workflow. It requires specific workspace structure and Azure Service Principal setup to work and is not intended for public use.
What It Does
- Lakehouse Explorer - Browse schemas, tables, columns, and files from Fabric lakehouses
- Livy Spark Execution - Run notebooks and SQL on Fabric via Livy Sessions API directly from VS Code
- SQL Intellisense - Auto-completion for table/column names in spark.sql() and .sparksql files
- DataFrame Viewer - View query results in a panel (like SSMS results grid)
- Notebook Development - Open, edit, and run Fabric notebooks as VS Code notebooks via Livy
- Capacity Management - Monitor, resume, and pause Fabric capacity from the explorer
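Execution follows the standard Livy pattern: create a session, submit statements, poll for results. A minimal sketch of the statement-submission half, assuming a Fabric Livy endpoint of this shape (the exact base URL and API version are assumptions — verify against your tenant) and a `token` already acquired for the Service Principal:

```python
import requests

# Assumed Fabric Livy endpoint shape; the API version segment may differ.
LIVY_BASE = (
    "https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    "/lakehouses/{lakehouse_id}/livyapi/versions/2023-12-01"
)


def sessions_url(workspace_id: str, lakehouse_id: str) -> str:
    """Build the Livy sessions endpoint for a given workspace/lakehouse pair."""
    return LIVY_BASE.format(workspace_id=workspace_id, lakehouse_id=lakehouse_id) + "/sessions"


def submit_statement(base_url: str, token: str, session_id: str, code: str) -> dict:
    """POST a PySpark statement to a running Livy session and return its descriptor."""
    resp = requests.post(
        f"{base_url}/sessions/{session_id}/statements",
        headers={"Authorization": f"Bearer {token}"},
        json={"code": code, "kind": "pyspark"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```

The returned statement descriptor is then polled until its state reaches `available`.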
Requirements
A detailed setup guide is in the docs section (in Danish). Contact the author for details.
Software
| Requirement | Version | Notes |
| --- | --- | --- |
| Python | 3.11 | For Pylance intellisense |
Python Packages
python -m pip install pyspark==3.5.1 requests pytz pandas openpyxl
These are only for Pylance intellisense (type checking in VS Code). All execution runs on Fabric via Livy — no local Spark, Java, or Hadoop needed.
Workspace Structure
The extension requires a local_development folder in your workspace, which the extension creates automatically:
your-repo/
├── local_development/ # Auto-created by extension
│ ├── active_connection.json # Tracks active connection
│ └── connections/
│ └── {uuid}/
│ └── schema_reference.json # Table/column metadata for intellisense
├── utility/ # Shared Python modules (.py files)
│ ├── nb_dataplatform_functions.py
│ └── nb_extract_bc_functions.py
└── workspaces/
└── orchestration/
└── notebooks/ # Fabric .Notebook folders
Azure Service Principal
Each lakehouse connection requires a Service Principal with access to OneLake:
- Create an App Registration in Azure Entra ID
- Grant permissions to the Fabric workspace (Contributor or higher)
- Create a client secret and note the values:
- Tenant ID
- Client ID (Application ID)
- Client Secret
When adding a connection in the extension, you'll enter these credentials. The client secret is stored securely in VS Code's secret storage.
If the same service principal is used across environments, connections can be switched without restarting Spark.
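The credentials above feed a standard OAuth2 client-credentials flow against Entra ID. A hedged sketch of how a OneLake access token could be obtained — the `https://storage.azure.com/.default` scope (OneLake's ADLS-compatible endpoint) is an assumption for illustration, not a detail confirmed by this README:

```python
import requests

TOKEN_URL = "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
ONELAKE_SCOPE = "https://storage.azure.com/.default"  # assumed scope for OneLake access


def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Assemble the URL and form payload for the client-credentials grant."""
    payload = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": ONELAKE_SCOPE,
    }
    return TOKEN_URL.format(tenant_id=tenant_id), payload


def get_onelake_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """Exchange Service Principal credentials for a bearer token."""
    url, payload = build_token_request(tenant_id, client_id, client_secret)
    resp = requests.post(url, data=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()["access_token"]
```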
Notebook Format
Notebooks should be Python files (.py) in Fabric's format with cell markers:
# Fabric notebook source
# METADATA ********************
# META {
# "kernel_info": { "name": "synapse_pyspark" },
# "dependencies": {}
# }
# CELL ********************
df = spark.sql("SELECT * FROM bronze.customer")
display(df)
Utility Module Resolution
Shared function modules (e.g., nb_dataplatform_functions) are stored as .py files in the customer's utility/ folder. At Livy execution time, the extension inlines these modules so the code can run on Fabric without access to local files.
Notebooks import them via:
from nb_dataplatform_functions import *
Dependencies are resolved recursively (e.g., nb_extract_bc_functions → nb_dataplatform_functions).
Key Vault integration:
- If the connection has keyVaultName set, the extension injects KEY_VAULT_NAME via os.environ at Livy session creation
- The get_key_vault_name() function checks os.environ first, then falls back to the module variable
- In the Fabric portal, %run nb_dataplatform_config sets KEY_VAULT_NAME instead
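The resolution order can be sketched as follows — a minimal approximation of `get_key_vault_name()`, where the module-level variable stands in for the value the config notebook would set in the Fabric portal:

```python
import os

# Module-level fallback; in the Fabric portal this would be set by
# `%run nb_dataplatform_config` rather than via the environment variable.
KEY_VAULT_NAME = None


def get_key_vault_name():
    """Environment variable first (injected by the extension at Livy
    session creation), then the module-level variable."""
    return os.environ.get("KEY_VAULT_NAME") or KEY_VAULT_NAME
```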
Getting Started
- Open the ECIT Fabric sidebar (data lake icon in activity bar)
- Click Add Connection (+) and enter:
- Connection name
- Workspace ID and Lakehouse ID (from Fabric portal URL)
- Tenant ID, Client ID, Client Secret (can be retrieved from Key Vault)
- Key Vault name (for the get_secret() function and if KEY_VAULT_NAME needs replacement)
- Right-click connection → Rebuild intellisense to fetch table schemas
- Click Spark in the status bar (lower left) to start a Livy session (~30 seconds)
Key Commands
| Command | Keybinding | Description |
| --- | --- | --- |
| New Spark SQL Query | Ctrl+N | Open new .sparksql tab |
| Execute Spark SQL | F5 | Execute .sparksql file |
| Run Selection | Shift+Enter | Execute selected Python code |
| Run Cell | Ctrl+Enter | Execute current cell |
| Preview Table | Ctrl+3 | Preview table under the cursor |
| Reload Schemas | Ctrl+Shift+R | Refresh intellisense from OneLake |
Features
Spark SQL Files (.sparksql)
Standalone SQL files with SSMS-like experience:
- Syntax highlighting for Spark SQL
- Intellisense for tables, columns, and functions
- Press F5 to execute and see results in the DataFrame Viewer
- Multiple statements separated by semicolons (;)
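Statement splitting can be approximated as a plain split on semicolons, with blank lines treated as cosmetic. A deliberately naive sketch of that rule (semicolons inside string literals or comments are not handled here):

```python
def split_statements(sql_text: str) -> list[str]:
    """Split a .sparksql buffer into statements on semicolons only.

    Blank lines never split statements; empty fragments are dropped.
    """
    return [part.strip() for part in sql_text.split(";") if part.strip()]
```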
SQL Cells in Notebooks (%%sql)
Write SQL directly in notebook cells using the %%sql magic, just like in Fabric and Databricks:
%%sql
SELECT
a.customer_id
, a.name
FROM bronze.customer a
WHERE a.active = true
How it works:
- Type %%sql on the first line of any cell → language switches to Spark SQL
- Full SQL syntax highlighting and intellisense
- On execution, SQL is wrapped in spark.sql() automatically
- On save, the cell is serialized with # MAGIC %%sql and "language": "sparksql" metadata
Under the hood: %%sql is syntactic sugar — the SQL is wrapped in spark.sql("""...""") before execution, with table references translated to ABFSS paths.
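The translation can be sketched as a guarded regex rewrite: only `schema.table` pairs whose schema appears in the cached schema list are rewritten, so column references like `a.customer_id` are left alone. An illustrative sketch, not the extension's actual code:

```python
import re

# Assumed OneLake table path shape (matches the abfss pattern used elsewhere in this doc).
ABFSS = ("abfss://{ws}@onelake.dfs.fabric.microsoft.com/"
         "{lh}/Tables/{schema}/{table}")


def translate_sql(sql: str, workspace_id: str, lakehouse_id: str,
                  known_schemas: set[str]) -> str:
    """Rewrite known schema.table references to delta.`abfss://...` paths."""
    def repl(m: re.Match) -> str:
        schema, table = m.group(1), m.group(2)
        if schema.lower() not in known_schemas:
            return m.group(0)  # not a known schema (e.g. an alias): leave untouched
        path = ABFSS.format(ws=workspace_id, lh=lakehouse_id,
                            schema=schema, table=table)
        return f"delta.`{path}`"

    return re.sub(r"\b(\w+)\.(\w+)\b", repl, sql)


def wrap_sql_cell(sql: str, workspace_id: str, lakehouse_id: str,
                  known_schemas: set[str]) -> str:
    """Wrap a %%sql cell body in spark.sql() after path translation."""
    body = translate_sql(sql, workspace_id, lakehouse_id, known_schemas)
    return f'spark.sql("""{body}""")'
```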
Files Browser
Browse, upload, download, and manage files in OneLake Files section:
- Right-click to upload, download, rename, or delete
- "Open with read code" generates a ready-to-run code snippet
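For a CSV, the generated snippet might look like the output below; `build_read_code` here is a hypothetical helper showing the shape of what the command produces, not the extension's actual function:

```python
def build_read_code(file_path: str, fmt: str = "csv") -> str:
    """Return a ready-to-run Spark read snippet for a OneLake file."""
    opts = ""
    if fmt == "csv":
        # CSV needs header/schema options; parquet and delta are self-describing.
        opts = '.option("header", "true").option("inferSchema", "true")'
    return (
        f'df = spark.read.format("{fmt}"){opts}.load("{file_path}")\n'
        "display(df)"
    )
```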
Changelog
1.5.0
- Schema compare progress bar: replaces spinner with an animated progress bar showing elapsed time and estimated completion — uses rolling average of last 5 durations for the estimate, fills to 100% before transitioning to results
- Extension state persistence: new extension_state.json in local_development/ stores cross-session UX data (e.g., schema compare durations) via getExtensionState()/updateExtensionState() utilities
1.4.9
- Schema compare: right-click any connection in Lakehouse Explorer to compare schemas, tables, columns, and data types between two lakehouses — side-by-side webview with collapsible schemas/tables, color-coded diffs, and "differences only" filter
- Schema compare updates both connections' schema_reference.json as a side effect, keeping intellisense fresh
- No Spark session required — uses Unity Catalog REST API directly
1.4.8
- Output channel logging: all Livy session lifecycle events (create, poll, execute, delete, keep-alive, auth) now log to "LakehouseStudio: Livy (Spark session)" output channel for debugging
- Re-added FABRIC_WORKSPACE_ID and FABRIC_LAKEHOUSE_ID environment variables — injected at session creation and re-injected on connection switch, enabling abfss://{FABRIC_WORKSPACE_ID}@onelake.dfs.fabric.microsoft.com/{FABRIC_LAKEHOUSE_ID}/ path construction in notebooks
1.4.7
- F5 Quick SQL Preview now works in notebook cells (was broken due to keybinding not propagating selection in notebook context)
- Notebook tree icons reordered: pencil (edit) first, then play (queue)
- First-query personalized greeting stays for 6 seconds (was 4)
- Removed local Spark dependencies: Java, Hadoop, and 11 pip packages no longer required — all execution via Livy API
- Updated Setup Guide and README for Livy-only architecture
1.4.6
- Cancel execution: press Escape or click the Stop button (◼) in the DataFrame Viewer title bar to cancel a running Livy query — aborts the poll loop immediately and sends a best-effort cancel to Fabric Livy
- Notebook cell cancel: clicking the native VS Code Stop button on a cell now cancels the Livy statement
- Loading screen timer: elapsed time shown as mm:ss next to the status message while a query is running
- Loading screen messages now fade in/out between transitions instead of typewriter animation, and each message stays for 4 seconds
- Removed Session 404 log noise on Livy session startup (expected transient state, now silent)
- Removed legacy .fabpy era cleanup code (dead code from v1.3.x)
1.4.5
- NULL vs empty string in DataFrame Viewer: real NULLs display as styled "NULL", empty strings display as blank cells (previously both showed as NULL)
- Kernel selector now only auto-binds to an idle session from the active connection — sessions from other connections (e.g. PRD when DEV is active) are no longer auto-selected
- Kernel affinity updates automatically when a session becomes idle or when switching connections
- StructType mismatch warning: white text only (no yellow colour or ! badge) — the tree item description "structtype differs from table" is the sole visual indicator
- StructType check now runs on notebook save only, and clears on connection switch (no longer triggered by schema refreshes)
- Fixed loading spinner persisting on .sparksql tabs after DDL-only queries complete
- Fixed DataFrame Viewer resetting to "Waiting for results" when clicking inside notebook cells
1.4.4
- Row count toggle: panel title bar icon to include .count() in queries — shows "Showing X of Y rows" when enabled, no separate async count request
- Semicolons-only statement splitting: blank lines no longer split statements in .sparksql files — only ; separates statements (blank lines are cosmetic)
- Leading block comments (/* ... */) in .sparksql files no longer prevent statement classification
- Fixed stuck loading screen: 5-minute safety timeout clears stale loading state, try/catch guards on all execution paths
1.4.2
- Removed local Spark and uses Livy sessions exclusively
- Multiple concurrent Livy sessions: run DEV and PRD sessions simultaneously — session picker shows when clicking the status bar with more than one active session
- [DEV]/[PRD] environment labels in status bar (● Spark [DEV]), notifications, and session picker — derived from connection name
- Session counter resets to 0 when the last session ends (no more ever-incrementing numbers)
- "Count all" on-demand button: queries now use limit(1000) with no upfront .count() — counting is optional and fires only when you click the button. Shows "Showing all X rows" when fewer than 1,000 rows returned
- Per-tab DataFrame Viewer state: loading screen and results are tracked per .sparksql file — switching tabs restores the correct view (loading / results / empty) for each file
- Patience messages: loading screen cycles through patience messages (including a personalized one with your username) for long-running queries
- PRD session restore: switching back to a connection with a still-starting session now shows the spinner and resumes waiting correctly
- All sessions (including suspended background sessions) are killed when VS Code closes
1.4.1
- JSON-first Lakehouse Explorer: tree view now reads schemas, tables, and columns instantly from local schema_reference.json instead of calling the OneLake API on every expand
- Background sync: once per connection per session, silently syncs new/deleted tables from OneLake when the explorer panel becomes visible
- Column types stored in schema_reference.json ({name, type} objects instead of just column names) — tree view displays data types from local cache
- Auto-rebuild on upgrade: old-format schema_reference.json is detected and rebuilt automatically on first tree expand
- Retry on failure: if auto-rebuild fails (e.g. capacity paused), re-expanding the tree retries. "Refresh Tables" also triggers rebuild when old format is detected
1.4.0
- Redesigned DataFrame loading screen: modern skeleton grid with diagonal pulse animation and typewriter status messages
- Personalized first-query greeting using developer name (auto-detected from OS username)
- Refresh commands (schema, table, tables) now available in right-click context menu (in addition to existing inline icons)
- "Delete schema" only appears in context menu when schema has no tables (safety guard)
- Dynamic WebSocket port: multiple VS Code sessions no longer conflict — each session finds its own port automatically
- Block cursor animation on typewriter text
1.3.9
- SQL translation now works for single-line spark.sql("...") and spark.sql('...') strings (previously only triple-quoted strings were translated)
- Clicking refresh on "Tables" or a schema (e.g. "Bronze") in Lakehouse Explorer now syncs new/deleted tables from OneLake (no need for Ctrl+Shift+R)
- DataFrame Viewer "Copy" button now copies plain text (tab-separated) instead of HTML with formatting
- Fixed cell type reverting on save when changing a cell's language in .fabpy notebooks (e.g. SQL to Python)
- Removed temp view conversion feature (CodeLens, code actions, Ctrl+Shift+C)
1.3.8
- Progressive DataFrame loading: preview rows appear instantly while full dataset loads in background
- Cache tables toast now shows per-table progress (e.g. "Caching bronze.customer...")
- Simplified workspace imports: removed dead SQL translation from import generation, renamed folder to workspace_imports with auto-migration
- Simplified caching: only large tables (> 1M rows) are cached, with a True/False flag to exclude specific tables from caching
1.3.7
- Fixed overly aggressive semicolon insertion terminating previous statements on .sparksql tabs
1.3.6
- Added environment field (Dev/Prd) to lakehouse connections with auto-detection from connection name
- On activation, silently auto-switches away from production to first available dev connection
- One-time modal warning per session when manually switching to a production connection
- Tree view shows [DEV]/[PRD] labels with orange text for active production connections and hover tooltip
- Stale .fabpy notebook tabs from previous VS Code session are automatically closed on startup (survives reload)
- Stale session guard: after 6+ hours of inactivity, prompts to pull latest changes and reload for extension updates
- SQL intellisense now works for UPDATE, MERGE INTO, INSERT INTO, and DELETE FROM statements (schema, table, and column completions)
- Auto-inserts semicolons in .sparksql files when a blank line is left for 2+ seconds after a statement
1.3.5
- Added support for pure Python kernel notebooks (# %% [python] cell marker) with correct jupyter_python metadata
- Fixed dirty indicator on tab and tree view so cell output no longer triggers unsaved state
- StructType mismatch tooltip now shows which columns differ (in notebook only / in schema only)
- Added workspace root folder guard with warning message and tree view guidance when opened from subfolder
- Fixed diagnostics (red squiggles) firing inside SQL comments (-- and /* */)
- "Open with Read Code" now creates .fabpy notebook tabs with separate import and code cells
- Updated spinner for DDL statements
1.3.4
- Added abfss translation to spark.sql inside functions notebooks
- Removed strict check so schema.table always resolves to abfss even if the table does not yet exist in intellisense (the schema still must)
1.3.3
- Fixed CTRL+3 shortcut
- Ensured Fabric Spark (Livy sessions) are reused when executing multiple notebooks
- Changed the bootstrap notebook concept: bootstrap notebooks now run on the fly when they occur in notebooks (%run ...)
1.3.2
- Simplified resolving of KEY_VAULT_NAME and BASE_PATH to function calls with environment variables
1.3.1
- Fixed bug where BASE_PATH got overridden by our nb_dataplatform_functions during notebook runs
1.3.0
- Typewriter loading animation to reduce perceived query wait time
1.2.9
- Make cell execution timeout configurable (was 10 min hardcoded before)
- Fixed table version history bug
- Added "interrupt kernel" functionality by clicking on the spinning Spark in left lower corner
- Added "Loading data..." text instead of timer when the timer stops and data is loading into the view
1.2.8
- Removed pipeline explorer
- Added folder support for notebooks
1.2.7
- Fixed %%sql cells so they deserialize correctly back to Fabric format
1.2.6
- Integrated bootstrap functions notebooks in extension with KEY_VAULT_NAME injection
- Ensure kernel is closed/disposed if we close VS Code
- Minor fixes to intellisense refresh and bug fixes
- Simplified query execution graphic and added query timer (mm:ss)
- Added %%sql cell support
- %run statements get resolved by dynamically looking up the notebook locally
- Removed option to run notebooks remote because we now have Livy sessions (fabric kernel)
- Added multiple sequential notebook runs with status/progress bar and run order
1.2.5
- Added dependencies for ZeroMQ (cmake-ts)
1.2.4
- Fixed ZeroMQ external issue
1.2.3
- Fabric Local Kernel: Self-managed ipykernel replaces Jupyter extension dependency
- No more idle timeout (kernel stays alive until you stop it)
- Start/stop Spark via status bar click (lower left corner)
- No Jupyter VS Code extension required
- Better intellisense refresh on column changes to existing tables (code action and refresh when tree opens)
1.2.2
- Fixed F5 (run selected SQL) not sending the query directly to the kernel
1.2.1
- Fixed Shift+Enter running the selection against the kernel
1.2.0
- Replaced cell python with a .fabpy format which uses VS Code's notebook API (like Jupyter) to provide a real notebook experience
- Added direct kernel access to interactive window (Spark session)
- Added error showing in dataframe results for errors in sparksql files
- Added remote execution to Livy (spark) sessions while following cell executions
1.1.5
- Improved intellisense for multiple SELECT statements in the same editor
- Reminds the user to terminate SELECT statements with a semicolon
1.1.4
- Increased buffer to minimize flicker when scrolling fast in query results
- Removed parameter passing from remote notebook runs (wasn't used)
- Fixed warnings for bk_ columns in notebook tree view
1.1.3
- Fixed jar path when the user has Danish special characters in their name
1.1.2
- Fixed abfss translation when table names are preceded by backticks
1.1.1
- Adjusted connection screen/view in light mode
1.1.0
- Automatic pull from remote repos when a repo with connections is open
- Re-cache tables (UNCACHE + CACHE)
1.0.9
- Fixed Copy SQL button not working when the SQL had f-strings (spark.sql(f"""..."""))
- Ensured that when copying SQL, the original .py file keeps it selected for easy copying back after editing
1.0.8
- Added translation to abfss:// paths inside Python files
- Fixed BASE_PATH so we write to the correct connection's lakehouse from local notebooks
1.0.7
- Delete schema, Delete table and Create schema commands added
- Fixed some places where schema.table wasn't properly translated to abfss://
1.0.6
- Adjusted NULL background color in dark theme
- Added more decimals to webview (data results)
1.0.5
- Increased visibility on cell markers in light theme on Python cells
- Added more checks to notebooks to ensure schema, table and business keys are defined correctly
1.0.4
- Fixed runtime translation of spark.sql(f"""...""") strings when parameters are tables that need resolving before running the code
1.0.3
- Changed so semicolon terminates statements in Spark SQL tabs
1.0.2
- Fixed selection that couldn't be seen on the black background in active cells
- Added examples for spark functions with tooltips
- Added "Delete cell" option in Python cells
1.0.0
- Fabric capacity state and resume/pause
- Allow CRUD operations directly in Spark SQL cells
- Adjusted intellisense so it automatically rebuilds everything on a first-time refresh
- Added key vault retrieval of values for setting up connections
0.9.9
- Removed local metastore - SQL is now translated to ABFSS paths at runtime
- Simpler setup (no metastore sync step needed)
- Table caching now uses ABFSS paths directly
- Connection switching no longer restarts Spark (same service principal)
- Play button moved to title bar; plug icon switches connections
0.9.8
- Fixed worker and driver to run on the same Python
- Deletes any misplaced local_development folder
- Added REFRESH TABLE command when hitting Ctrl+Shift+R
0.9.7
- Auto-creates the local_development folder (no manual copying around)
- When SQL contains UNION ALL, it now correctly checks whether columns exist on the "right side of the UNION" when tables on both sides are aliased identically
0.9.6
- Multi connection support for lakehouses
- Bug fixes where the orange color was too dark in certain places
- This version requires re-setup of connections initially after updating
0.9.4
- Added better lineage tracking (right-click notebook -> Show lineage) based on naming conventions
- Adjusted "spinner" graphic to Azure-style (3 dots) when waiting for results
- Added LIMIT X handling to display fewer or more than 1000 rows
0.9.3
- VS Code theme-aware data grid (light/dark)
- .sparksql file format for pure Spark SQL
- Multi-result support for multiple SELECT statements
- Built-in SQL syntax highlighting (removed Inline SQL dependency)
0.9.0
- Files browser with upload/download operations
- Code snippets for reading CSV, Parquet, Excel files
0.8.7
- Remote notebook execution via Fabric REST API
- Pipeline Explorer with remote execution
0.7.0
- Initial release: Lakehouse explorer, intellisense, local notebook development
License
Internal use.