ECIT Fabric Development
VS Code extension for developing Microsoft Fabric notebooks with lakehouse exploration. All execution runs on Fabric via Livy API.
Note: This extension was built for ECIT Data & AI's internal Fabric development workflow. It requires specific workspace structure and Azure Service Principal setup to work and is not intended for public use.
What It Does
- Lakehouse Explorer - Browse schemas, tables, columns, and files from Fabric lakehouses
- Livy Spark Execution - Run notebooks and SQL on Fabric via Livy Sessions API directly from VS Code
- SQL Intellisense - Auto-completion for table/column names in spark.sql() and .sparksql files
- DataFrame Viewer - View query results in a panel (like SSMS results grid)
- Notebook Development - Open, edit, and run Fabric notebooks as VS Code notebooks via Livy
- Capacity Management - Monitor, resume, and pause Fabric capacity from the explorer
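Execution follows the standard Livy pattern: create a session, submit statements, poll for results. A minimal sketch of the statement-submission half, assuming a Fabric Livy endpoint of this shape (the exact base URL and API version are assumptions — verify against your tenant) and a `token` already acquired for the Service Principal:

```python
import requests

# Assumed Fabric Livy endpoint shape; the API version segment may differ.
LIVY_BASE = (
    "https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    "/lakehouses/{lakehouse_id}/livyapi/versions/2023-12-01"
)


def sessions_url(workspace_id: str, lakehouse_id: str) -> str:
    """Build the Livy sessions endpoint for a given workspace/lakehouse pair."""
    return LIVY_BASE.format(workspace_id=workspace_id, lakehouse_id=lakehouse_id) + "/sessions"


def submit_statement(base_url: str, token: str, session_id: str, code: str) -> dict:
    """POST a PySpark statement to a running Livy session and return its descriptor."""
    resp = requests.post(
        f"{base_url}/sessions/{session_id}/statements",
        headers={"Authorization": f"Bearer {token}"},
        json={"code": code, "kind": "pyspark"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```

The returned statement descriptor is then polled until its state reaches `available`.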
Requirements
A detailed setup guide is in the docs section (in Danish). Contact the author for details.
Software
| Requirement | Version | Notes |
| --- | --- | --- |
| Python | 3.11 | For Pylance intellisense |
Python Packages
python -m pip install pyspark==3.5.1 requests pytz pandas openpyxl
These are only for Pylance intellisense (type checking in VS Code). All execution runs on Fabric via Livy — no local Spark, Java, or Hadoop needed.
Workspace Structure
The extension requires a local_development folder in your workspace, which the extension creates automatically:
your-repo/
├── local_development/ # Auto-created by extension
│ ├── active_connection.json # Tracks active connection
│ └── connections/
│ └── {uuid}/
│ └── schema_reference.json # Table/column metadata for intellisense
├── utility/ # Shared Python modules (.py files)
│ ├── nb_dataplatform_functions.py
│ └── nb_extract_bc_functions.py
└── workspaces/
└── orchestration/
└── notebooks/ # Fabric .Notebook folders
Azure Service Principal
Each lakehouse connection requires a Service Principal with access to OneLake:
- Create an App Registration in Azure Entra ID
- Grant permissions to the Fabric workspace (Contributor or higher)
- Create a client secret and note the values:
- Tenant ID
- Client ID (Application ID)
- Client Secret
When adding a connection in the extension, you'll enter these credentials. The client secret is stored securely in VS Code's secret storage.
If the same service principal is used across environments, connections can be switched without restarting Spark.
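The credentials above feed a standard OAuth2 client-credentials flow against Entra ID. A hedged sketch of how a OneLake access token could be obtained — the `https://storage.azure.com/.default` scope (OneLake's ADLS-compatible endpoint) is an assumption for illustration, not a detail confirmed by this README:

```python
import requests

TOKEN_URL = "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
ONELAKE_SCOPE = "https://storage.azure.com/.default"  # assumed scope for OneLake access


def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Assemble the URL and form payload for the client-credentials grant."""
    payload = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": ONELAKE_SCOPE,
    }
    return TOKEN_URL.format(tenant_id=tenant_id), payload


def get_onelake_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """Exchange Service Principal credentials for a bearer token."""
    url, payload = build_token_request(tenant_id, client_id, client_secret)
    resp = requests.post(url, data=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()["access_token"]
```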
Notebook Format
Notebooks should be Python files (.py) in Fabric's format with cell markers:
# Fabric notebook source
# METADATA ********************
# META {
# "kernel_info": { "name": "synapse_pyspark" },
# "dependencies": {}
# }
# CELL ********************
df = spark.sql("SELECT * FROM bronze.customer")
display(df)
Utility Module Resolution
Shared function modules (e.g., nb_dataplatform_functions) are stored as .py files in the customer's utility/ folder. At Livy execution time, the extension inlines these modules so the code can run on Fabric without access to local files.
Notebooks import them via:
from nb_dataplatform_functions import *
Dependencies are resolved recursively (e.g., nb_extract_bc_functions → nb_dataplatform_functions).
Key Vault integration:
- If the connection has keyVaultName set, the extension injects KEY_VAULT_NAME via os.environ at Livy session creation
- The get_key_vault_name() function checks os.environ first, then falls back to the module variable
- In the Fabric portal, %run nb_dataplatform_config sets KEY_VAULT_NAME instead
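The resolution order can be sketched as follows — a minimal approximation of `get_key_vault_name()`, where the module-level variable stands in for the value the config notebook would set in the Fabric portal:

```python
import os

# Module-level fallback; in the Fabric portal this would be set by
# `%run nb_dataplatform_config` rather than via the environment variable.
KEY_VAULT_NAME = None


def get_key_vault_name():
    """Environment variable first (injected by the extension at Livy
    session creation), then the module-level variable."""
    return os.environ.get("KEY_VAULT_NAME") or KEY_VAULT_NAME
```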
Getting Started
- Open the ECIT Fabric sidebar (data lake icon in activity bar)
- Click Add Connection (+) and enter:
- Connection name
- Workspace ID and Lakehouse ID (from Fabric portal URL)
- Tenant ID, Client ID, Client Secret (can be retrieved from Key Vault)
- Key Vault name (for the get_secret() function and if KEY_VAULT_NAME needs replacement)
- Right-click connection → Rebuild intellisense to fetch table schemas
- Click Spark in the status bar (lower left) to start a Livy session (~30 seconds)
Key Commands
| Command | Keybinding | Description |
| --- | --- | --- |
| New Spark SQL Query | Ctrl+N | Open new .sparksql tab |
| Execute Spark SQL | F5 | Execute .sparksql file |
| Run Selection | Shift+Enter | Execute selected Python code |
| Run Cell | Ctrl+Enter | Execute current cell |
| Preview Table | Ctrl+3 | Preview table under the cursor |
| Reload Schemas | Ctrl+Shift+R | Refresh intellisense from OneLake |
Features
Spark SQL Files (.sparksql)
Standalone SQL files with SSMS-like experience:
- Syntax highlighting for Spark SQL
- Intellisense for tables, columns, and functions
- Press F5 to execute and see results in the DataFrame Viewer
- Multiple statements separated by semicolons (;)
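Statement splitting can be approximated as a plain split on semicolons, with blank lines treated as cosmetic. A deliberately naive sketch of that rule (semicolons inside string literals or comments are not handled here):

```python
def split_statements(sql_text: str) -> list[str]:
    """Split a .sparksql buffer into statements on semicolons only.

    Blank lines never split statements; empty fragments are dropped.
    """
    return [part.strip() for part in sql_text.split(";") if part.strip()]
```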
SQL Cells in Notebooks (%%sql)
Write SQL directly in notebook cells using the %%sql magic, just like in Fabric and Databricks:
%%sql
SELECT
a.customer_id
, a.name
FROM bronze.customer a
WHERE a.active = true
How it works:
- Type %%sql on the first line of any cell → language switches to Spark SQL
- Full SQL syntax highlighting and intellisense
- On execution, SQL is wrapped in spark.sql() automatically
- On save, the cell is serialized with # MAGIC %%sql and "language": "sparksql" metadata
Under the hood: %%sql is syntactic sugar — the SQL is wrapped in spark.sql("""...""") before execution, with table references translated to ABFSS paths.
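The translation can be sketched as a guarded regex rewrite: only `schema.table` pairs whose schema appears in the cached schema list are rewritten, so column references like `a.customer_id` are left alone. An illustrative sketch, not the extension's actual code:

```python
import re

# Assumed OneLake table path shape (matches the abfss pattern used elsewhere in this doc).
ABFSS = ("abfss://{ws}@onelake.dfs.fabric.microsoft.com/"
         "{lh}/Tables/{schema}/{table}")


def translate_sql(sql: str, workspace_id: str, lakehouse_id: str,
                  known_schemas: set[str]) -> str:
    """Rewrite known schema.table references to delta.`abfss://...` paths."""
    def repl(m: re.Match) -> str:
        schema, table = m.group(1), m.group(2)
        if schema.lower() not in known_schemas:
            return m.group(0)  # not a known schema (e.g. an alias): leave untouched
        path = ABFSS.format(ws=workspace_id, lh=lakehouse_id,
                            schema=schema, table=table)
        return f"delta.`{path}`"

    return re.sub(r"\b(\w+)\.(\w+)\b", repl, sql)


def wrap_sql_cell(sql: str, workspace_id: str, lakehouse_id: str,
                  known_schemas: set[str]) -> str:
    """Wrap a %%sql cell body in spark.sql() after path translation."""
    body = translate_sql(sql, workspace_id, lakehouse_id, known_schemas)
    return f'spark.sql("""{body}""")'
```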
Files Browser
Browse, upload, download, and manage files in OneLake Files section:
- Right-click to upload, download, rename, or delete
- "Open with read code" generates a ready-to-run code snippet
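For a CSV, the generated snippet might look like the output below; `build_read_code` here is a hypothetical helper showing the shape of what the command produces, not the extension's actual function:

```python
def build_read_code(file_path: str, fmt: str = "csv") -> str:
    """Return a ready-to-run Spark read snippet for a OneLake file."""
    opts = ""
    if fmt == "csv":
        # CSV needs header/schema options; parquet and delta are self-describing.
        opts = '.option("header", "true").option("inferSchema", "true")'
    return (
        f'df = spark.read.format("{fmt}"){opts}.load("{file_path}")\n'
        "display(df)"
    )
```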
Changelog
1.5.0
- Schema compare progress bar: replaces spinner with an animated progress bar showing elapsed time and estimated completion — uses rolling average of last 5 durations for the estimate, fills to 100% before transitioning to results
- Extension state persistence: new extension_state.json in local_development/ stores cross-session UX data (e.g., schema compare durations) via getExtensionState()/updateExtensionState() utilities
1.4.9
- Schema compare: right-click any connection in Lakehouse Explorer to compare schemas, tables, columns, and data types between two lakehouses — side-by-side webview with collapsible schemas/tables, color-coded diffs, and "differences only" filter
- Schema compare updates both connections' schema_reference.json as a side effect, keeping intellisense fresh
- No Spark session required — uses Unity Catalog REST API directly
1.4.8
- Output channel logging: all Livy session lifecycle events (create, poll, execute, delete, keep-alive, auth) now log to "LakehouseStudio: Livy (Spark session)" output channel for debugging
- Re-added FABRIC_WORKSPACE_ID and FABRIC_LAKEHOUSE_ID environment variables — injected at session creation and re-injected on connection switch, enabling abfss://{FABRIC_WORKSPACE_ID}@onelake.dfs.fabric.microsoft.com/{FABRIC_LAKEHOUSE_ID}/ path construction in notebooks
1.4.7
- F5 Quick SQL Preview now works in notebook cells (was broken due to keybinding not propagating selection in notebook context)
- Notebook tree icons reordered: pencil (edit) first, then play (queue)
- First-query personalized greeting stays for 6 seconds (was 4)
- Removed local Spark dependencies: Java, Hadoop, and 11 pip packages no longer required — all execution via Livy API
- Updated Setup Guide and README for Livy-only architecture
1.4.6
- Cancel execution: press Escape or click the Stop button (◼) in the DataFrame Viewer title bar to cancel a running Livy query — aborts the poll loop immediately and sends a best-effort cancel to Fabric Livy
- Notebook cell cancel: clicking the native VS Code Stop button on a cell now cancels the Livy statement
- Loading screen timer: elapsed time shown as mm:ss next to the status message while a query is running
- Loading screen messages now fade in/out between transitions instead of typewriter animation, and each message stays for 4 seconds
- Removed Session 404 log noise on Livy session startup (expected transient state, now silent)
- Removed legacy .fabpy era cleanup code (dead code from v1.3.x)
1.4.5
- NULL vs empty string in DataFrame Viewer: real NULLs display as styled "NULL", empty strings display as blank cells (previously both showed as NULL)
- Kernel selector now only auto-binds to an idle session from the active connection — sessions from other connections (e.g. PRD when DEV is active) are no longer auto-selected
- Kernel affinity updates automatically when a session becomes idle or when switching connections
- StructType mismatch warning: white text only (no yellow colour or ! badge) — the tree item description "structtype differs from table" is the sole visual indicator
- StructType check now runs on notebook save only, and clears on connection switch (no longer triggered by schema refreshes)
- Fixed loading spinner persisting on .sparksql tabs after DDL-only queries complete
- Fixed DataFrame Viewer resetting to "Waiting for results" when clicking inside notebook cells
1.4.4
- Row count toggle: panel title bar icon to include .count() in queries — shows "Showing X of Y rows" when enabled, no separate async count request
- Semicolons-only statement splitting: blank lines no longer split statements in .sparksql files — only ; separates statements (blank lines are cosmetic)
- Leading block comments (/* ... */) in .sparksql files no longer prevent statement classification
- Fixed stuck loading screen: 5-minute safety timeout clears stale loading state, try/catch guards on all execution paths
1.4.2
- Removed local Spark and uses Livy sessions exclusively
- Multiple concurrent Livy sessions: run DEV and PRD sessions simultaneously — session picker shows when clicking the status bar with more than one active session
- [DEV]/[PRD] environment labels in status bar (● Spark [DEV]), notifications, and session picker — derived from connection name
- Session counter resets to 0 when the last session ends (no more ever-incrementing numbers)
- "Count all" on-demand button: queries now use limit(1000) with no upfront .count() — counting is optional and fires only when you click the button. Shows "Showing all X rows" when fewer than 1,000 rows returned
- Per-tab DataFrame Viewer state: loading screen and results are tracked per .sparksql file — switching tabs restores the correct view (loading / results / empty) for each file
- Patience messages: loading screen cycles through patience messages (including a personalized one with your username) for long-running queries
- PRD session restore: switching back to a connection with a still-starting session now shows the spinner and resumes waiting correctly
- All sessions (including suspended background sessions) are killed when VS Code closes
1.4.1
- JSON-first Lakehouse Explorer: tree view now reads schemas, tables, and columns instantly from local schema_reference.json instead of calling the OneLake API on every expand
- Background sync: once per connection per session, silently syncs new/deleted tables from OneLake when the explorer panel becomes visible
- Column types stored in schema_reference.json ({name, type} objects instead of just column names) — tree view displays data types from local cache
- Auto-rebuild on upgrade: old-format schema_reference.json is detected and rebuilt automatically on first tree expand
- Retry on failure: if auto-rebuild fails (e.g. capacity paused), re-expanding the tree retries. "Refresh Tables" also triggers rebuild when old format is detected
1.4.0
- Redesigned DataFrame loading screen: modern skeleton grid with diagonal pulse animation and typewriter status messages
- Personalized first-query greeting using developer name (auto-detected from OS username)
- Refresh commands (schema, table, tables) now available in right-click context menu (in addition to existing inline icons)
- "Delete schema" only appears in context menu when schema has no tables (safety guard)
- Dynamic WebSocket port: multiple VS Code sessions no longer conflict — each session finds its own port automatically
- Block cursor animation on typewriter text
1.3.9
- SQL translation now works for single-line spark.sql("...") and spark.sql('...') strings (previously only triple-quoted strings were translated)
- Clicking refresh on "Tables" or a schema (e.g. "Bronze") in Lakehouse Explorer now syncs new/deleted tables from OneLake (no need for Ctrl+Shift+R)
- DataFrame Viewer "Copy" button now copies plain text (tab-separated) instead of HTML with formatting
- Fixed cell type reverting on save when changing a cell's language in .fabpy notebooks (e.g. SQL to Python)
- Removed temp view conversion feature (CodeLens, code actions, Ctrl+Shift+C)
1.3.8
- Progressive DataFrame loading: preview rows appear instantly while full dataset loads in background
- Cache tables toast now shows per-table progress (e.g. "Caching bronze.customer...")
- Simplified workspace imports: removed dead SQL translation from import generation, renamed folder to workspace_imports with auto-migration
- Simplified caching: only large tables (> 1M rows) are cached, with a True/False flag to exclude specific tables from caching
1.3.7
- Fixed overly aggressive semicolon insertion terminating previous statements on .sparksql tabs
1.3.6
- Added environment field (Dev/Prd) to lakehouse connections with auto-detection from connection name
- On activation, silently auto-switches away from production to first available dev connection
- One-time modal warning per session when manually switching to a production connection
- Tree view shows [DEV]/[PRD] labels with orange text for active production connections and hover tooltip
- Stale .fabpy notebook tabs from previous VS Code session are automatically closed on startup (survives reload)
- Stale session guard: after 6+ hours of inactivity, prompts to pull latest changes and reload for extension updates
- SQL intellisense now works for UPDATE, MERGE INTO, INSERT INTO, and DELETE FROM statements (schema, table, and column completions)
- Auto-inserts semicolons in .sparksql files when a blank line is left for 2+ seconds after a statement
1.3.5
- Added support for pure Python kernel notebooks (# %% [python] cell marker) with correct jupyter_python metadata
- Fixed dirty indicator on tab and tree view so cell output no longer triggers unsaved state
- StructType mismatch tooltip now shows which columns differ (in notebook only / in schema only)
- Added workspace root folder guard with warning message and tree view guidance when opened from subfolder
- Fixed diagnostics (red squiggles) firing inside SQL comments (-- and /* */)
- "Open with Read Code" now creates .fabpy notebook tabs with separate import and code cells
- Updated spinner for DDL statements
1.3.4
- Added abfss translation to spark.sql inside functions notebooks
- Removed strict check so schema.table always resolves to abfss even if the table does not yet exist in intellisense (the schema still must)
1.3.3
- Fixed CTRL+3 shortcut
- Ensured Fabric Spark (Livy sessions) are reused when executing multiple notebooks
- Changed the bootstrap notebook concept: bootstrap notebooks now run on the fly when they occur in notebooks (%run ...)
1.3.2
- Simplified resolving of KEY_VAULT_NAME and BASE_PATH to function calls with environment variables
1.3.1
- Fixed bug where BASE_PATH got overridden by our nb_dataplatform_functions during notebook runs
1.3.0
- Typewriter loading animation to reduce perceived query wait time
1.2.9
- Make cell execution timeout configurable (was 10 min hardcoded before)
- Fixed table version history bug
- Added "interrupt kernel" functionality by clicking on the spinning Spark in left lower corner
- Added "Loading data..." text instead of timer when the timer stops and data is loading into the view
1.2.8
- Removed pipeline explorer
- Added folder support for notebooks
1.2.7
- Fixed %%sql cells so they deserialize correctly back to Fabric format
1.2.6
- Integrated bootstrap functions notebooks in extension with KEY_VAULT_NAME injection
- Ensure kernel is closed/disposed if we close VS Code
- Minor fixes to intellisense refresh and bug fixes
- Simplified query execution graphic and added query timer (mm:ss)
- Added %%sql cell support
- %run statements get resolved by dynamically looking up the notebook locally
- Removed option to run notebooks remote because we now have Livy sessions (fabric kernel)
- Added multiple sequential notebook runs with status/progress bar and run order
1.2.5
- Added dependencies for ZeroMQ (cmake-ts)
1.2.4
- Fixed ZeroMQ external issue
1.2.3
- Fabric Local Kernel: Self-managed ipykernel replaces Jupyter extension dependency
- No more idle timeout (kernel stays alive until you stop it)
- Start/stop Spark via status bar click (lower left corner)
- No Jupyter VS Code extension required
- Better intellisense refresh on column changes to existing tables (code action and refresh when tree opens)
1.2.2
- Fixed F5 (run selected SQL) not sending the query directly to the kernel
1.2.1
- Fixed Shift+Enter running the selection against the kernel
1.2.0
- Replaced cell python with a .fabpy format which uses VS Code's notebook API (like Jupyter) to provide a real notebook experience
- Added direct kernel access to interactive window (Spark session)
- Added error showing in dataframe results for errors in sparksql files
- Added remote execution to Livy (spark) sessions while following cell executions
1.1.5
- Improved intellisense for multiple SELECT statements in the same editor
- Reminds the user to terminate SELECT statements with a semicolon
1.1.4
- Increased buffer to minimize flicker when scrolling fast in query results
- Removed parameter passing from remote notebook runs (wasn't used)
- Fixed warnings for bk_ columns in notebook tree view
1.1.3
- Fixed jar path when the user has Danish special characters in their name
1.1.2
- Fixed abfss translation when table names are preceded by backticks
1.1.1
- Adjusted connection screen/view in light mode
1.1.0
- Automatic pull from remote repos when a repo with connections is open
- Re-cache tables (UNCACHE + CACHE)
1.0.9
- Fixed Copy SQL button not working when the SQL had f-strings (spark.sql(f"""..."""))
- Ensured that when copying SQL, the original .py file keeps it selected for easy copying back after editing
1.0.8
- Added translation to abfss:// paths inside Python files
- Fixed BASE_PATH so we write to the correct connection's lakehouse from local notebooks
1.0.7
- Delete schema, Delete table and Create schema commands added
- Fixed some places where schema.table wasn't properly translated to abfss://
1.0.6
- Adjusted NULL background color in dark theme
- Added more decimals to webview (data results)
1.0.5
- Increased visibility on cell markers in light theme on Python cells
- Added more checks to notebooks to ensure schema, table and business keys are defined correctly
1.0.4
- Fixed runtime translation of spark.sql(f"""...""") strings when parameters are tables that need resolving before running the code
1.0.3
- Changed so semicolon terminates statements in Spark SQL tabs
1.0.2
- Fixed selection that couldn't be seen on the black background in active cells
- Added examples for spark functions with tooltips
- Added "Delete cell" option in Python cells
1.0.0
- Fabric capacity state and resume/pause
- Allow CRUD operations directly in Spark SQL cells
- Adjusted intellisense so it automatically rebuilds everything on a first-time refresh
- Added key vault retrieval of values for setting up connections
0.9.9
- Removed local metastore - SQL is now translated to ABFSS paths at runtime
- Simpler setup (no metastore sync step needed)
- Table caching now uses ABFSS paths directly
- Connection switching no longer restarts Spark (same service principal)
- Play button moved to title bar; plug icon switches connections
0.9.8
- Fixed worker and driver to run on the same Python
- Deletes any misplaced local_development folder
- Added REFRESH TABLE command when hitting Ctrl+Shift+R
0.9.7
- Auto-creates the local_development folder (no manual copying around)
- When SQL contains UNION ALL, it now correctly checks whether columns exist on the "right side of the UNION" when tables on both sides are aliased identically
0.9.6
- Multi connection support for lakehouses
- Bug fixes where the orange color was too dark in certain places
- This version requires re-setup of connections initially after updating
0.9.4
- Added better lineage tracking (right-click notebook -> Show lineage) based on naming conventions
- Adjusted "spinner" graphic to Azure-style (3 dots) when waiting for results
- Added LIMIT X handling to display fewer or more than 1000 rows
0.9.3
- VS Code theme-aware data grid (light/dark)
- .sparksql file format for pure Spark SQL
- Multi-result support for multiple SELECT statements
- Built-in SQL syntax highlighting (removed Inline SQL dependency)
0.9.0
- Files browser with upload/download operations
- Code snippets for reading CSV, Parquet, Excel files
0.8.7
- Remote notebook execution via Fabric REST API
- Pipeline Explorer with remote execution
0.7.0
- Initial release: Lakehouse explorer, intellisense, local notebook development
License
Internal use.