Sphinx: the AI copilot for data

Sphinx integrates directly into your Jupyter notebooks in VS Code, helping you find meaning in data through powerful agentic capabilities.

Using Sphinx is subject to our Terms and Conditions.

Activating Sphinx

Open a Jupyter notebook in VS Code.
Press Cmd+T (macOS) or Ctrl+T (Windows/Linux) to activate Sphinx, or click the Sphinx icon in the primary side bar.
The Sphinx panel appears beside your notebook.
If this is your first time, log in to your Sphinx account. You can create one at sphinx.ai.

Using Sphinx

Chat with Sphinx in the Sphinx panel beside your notebook. You can ask Sphinx, in natural language, to help with any part of your data science process.
Take the wheel yourself: anything you do in the Jupyter notebook is reflected back to Sphinx, and it can use your work as a starting point when you ask for something.

Data Science Capabilities

Data Processing

Clean and preprocess data automatically
Handle missing values and outliers
Perform feature scaling and encoding
Create data pipelines
Optimize data processing for large datasets

Exploratory Data Analysis (EDA)

Generate comprehensive statistical summaries
Identify patterns and correlations in your data
Detect outliers and anomalies
Create insightful visualizations automatically
Suggest relevant statistical tests

Data Visualization

Create publication-quality plots using matplotlib, seaborn, or plotly
Generate appropriate visualizations based on data types
Customize visualizations with best practices
Create interactive dashboards
Export visualizations in various formats

Model Building

Suggest appropriate models for your data
Handle data preprocessing and feature engineering
Implement cross-validation and hyperparameter tuning
Generate model evaluation metrics and visualizations
Create model comparison reports

Advanced Features

File Search and References

Use the @ symbol to reference files in your workspace:

@data.csv - Reference a CSV file
@utils.py - Reference a Python module
@requirements.txt - Reference a text file

Sphinx can then read and understand the contents of referenced files.

Custom Context Rules

Use the settings button (⚙️ button underneath Sphinx's text box) to open global or local rules for Sphinx.

Local rules affect only notebooks in the same folder, while global rules affect all your notebooks. Local rules take precedence in case of a conflict.

You can express any preferences or configuration options for Sphinx in natural language or code. For example, you can:

Set your preferred best practices
Define project-specific analysis frameworks
Specify preferred visualization styles
Add custom statistical requirements

Operation Modes

You can select one of two operation modes for Sphinx:

Safe Mode: Requires manual approval before executing code
Agent Mode: Allows Sphinx to execute code directly with appropriate safeguards

Controlling Execution

Use the stop button to halt code generation or execution. Sphinx will account for this when planning its next steps and try to avoid repeating mistakes.
Provide follow-up instructions to refine the output or redirect Sphinx's thought process.
In Safe Mode, approve or reject generated code to guide Sphinx's decision-making.

Settings

Use the settings button (⚙️ button underneath Sphinx's text box) to open settings for Sphinx. You can use these settings to configure:

Memory Writing: Should Sphinx learn from its mistakes and from any idiosyncrasies in your data science process?
Memory Reading: Should Sphinx use what it has learned to help you with ongoing tasks?
Package Installation: Should Sphinx try to pip install packages when needed, or defer to you when a required library is unavailable? If you use another package manager, such as conda or uv, you can tell Sphinx so in your rules.

MCP

Sphinx can be configured to use Model Context Protocol (MCP) servers. To edit this configuration, choose the "Edit MCP Config" option in the Sphinx options menu.

A typical MCP config has the following format:

{
  "mcpServers": {
    "deepwiki": {
      "description": "MCP tool to get information about public git repos",
      "url": "https://mcp.deepwiki.com/mcp"
    },
    "linear": {
      "description": "Linear MCP server for project management",
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://mcp.linear.app/sse"]
    }
  }
}

Each MCP server entry must specify either a command or a url.

Command-style servers run the provided command to start an MCP server locally. They support the following additional fields:

args: Array of command-line arguments
cwd: Optional working directory for the command
env: Optional environment variables for the command, provided as a JSON object

For url-style servers, url must be a direct URL to the MCP server. Both HTTP/HTTPS and WebSocket connections are supported.

The description parameter is optional, and lets you tell Sphinx more about the MCP server and when to use it.
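As a rough sketch of the rules above, the following Python function checks an MCP config before you save it. It is not part of Sphinx, and several details are assumptions made for illustration: the function name validate_mcp_config, the requirement that args and env values be strings, treating an entry with both command and url as an error, and mapping "HTTP/HTTPS and WebSocket" to the http, https, ws and wss URL schemes. The "local-tool" server and "my_mcp_server" module in the usage are hypothetical.

```python
import json
from urllib.parse import urlparse

# Assumption: "HTTP/HTTPS and WebSocket" maps to these URL schemes.
ALLOWED_URL_SCHEMES = {"http", "https", "ws", "wss"}


def validate_mcp_config(config: dict) -> list:
    """Return a list of problems found in an MCP config; an empty list means none."""
    servers = config.get("mcpServers")
    if not isinstance(servers, dict):
        return ["top-level 'mcpServers' must be a JSON object"]

    problems = []
    for name, entry in servers.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry must be a JSON object")
            continue

        has_command = "command" in entry
        has_url = "url" in entry
        # Each entry must specify either a command or a url.
        if not (has_command or has_url):
            problems.append(f"{name}: must specify either 'command' or 'url'")
        elif has_command and has_url:
            # Assumption: an entry should use one transport, not both.
            problems.append(f"{name}: specify 'command' or 'url', not both")

        if has_command:
            if not isinstance(entry["command"], str):
                problems.append(f"{name}: 'command' must be a string")
            # Assumption: args are strings and env values are strings.
            args = entry.get("args", [])
            if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
                problems.append(f"{name}: 'args' must be an array of strings")
            if "cwd" in entry and not isinstance(entry["cwd"], str):
                problems.append(f"{name}: 'cwd' must be a string")
            env = entry.get("env", {})
            if not isinstance(env, dict) or not all(isinstance(v, str) for v in env.values()):
                problems.append(f"{name}: 'env' must be an object of string values")

        if has_url:
            scheme = urlparse(str(entry["url"])).scheme
            if scheme not in ALLOWED_URL_SCHEMES:
                problems.append(f"{name}: unsupported url scheme {scheme!r}")

        # description is optional, but should be text when present.
        if "description" in entry and not isinstance(entry["description"], str):
            problems.append(f"{name}: 'description' must be a string")

    return problems


# Usage: "local-tool" and "my_mcp_server" are hypothetical names.
example = json.loads("""
{
  "mcpServers": {
    "deepwiki": {"url": "https://mcp.deepwiki.com/mcp"},
    "local-tool": {
      "command": "python",
      "args": ["-m", "my_mcp_server"],
      "cwd": "/path/to/project",
      "env": {"LOG_LEVEL": "debug"}
    },
    "broken": {"description": "has neither command nor url"}
  }
}
""")

for problem in validate_mcp_config(example):
    print(problem)  # prints: broken: must specify either 'command' or 'url'
```

The "local-tool" entry shows how cwd and env sit alongside command and args in a command-style server; only the "broken" entry is reported, because it specifies neither a command nor a url.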

Best Practices

Start by giving Sphinx clear objectives, inductive biases, and constraints.
Write nuanced pieces of code yourself, such as complex data connector configs.
Give Sphinx the kind of guidance you would give a human collaborator on modelling, runtime and complexity, and style preferences.

If you encounter any issues, or have any suggestions or questions for our team, please get in touch! We'd love to help.