SnakeMaker

Semi-automatic Snakemake workflow generation from unstructured Bash commands or Python Notebooks.
Setting up for first time

To work with Snakemaker, you must first connect it to an LLM provider. There are two ways to do so:
Setting up with GitHub Copilot

GitHub Copilot gives users access to most state-of-the-art models and provides APIs that extensions such as Snakemaker can use to connect to it directly. Setting up Copilot inside Snakemaker is straightforward:

Set up GitHub Copilot inside VSCode

Copilot must be active inside your VSCode for Snakemaker to use it.
Set up Copilot models inside Snakemaker

Once GitHub Copilot is installed and available, it should be visible from the Snakemaker model tabs:
Setting up with external models

Snakemaker also supports external models through OpenAI-compatible APIs, which most vendors (OpenAI, Gemini, OpenRouter) and local deployment tools (Ollama) provide. To connect Snakemaker to an external LLM:
Get help interactively

Snakemaker provides a custom assistant inside the GitHub Copilot Chat. Tag @snakemaker to access the assistant and get help with the extension. The Notebook feature has a separate assistant, which can be reached with the tag @snakemaker-notebook.

Bash command support

Record bash commands history

Turn listening and recording of bash commands on or off manually.
Adding commands manually

Run

Commands importance

Snakemaker distinguishes between important commands, which can contribute to the Snakemake workflow, and unimportant, one-off commands. Unimportant commands are shown in dark gray in the Snakemaker panel and, by default, are not exported as rules. The importance of a command can be changed manually.

Command details

Snakemaker extracts details such as input/output files and possible rule names. These can be edited manually.

Composite rules

Merge multiple commands into composite commands using drag-and-drop.

GNU Make support

Snakemaker can also generate Make rules. To switch between Snakemake and Make rule generation, search for "Rules output format" in the VSCode settings. Alternatively, ask @snakemaker in the chat to open the setting for you.

Rule generation options

For Snakemake rules, some additional options are offered in the settings:
Settings related to Snakemake rule generation are grouped under "Snakemake Best Practices" in the VSCode settings.

Automatic rule validation and correction

Generated Snakemake rules are checked for errors; any rule that fails the check is fed back to the model together with the error message for correction.
This feature makes rule generation more reliable, but it can slow down the process and consume more LLM API tokens. It can be disabled in the settings. For automatic correction to work, a path to snakemake must be provided in the settings, or snakemake must be on the user's $PATH.
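As a rough illustration of the two ideas in this section (locating the snakemake executable, and retrying generation with the error message as feedback), here is a minimal Python sketch. It is not Snakemaker's actual implementation: `find_snakemake`, `generate_with_correction`, and the `generate` and `validate` callables are hypothetical names introduced only for this example, and the retry limit of three attempts is an assumption.

```python
import shutil
from typing import Callable, Optional, Tuple


def find_snakemake(configured_path: Optional[str] = None) -> Optional[str]:
    """Return a usable snakemake executable, or None if none is found.

    A path given in the settings takes precedence; otherwise $PATH is searched.
    """
    if configured_path:
        # shutil.which also accepts a full path and checks that it is executable.
        return shutil.which(configured_path)
    return shutil.which("snakemake")


def generate_with_correction(
    generate: Callable[[Optional[str]], str],
    validate: Callable[[str], Optional[str]],
    max_attempts: int = 3,  # assumed limit, to bound token usage
) -> Tuple[str, Optional[str]]:
    """Generate a rule, validate it, and retry with the error as feedback.

    `generate(feedback)` stands in for the LLM call (hypothetical).
    `validate(rule)` returns an error message, or None if the rule is valid
    (hypothetical; it could wrap a call to the snakemake executable).
    Returns the last rule and the last error (None on success).
    """
    feedback: Optional[str] = None
    rule = ""
    for _ in range(max_attempts):
        rule = generate(feedback)
        feedback = validate(rule)
        if feedback is None:
            break  # the rule passed validation
    return rule, feedback
```

Bounding the number of attempts reflects the trade-off described above: each retry costs time and tokens, so a loop of this kind would normally give up and return the last error rather than retry indefinitely.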
Snakemaker assistant integration in Copilot Chat

The Snakemaker custom assistant integrated in the GitHub Copilot Chat can assist in many ways during the workflow, providing help and performing operations on behalf of the user. Tag @snakemaker in the Copilot Chat to access the assistant and get help with the extension. The direct chat can be used for a variety of purposes:
Import-export workspace

The workspace contains all the recorded commands and their details. By default, the workspace is preserved between VSCode sessions; this can be disabled in the settings ("Keep history between sessions"). The workspace can also be explicitly imported from and exported to a JSON file:
Notebook support

Open the notebook, select More actions (three dots), and click Process with Snakemaker.

Step 1: Review data dependencies between cells

The first step in the conversion process resolves the data dependencies between the cells of the notebook into discrete, file-based rules. Each cell is characterized by three sets of variables: Read, Write and Wildcards.
From these sets, the dependency graph is constructed by matching each Read variable of a cell with the closest Write of that variable in the preceding cells. Wildcards, on the other hand, will be matched to patterns in the filenames during the next step, where the actual rules are generated. For example, suppose Cell 1 first reads the variable var1, which is received from some previous cell, and then modifies it. Cell 2 also reads var1, but it is the modified version produced by Cell 1.

Snakemaker provides a first automatic resolution of the dependencies, which can be reviewed and further refined by the user.

Split, Merge or Remove Cells

Each cell will become either a Snakemake rule or a script that will be imported by other scripts or rules. The user can split, merge or remove cells to better fit the workflow.

Manually add or remove variables from the Read, Write and Wildcards sets

Dependencies can be added by selecting a variable in the code and clicking on the context buttons that appear. Similarly, variables can be removed from the sets by clicking on the X buttons displayed next to them under the cell's code.

Set cells as Rules or Scripts

Under each cell, buttons are provided to set its state as a Rule or a Script, with an additional option, Undecided, for cells on which no decision has been made yet. A cell's state is constrained by the states of the cells it depends on. In particular:
Finalize first step

To continue to the next step, all dependencies must be resolved, and every cell must be set as either a Rule or a Script. Unresolved dependencies and undecided cells are shown at the top of the panel.

Step 2: Review generated code and Snakemake rules

In the second step, Snakemaker uses the information provided in the first step to automatically generate the Snakemake workflow. This process involves code generation.
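To make the information carried over from the first step concrete, the sketch below shows one way the dependency matching and the finalization check could be expressed in Python. It is only an illustration, not Snakemaker's implementation: the `Cell` class and the functions `resolve_dependencies` and `can_finalize` are hypothetical names, and the Wildcards set is left out because it is only matched against filenames when the rules are generated.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Cell:
    """Hypothetical stand-in for a notebook cell after Step 1."""
    reads: Set[str] = field(default_factory=set)
    writes: Set[str] = field(default_factory=set)
    state: str = "undecided"  # "rule", "script", or "undecided"


def resolve_dependencies(
    cells: List[Cell],
) -> Tuple[List[Tuple[int, int, str]], List[Tuple[int, str]]]:
    """Match each read with the closest write in an earlier cell.

    Returns (edges, unresolved):
      edges      -- (writer_index, reader_index, variable) triples
      unresolved -- (reader_index, variable) pairs with no earlier writer
    """
    edges: List[Tuple[int, int, str]] = []
    unresolved: List[Tuple[int, str]] = []
    for i, cell in enumerate(cells):
        for var in sorted(cell.reads):
            # Walk backwards so the closest preceding writer wins.
            for j in range(i - 1, -1, -1):
                if var in cells[j].writes:
                    edges.append((j, i, var))
                    break
            else:
                unresolved.append((i, var))
    return edges, unresolved


def can_finalize(cells: List[Cell]) -> bool:
    """True when every dependency is resolved and no cell is undecided."""
    _, unresolved = resolve_dependencies(cells)
    undecided = [i for i, c in enumerate(cells) if c.state == "undecided"]
    return not unresolved and not undecided


# The var1 example from Step 1: cell 0 produces var1, cell 1 reads and
# modifies it, and cell 2 reads the version modified by cell 1.
cells = [
    Cell(writes={"var1"}),
    Cell(reads={"var1"}, writes={"var1"}),
    Cell(reads={"var1"}),
]
edges, unresolved = resolve_dependencies(cells)
```

With these three cells, the last cell is linked to the middle cell rather than the first one, because the middle cell holds the closest earlier write of var1.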
In this step the user can review the generated code and modify it.

Modify the generated code, auto-propagate changes

The user can modify the generated code in the cells as in a regular editor. Changes are fed to the LLM, which propagates them to the following cells if necessary. For example, if the user modifies the suffix of a cell to write the output file in a different format, the LLM will automatically:
Export the workflow

When satisfied with the generated code, the user can export the workflow. Snakemaker will ask for a directory in which to save the Snakefile and the scripts.

Use the Snakemaker-Notebook assistant to get help and perform large operations

The process described above can be performed manually using the GUI. However, a more efficient way is to use the Snakemaker-Notebook assistant, which can be reached by tagging @snakemaker-notebook in the Copilot Chat. The assistant can answer prompts about the current state of the process, identify issues, suggest fixes or apply them directly, and perform batch operations from natural language, such as changing the output format or the output directory of all the rules, or setting outputs as temp. The user is encouraged to engage with the chat agent to get help with the entire process.
Build and install extension for local usage