NLDSL - Domain Specific Languages integrated in Python
This extension provides support for Domain Specific Languages (DSLs) which can be embedded as comments directly into Python code, and are translated to Python during editing.
The currently included DSLs cover essential functions of the Pandas data analysis library and of the Apache Spark framework for large-scale data processing. TensorFlow and PyTorch support is included, but is still in an alpha state.
Detailed information can be found at the project webpage.
It is also possible to create a new DSL from an Excel/tx-template. For further information, visit the documentation. DSLs created this way can be shared with other users, who can add them to their own installation with the import feature.
Features
- Expansion of DSL expressions into Python during editing (via code completion mechanism)
- Code completion on the DSL level (any Python-level completions still work)
- Creation of ad-hoc DSL-level functions with arguments; these are immediately supported by DSL-code completion
- Generated code has no dependency on the extension (DSL expressions stay in the file as comments and are ignored by Python)
- Largely identical DSLs for Pandas and Apache Spark to facilitate switching between these frameworks
- Alpha support for deep-learning DSLs (TensorFlow and PyTorch)
- Warning when a DSL line and the corresponding code line are out of sync
- Column name recommendation for CSV files
- Completing type parameters (e.g. dataframe name, relative path) in DSL lines
- Syntax coloring for DSL lines
- Setting preferences for the order of suggested DSLs, or parts of them
- The possibility to create and share a whole new DSL by using an Excel/tx-template
- More DSLs are planned, e.g. for plotting and for the R language
Basic usage
- Type "##" at the beginning of a line and press the code completion key (e.g. Ctrl+Space on Windows). You will get a selection of DSL operators with documentation.
- Chain multiple DSL commands by separating them via "|" (pipe symbol).
- When your chain is ready, press the code completion key again to generate Python code. The first suggested item is the generated code.
- You can integrate your own DSL-level functions (i.e. a sequence of DSL elements) by starting a line with "#$".
- See project webpage for more information.
Examples
- Selecting columns 'Country/Region' and 'Confirmed' of the dataframe 'data', and printing the first 10 rows
- Grouping rows of the dataframe 'data' by the column 'Country/Region', taking the first 10 rows, and then sorting by 'Confirmed'
- Appending to dataframe 'data' a new column 'Active' which is derived from 'Confirmed', 'Deaths' and 'Recovered'
- Setting the source of information for column name recommendation.
The source definition has the following fields:
- name: the name of the dataframe; it must match the name of the dataframe in the code
- type: the type of the data file. Currently, only CSV files are supported
- path: the path to the data file
Once the source information is set, NLDSL can suggest column names of the defined dataframe, e.g. when selecting the column 'Province/State' of the dataframe 'data'
- Modifying either the DSL line or the code line triggers a warning that the two are out of sync. This feature is well suited to DSLs that generate exactly one code line per DSL line (e.g. the Pandas and Spark DSLs). Deep-learning DSLs (e.g. the TensorFlow and PyTorch DSLs) mostly generate multiple code lines per DSL line, so their generated code is currently always marked as out of sync.
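For orientation, the first three examples above can be sketched in plain Pandas. This is only an illustration of the operations described, not the code the extension generates. The sample values, the sum aggregation used for the grouping, and the formula for 'Active' are assumptions made for this sketch.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the examples above.
data = pd.DataFrame({
    "Country/Region": ["A", "A", "B", "C"],
    "Confirmed": [10, 5, 7, 3],
    "Deaths": [1, 0, 2, 0],
    "Recovered": [4, 2, 3, 1],
})

# Example 1: select two columns and print the first 10 rows.
selected = data[["Country/Region", "Confirmed"]].head(10)
print(selected)

# Example 2: group by 'Country/Region', take the first 10 rows, sort by 'Confirmed'.
# Assumption: the groups are aggregated with a sum.
grouped = (
    data.groupby("Country/Region")["Confirmed"]
    .sum()
    .reset_index()
    .head(10)
    .sort_values("Confirmed")
)
print(grouped)

# Example 3: append a column 'Active' derived from the other three columns.
# Assumption: Active = Confirmed - Deaths - Recovered.
with_active = data.assign(
    Active=data["Confirmed"] - data["Deaths"] - data["Recovered"]
)
print(with_active)
```

Each step here is a single chained expression, which mirrors the idea of chaining DSL commands with the pipe symbol.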
Advanced usage
Creating a custom DSL
It is possible to create a new custom DSL from an Excel/tx-template. For in-depth information, visit the documentation.
You can find the functionality for the DSL creation and management in the sidebar when you click on the NLDSL logo.
Here you can find the following functionalities:
- Generating an Excel template and a description of tx templates for creating DSLs
- Creating a DSL from an Excel/tx-template
- Importing a DSL to the NLDSL extension from an existing DSL source code
- Removing a DSL from the list of DSLs
To generate an Excel template or to create a DSL, click the Add DSL from Wizard button and follow the instructions of the wizard.
When you use the create-template function, the output folder will contain an Excel template and a document that describes how to work with .tx template files.
Importing a custom DSL
You can also import an existing DSL (e.g. created by another user) into the NLDSL extension by clicking the + button in the top right corner of the NLDSL sidebar. Make sure to follow the instructions of the wizard.
For further information, visit the documentation.
The extension supports Windows, Linux, and macOS. It installs a private Python interpreter for the appropriate platform.
Publications
The concept of the core engine has been described in the paper:
Artur Andrzejak, Kevin Kiefer, Diego Elias Costa, and Oliver Wenz: Agile Construction of Data Science DSLs (Tool Demo), 18th International Conference on Generative Programming: Concepts & Experiences (GPCE 2019)/ACM SPLASH 2019. Paper, Video