
Scikit-learn Snippets

Important notice: This is a pre-release extension where some of the advertised features are not yet complete, and those features that do exist are subject to change.

This extension is for data scientists writing Python code to create, fit, and evaluate machine learning models using the scikit-learn package. It provides code snippets that work in Python (.py) files and Jupyter (.ipynb) notebooks to ease the friction of working with scikit-learn. Snippets boost productivity by reducing the amount of Python syntax and function arguments you need to remember, cutting keystrokes, and freeing you to focus on building useful models from your data.

1. Usage

1.1 Overview

All snippets provided by this extension have triggers prefixed with sk; typing the prefix activates the IntelliSense pop-up or filters the list in the Command Palette. Trigger prefixes are organized as follows:

Prefix Description
sk-setup Starting point for importing commonly used modules and setting defaults.
sk-read Read input training data or existing models from file.
sk-prep Preprocess input training data for model fitting.
sk-regress Create and fit regression models.
sk-classify Create and fit classification models.
sk-cluster Create and fit clustering models.
sk-density Create and fit density estimation models.
sk-embed Create and fit dimensionality reduction (embedding) models.
sk-anomaly Create and fit anomaly detection models.
sk-validation Model validation.
sk-inspect Model inspection and explainability.
sk-io Save and restore models on disk.
sk-args Select and adjust model parameters from lists of valid options.

See Section 4.2 for a visual overview of the complete snippet hierarchy.

1.2 Snippets for machine learning workflows

A typical workflow is to:

  • import commonly used modules with sk-setup,

  • read training data from file with sk-read,

  • preprocess training data with sk-prep,

  • create and train models across a range of machine learning tasks such as:

    • regression (sk-regress),

    • classification (sk-classify),

    • clustering (sk-cluster),

    • density estimation (sk-density),

    • dimensionality reduction (sk-embed),

    • anomaly detection (sk-anomaly),

  • evaluate fitted models by cross-validation (sk-validation) and inspection (sk-inspect),

  • save and restore models on disk (sk-io),

  • optionally adjust the parameters that control model fitting with sk-args.

See the Features section below for a full list of the available snippets and their prefix triggers.

1.3 Inserting snippets

Inserting machine learning code snippets into your Python code is easy. Use either of these methods:

Command Palette

  1. Click inside a Python notebook cell or editor and choose Insert Snippet from the Command Palette.

  2. A list of snippets appears. You can type to filter the list; start typing sk to filter the list for snippets provided by this extension. You can further filter the list by continuing to type the desired prefix.

  3. Choose a snippet from the list and it is inserted into your Python code.

  4. Placeholders indicate the minimum arguments required to train a model. Use the tab key to step through the placeholders.

IntelliSense

  1. Start typing sk in a Python notebook cell or editor.

  2. The IntelliSense pop up will appear. You can further filter the pop up list by continuing to type the desired prefix.

  3. Choose a snippet from the pop up and it is inserted into your Python code.

  4. Placeholders indicate the minimum arguments required to train a model. Use the tab key to step through the placeholders.

You can also trigger IntelliSense by pressing Ctrl+Space.

2. Features

2.1 Setup

The following snippets are triggered by sk-setup and provide the starting point for creating models. Usually inserted near the beginning of a code file or notebook, sk-setup is the key snippet for importing modules that are commonly used in the machine learning workflow, and setting defaults that apply to data visualizations.

Snippet Placeholders Description
sk-setup pio.renderers.default,
pio.templates.default
Provides the initial starting point for creating models. Imports commonly used modules (pandas, numpy, json, pickle, plotly.express, and plotly.io), and sets the default figure renderer and template.
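
For illustration, a minimal sketch of the kind of code such a setup snippet produces is shown below. The exact output of sk-setup may differ, and the renderer and template values here are placeholders:

    # Import modules commonly used in a scikit-learn workflow
    import json
    import pickle

    import numpy as np
    import pandas as pd
    import plotly.express as px
    import plotly.io as pio

    # Set plotly defaults (placeholder values; choose your own)
    pio.renderers.default = "notebook"
    pio.templates.default = "plotly_white"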

2.2 Read training data

The following snippets are triggered by sk-read and provide features for creating pandas data frames from Comma-Separated Values (.csv), Microsoft Excel (.xlsx), Feather (.feather), and Parquet (.parquet) format files. Tabular data stored in pandas data frames is a common source of the training data required for fitting scikit-learn models.

Snippet Placeholders Description
sk-read-csv df,
file
Read tabular training data from CSV (.csv) file (file) into pandas data frame (df) and report info.
sk-read-excel df,
file
Read tabular training data from Excel (.xlsx) file (file) into pandas data frame (df) and report info.
sk-read-feather df,
file
Read tabular training data from Feather (.feather) file (file) into pandas data frame (df) and report info.
sk-read-parquet df,
file
Read tabular training data from Parquet (.parquet) file (file) into pandas data frame (df) and report info.
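
As an illustration, sk-read-csv expands to code along these lines; the file path is a placeholder and the exact snippet output may differ:

    import pandas as pd

    # Read tabular training data from a CSV file into a data frame
    file = "training_data.csv"  # placeholder path
    df = pd.read_csv(file)

    # Report column names, dtypes, and non-null counts
    df.info()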

2.3 Preprocessing for supervised learning

The following snippets are triggered by sk-prep-target and provide features for preparing tabular training data for supervised learning. The data preparation process involves extracting one or more named features (X) and a named target variable (y) from an input data frame (df), and creating an input training dataset. The data arrays are used to train models for a range of supervised machine learning tasks.

Optionally, the process can also include extracting one or more secondary variables (Z) that are used to better understand the results of model fitting. It is important to note that these secondary variables do not play any role in the model fitting itself, but they can be useful for interpreting the results.

Snippet Placeholders Description
sk-prep-target-features X1, X2, X3,
Y
df,
X, y
Prepare training data (X, y) for supervised learning. Training data is identified by a sequence of feature names (X1, X2, X3, ...), and a target name (Y), sourced from a data frame (df).
sk-prep-target-features-secondary X1, X2, X3,
Y
Z1, Z2, Z3,
df,
X, y, Z
Prepare training data (X, y) for supervised learning, and prepare secondary data (Z) for model evaluation. Training data is identified by a sequence of feature names (X1, X2, X3, ...), and a target name (Y), sourced from a data frame (df). Secondary data is also identified by a sequence of feature names (Z1, Z2, Z3, ...) sourced from the same data frame (df).

Note: Secondary data is used only for interpreting model output and plays no role in model training.
sk-prep-train_test_split X, y,
⚙ train_size,
⚙ random_state
Randomly split input data (X, y) into training and test sets using the train_test_split function with the supplied parameters (⚙). Holding out a test set from the training process helps to evaluate the performance of supervised learning models.
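
A minimal sketch of this preparation step, assuming a data frame df as read in Section 2.2 and placeholder feature and target names:

    from sklearn.model_selection import train_test_split

    # Extract named features (X) and target (y) from the data frame
    features = ["X1", "X2", "X3"]  # placeholder feature names
    target = "Y"                   # placeholder target name
    X = df[features]
    y = df[target]

    # Hold out a test set for model evaluation (sk-prep-train_test_split)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.75, random_state=42
    )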

2.4 Preprocessing for unsupervised learning

The following snippets are triggered by sk-prep-features and provide features for preparing tabular training data for unsupervised learning. The data preparation process involves extracting one or more named features (X) from an input data frame (df), and creating an input training dataset. The data array is used to train models for a range of unsupervised machine learning tasks. In contrast to supervised learning, there is no target variable (y) in unsupervised learning.

Similar to the supervised learning case, the process can optionally include extracting one or more secondary variables (Z) that are used to better understand the results of model fitting. It is important to note that these secondary variables do not play any role in the model fitting itself, but they can be useful for interpreting the results.

Snippet Placeholders Description
sk-prep-features X1, X2, X3
df,
X
Prepare training data features (X) for unsupervised learning. Training data is identified by a sequence of feature names (X1, X2, X3, ...) sourced from a data frame (df).
sk-prep-features-secondary X1, X2, X3,
Z1, Z2, Z3,
df,
X, Z
Prepare training data features (X) for unsupervised learning, and prepare secondary data (Z) for model evaluation. Training data is identified by a sequence of feature names (X1, X2, X3, ...) sourced from a data frame (df). Secondary data is also identified by a sequence of feature names (Z1, Z2, Z3, ...) sourced from the same data frame (df).

Note: Secondary data is used only for interpreting model output and plays no role in model training.
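
A corresponding sketch for the unsupervised case, again assuming a data frame df and placeholder column names:

    # Extract named features (X); there is no target in unsupervised learning
    features = ["X1", "X2", "X3"]   # placeholder feature names
    secondary = ["Z1", "Z2", "Z3"]  # placeholder secondary names
    X = df[features]

    # Secondary data (Z) is used only for interpretation, never for fitting
    Z = df[secondary]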

2.5 Regression

2.5.1 Linear regression

The following snippets are triggered by sk-regress-linear and provide features for various types of linear regression, ranging from simple ordinary least squares (LinearRegression), through regression with a transformed target (TransformedTargetRegressor) and regression with transformed features (FunctionTransformer, PolynomialFeatures, SplineTransformer), to regularized models such as ridge regression (Ridge), lasso regression (Lasso), and elastic net regression (ElasticNet).

Snippet Placeholders Description
sk-regress-linear estimator_linear,
⚙ fit_intercept,
⚙ positive,
X, y
Linear regression: Create and fit a LinearRegression regression model (estimator_linear) with the supplied parameters (⚙) and training data (X, y).
sk-regress-linear-transform-target estimator_transform_target,
⚙ func,
⚙ func_inverse,
⚙ fit_intercept,
⚙ positive,
⚙ check_inverse,
X, y
Linear regression with transformed target: Create and fit a TransformedTargetRegressor regression model (estimator_transform_target) with the supplied parameters (⚙) and training data (X, y).
sk-regress-linear-transform estimator_transform,
⚙ func,
⚙ fit_intercept,
⚙ positive,
X, y
Linear regression with transformed features: Create and fit a LinearRegression with FunctionTransformer regression model (estimator_transform) with the supplied parameters (⚙) and training data (X, y).
sk-regress-linear-polynomial estimator_polynomial,
⚙ degree,
⚙ fit_intercept,
⚙ positive,
X, y
Polynomial regression: Create and fit a LinearRegression with PolynomialFeatures regression model (estimator_polynomial) with the supplied parameters (⚙) and training data (X, y).
sk-regress-linear-spline estimator_spline,
⚙ n_knots,
⚙ degree,
⚙ knots,
⚙ extrapolation,
⚙ fit_intercept,
⚙ positive,
X, y
Spline regression: Create and fit a LinearRegression with SplineTransformer regression model (estimator_spline) with the supplied parameters (⚙) and training data (X, y).
sk-regress-linear-pcr estimator_pcr,
⚙ n_components,
⚙ whiten,
⚙ fit_intercept,
⚙ positive,
X, y
Principal component regression (PCR): Create and fit a LinearRegression with PCA regression model (estimator_pcr) with the supplied parameters (⚙) and training data (X, y).
sk-regress-linear-pls estimator_pls,
⚙ n_components,
⚙ scale,
⚙ max_iter,
X, y
Partial least squares regression (PLS): Create and fit a PLSRegression regression model (estimator_pls) with the supplied parameters (⚙) and training data (X, y).
sk-regress-linear-ridge estimator_ridge,
⚙ alpha,
⚙ fit_intercept,
⚙ positive,
X, y
Ridge regression: Create and fit a Ridge regression model (estimator_ridge) with the supplied parameters (⚙) and training data (X, y).
sk-regress-linear-ridgecv estimator_ridgecv,
⚙ alphas,
⚙ cv,
⚙ fit_intercept,
X, y
Ridge regression with cross-validation: Create and fit a RidgeCV regression model (estimator_ridgecv) with the supplied parameters (⚙) and training data (X, y).
sk-regress-linear-lasso estimator_lasso,
⚙ alpha,
⚙ fit_intercept,
⚙ positive,
⚙ selection,
⚙ random_state,
X, y
Lasso regression: Create and fit a Lasso regression model (estimator_lasso) with the supplied parameters (⚙) and training data (X, y).
sk-regress-linear-lassocv estimator_lassocv,
⚙ alphas,
⚙ cv,
⚙ fit_intercept,
⚙ positive,
⚙ selection,
⚙ random_state,
X, y
Lasso regression with cross-validation: Create and fit a LassoCV regression model (estimator_lassocv) with the supplied parameters (⚙) and training data (X, y).
sk-regress-linear-elasticnet estimator_elasticnet,
⚙ alpha,
⚙ l1_ratio,
⚙ fit_intercept,
⚙ positive,
⚙ selection,
⚙ random_state,
X, y
ElasticNet regression: Create and fit an ElasticNet regression model (estimator_elasticnet) with the supplied parameters (⚙) and training data (X, y).
sk-regress-linear-elasticnetcv estimator_elasticnetcv,
⚙ l1_ratio,
⚙ alphas,
⚙ fit_intercept,
⚙ positive,
⚙ selection,
⚙ random_state,
X, y
ElasticNet regression with cross-validation: Create and fit an ElasticNetCV regression model (estimator_elasticnetcv) with the supplied parameters (⚙) and training data (X, y).
sk-regress-dummy estimator_dummy,
⚙ strategy,
⚙ constant,
⚙ quantile,
X, y
Dummy regression: Create and fit a DummyRegressor regression model (estimator_dummy) with the supplied parameters (⚙) and training data (X, y).
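
For illustration, the plain and polynomial snippets above expand to code roughly like the following sketch; parameter values are placeholders, and X_train, y_train are assumed from Section 2.3:

    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    # sk-regress-linear: ordinary least squares (approximate expansion)
    estimator_linear = LinearRegression(fit_intercept=True, positive=False)
    estimator_linear.fit(X_train, y_train)

    # sk-regress-linear-polynomial: polynomial features + OLS in a pipeline
    estimator_polynomial = make_pipeline(
        PolynomialFeatures(degree=2),
        LinearRegression(fit_intercept=True),
    )
    estimator_polynomial.fit(X_train, y_train)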

2.5.2 Ensemble regression

The following snippets are triggered by sk-regress-ensemble and provide features for various types of ensemble regression. Ensemble regression models combine multiple base regression models into a single model with better performance than any individual base model: rather than relying on one model's prediction, ensemble methods aggregate the predictions of several models to produce a final result that is typically more accurate.

Snippet Placeholders Description
sk-regress-ensemble-random-forest estimator_random_forest,
⚙ n_estimators,
⚙ criterion,
⚙ min_samples_leaf,
⚙ random_state,
X, y
Random forest regression: Create and fit a RandomForestRegressor regression model (estimator_random_forest) with the supplied parameters (⚙) and training data (X, y).
sk-regress-ensemble-extra-trees estimator_extra_trees,
⚙ n_estimators,
⚙ criterion,
⚙ min_samples_leaf,
⚙ random_state,
X, y
Extremely randomized trees (extra-trees) regression: Create and fit an ExtraTreesRegressor regression model (estimator_extra_trees) with the supplied parameters (⚙) and training data (X, y).
sk-regress-ensemble-gradient-boosting estimator_gradient_boosting,
⚙ n_estimators,
⚙ loss,
⚙ learning_rate,
⚙ min_samples_leaf,
⚙ random_state,
X, y
Gradient boosting regression: Create and fit a GradientBoostingRegressor regression model (estimator_gradient_boosting) with the supplied parameters (⚙) and training data (X, y).
sk-regress-ensemble-hist-gradient-boosting estimator_hist_gradient_boosting,
⚙ loss,
⚙ learning_rate,
⚙ min_samples_leaf,
⚙ random_state,
X, y
Histogram-based gradient boosting regression: Create and fit a HistGradientBoostingRegressor regression model (estimator_hist_gradient_boosting) with the supplied parameters (⚙) and training data (X, y).
sk-regress-ensemble-stacking estimator_stacking,
⚙ estimators,
⚙ final_estimator,
⚙ cv,
⚙ passthrough,
X, y
Stack of estimators with a final regressor: Create and fit a StackingRegressor regression model (estimator_stacking) with the supplied parameters (⚙) and training data (X, y).

Note: By default, all regression estimators in the current scope are collected for stacking.
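
A sketch of two representative expansions, with placeholder parameter values and X_train, y_train assumed:

    from sklearn.ensemble import (
        HistGradientBoostingRegressor,
        RandomForestRegressor,
    )

    # sk-regress-ensemble-random-forest (approximate expansion)
    estimator_random_forest = RandomForestRegressor(
        n_estimators=100,
        criterion="squared_error",
        min_samples_leaf=1,
        random_state=42,
    )
    estimator_random_forest.fit(X_train, y_train)

    # sk-regress-ensemble-hist-gradient-boosting (approximate expansion)
    estimator_hist_gradient_boosting = HistGradientBoostingRegressor(
        loss="squared_error",
        learning_rate=0.1,
        random_state=42,
    )
    estimator_hist_gradient_boosting.fit(X_train, y_train)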

2.6 Classification

TO BE COMPLETED

The following snippets are triggered by sk-classify and provide features for various types of classification models. Classification is a supervised learning task where the goal is to predict discrete class labels or categories for input data. These models learn from labeled training data and can be used for binary classification (two classes) or multi-class classification problems.

Snippet Placeholders Description
sk-classify-lda n_components,
whiten
Linear Discriminant Analysis (LDA): Create and fit a LinearDiscriminantAnalysis classification model (classifier_lda) with the supplied parameters (⚙) and training data (X, y).
sk-classify-qda n_components,
kernel,
gamma,
degree,
coef0
Quadratic Discriminant Analysis (QDA): Create and fit a QuadraticDiscriminantAnalysis classification model (classifier_qda) with the supplied parameters (⚙) and training data (X, y).
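
As a sketch of the intended expansions (this section is pre-release, so details may change; X_train, y_train are assumed):

    from sklearn.discriminant_analysis import (
        LinearDiscriminantAnalysis,
        QuadraticDiscriminantAnalysis,
    )

    # sk-classify-lda (approximate expansion)
    classifier_lda = LinearDiscriminantAnalysis()
    classifier_lda.fit(X_train, y_train)

    # sk-classify-qda (approximate expansion)
    classifier_qda = QuadraticDiscriminantAnalysis()
    classifier_qda.fit(X_train, y_train)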

2.7 Clustering

The following snippets are triggered by sk-cluster and provide features for various types of clustering models. Clustering is an unsupervised learning task that groups similar data points together based on their features. These models identify natural groupings within a dataset without requiring labeled examples, making them useful for discovering patterns, segmenting data, and identifying structure in unlabeled datasets.

Snippet Placeholders Description
sk-cluster-kmeans estimator_kmeans,
⚙ n_clusters,
⚙ init,
⚙ n_init,
⚙ max_iter,
⚙ tol,
⚙ random_state,
⚙ algorithm,
X
K-means clustering: Create and fit a KMeans clustering model (estimator_kmeans) with the supplied parameters (⚙) and training data (X). Includes StandardScaler in pipeline for proper feature scaling.
sk-cluster-kmeans-minibatch estimator_kmeans_minibatch,
⚙ n_clusters,
⚙ init,
⚙ batch_size,
⚙ max_iter,
⚙ max_no_improvement,
⚙ tol,
⚙ random_state,
⚙ reassignment_ratio,
X
MiniBatch K-means clustering: Create and fit a MiniBatchKMeans clustering model (estimator_kmeans_minibatch) with the supplied parameters (⚙) and training data (X). Optimized for large datasets with StandardScaler in pipeline.
sk-cluster-meanshift estimator_meanshift,
⚙ bandwidth,
⚙ seeds,
⚙ bin_seeding,
⚙ min_bin_freq,
⚙ cluster_all,
⚙ max_iter,
X
Mean shift clustering: Create and fit a MeanShift clustering model (estimator_meanshift) with the supplied parameters (⚙) and training data (X). Includes StandardScaler in pipeline for proper feature scaling.
sk-cluster-dbscan estimator_dbscan,
⚙ eps,
⚙ min_samples,
⚙ metric,
⚙ algorithm,
⚙ leaf_size,
X
DBSCAN clustering: Create and fit a DBSCAN clustering model (estimator_dbscan) with the supplied parameters (⚙) and training data (X). Includes StandardScaler in pipeline which is critical for distance-based algorithms.
sk-cluster-hdbscan estimator_hdbscan,
⚙ min_cluster_size,
⚙ min_samples,
⚙ cluster_selection_epsilon,
⚙ cluster_selection_method,
⚙ alpha,
⚙ metric,
⚙ algorithm,
⚙ leaf_size,
X
HDBSCAN clustering: Create and fit an HDBSCAN clustering model (estimator_hdbscan) with the supplied parameters (⚙) and training data (X).
sk-cluster-predict estimator,
X
Cluster prediction: Apply a clustering model (estimator) to an input dataset (X) to predict cluster labels using the model's predict() function. Cluster labels are output to a new dataset (X_estimator_cluster). Only available for models that support prediction on new data.
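
For illustration, sk-cluster-kmeans and sk-cluster-predict expand roughly like this sketch, with placeholder parameter values and X assumed from Section 2.4:

    from sklearn.cluster import KMeans
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # K-means with StandardScaler in the pipeline for feature scaling
    estimator_kmeans = make_pipeline(
        StandardScaler(),
        KMeans(n_clusters=3, n_init=10, random_state=42),
    )
    estimator_kmeans.fit(X)

    # sk-cluster-predict: cluster labels for an input dataset
    X_estimator_cluster = estimator_kmeans.predict(X)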

2.8 Density estimation

The following snippets are triggered by sk-density and provide features for various types of density estimation models. Density estimation is an unsupervised learning task that creates a model of the probability distribution from which the observed data is drawn. These models can be used to generate new samples, detect outliers, and estimate the likelihood of data points.

Snippet Placeholders Description
sk-density-kernel estimator_kernel_density,
⚙ bandwidth,
⚙ kernel,
⚙ metric,
X
Kernel Density Estimation: Create and fit a KernelDensity density estimation model (estimator_kernel_density) with the supplied parameters (⚙) and training data (X).
sk-density-gaussian-mixture estimator_gaussian_mixture,
⚙ n_components,
⚙ covariance_type,
⚙ init_params,
⚙ random_state,
⚙ max_iter,
X
Gaussian Mixture Model: Create and fit a GaussianMixture density estimation model (estimator_gaussian_mixture) with the supplied parameters (⚙) and training data (X).
sk-density-sample-kernel estimator,
⚙ n_samples,
⚙ random_state
Sample from Kernel Density model: Generate random samples from a fitted kernel density model (estimator) using the sample() function with the supplied parameters (⚙). Samples are output to a new dataset (estimator_samples).
sk-density-sample-gaussian-mixture estimator,
⚙ n_samples
Sample from Gaussian Mixture model: Generate random samples from a fitted Gaussian Mixture density model (estimator) using the sample() function with the supplied parameters (⚙). Samples are output to new datasets (estimator_samples and estimator_components).
sk-density-score-samples estimator,
X
Density score of each sample: Apply a density estimation model (estimator) to an input dataset (X) to evaluate the log-likelihood of each sample using the model's score_samples() function. The log-likelihood is output to a new dataset (X_estimator_density), and is normalized to be a probability density, so the value will be low for high-dimensional data.
sk-density-score estimator,
X
Density score: Apply a density estimation model (estimator) to an input dataset (X) to evaluate the total log-likelihood of the data in X using the model's score() function. This is normalized to be a probability density, so the value will be low for high-dimensional data.
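
A sketch covering kernel density estimation, scoring, and sampling, with placeholder parameter values and X assumed:

    from sklearn.neighbors import KernelDensity

    # sk-density-kernel (approximate expansion)
    estimator_kernel_density = KernelDensity(bandwidth=1.0, kernel="gaussian")
    estimator_kernel_density.fit(X)

    # sk-density-score-samples: per-sample log-likelihood
    X_estimator_density = estimator_kernel_density.score_samples(X)

    # sk-density-score: total log-likelihood of the dataset
    total_log_likelihood = estimator_kernel_density.score(X)

    # sk-density-sample-kernel: draw new samples from the fitted model
    estimator_samples = estimator_kernel_density.sample(
        n_samples=100, random_state=42
    )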

2.9 Dimensionality reduction

The following snippets are triggered by sk-embed and provide features for various types of dimensionality reduction or embedding. Dimensionality reduction algorithms transform data from a high-dimensional space into a lower-dimensional representation while preserving the most important structure or information. These techniques are useful for visualization, computational efficiency, and removing redundant features.

Snippet Placeholders Description
sk-embed-pca estimator_pca,
⚙ n_components,
⚙ whiten,
X
Principal Component Analysis: Create and fit a PCA dimensionality reduction model (estimator_pca) with the supplied parameters (⚙) and training data (X).
sk-embed-kpca estimator_kpca,
⚙ n_components,
⚙ kernel,
⚙ gamma,
⚙ degree,
⚙ coef0,
X
Kernel PCA: Create and fit a KernelPCA dimensionality reduction model (estimator_kpca) with the supplied parameters (⚙) and training data (X).
sk-embed-lle estimator_lle,
⚙ n_components,
⚙ n_neighbors,
⚙ method,
X
Locally Linear Embedding: Create and fit a LocallyLinearEmbedding dimensionality reduction model (estimator_lle) with the supplied parameters (⚙) and training data (X).
sk-embed-isomap estimator_isomap,
⚙ n_components,
⚙ n_neighbors,
⚙ radius,
⚙ p,
X
Isometric Mapping: Create and fit an Isomap dimensionality reduction model (estimator_isomap) with the supplied parameters (⚙) and training data (X).
sk-embed-mds estimator_mds,
⚙ n_components,
⚙ metric,
⚙ n_init,
⚙ random_state,
⚙ normalized_stress,
X
Multidimensional Scaling: Create and fit an MDS dimensionality reduction model (estimator_mds) with the supplied parameters (⚙) and training data (X).
sk-embed-spectral estimator_spectral,
⚙ n_components,
⚙ affinity,
⚙ gamma,
⚙ random_state,
⚙ n_neighbors,
X
Spectral Embedding: Create and fit a SpectralEmbedding dimensionality reduction model (estimator_spectral) with the supplied parameters (⚙) and training data (X).
sk-embed-tsne estimator_tsne,
⚙ n_components,
⚙ perplexity,
⚙ random_state,
⚙ n_iter,
X
t-Distributed Stochastic Neighbor Embedding: Create and fit a TSNE dimensionality reduction model (estimator_tsne) with the supplied parameters (⚙) and training data (X). Embedding is output to X_estimator_tsne.
sk-embed-transform estimator,
X
Embedding transform: Apply a dimensionality reduction model (estimator) to an input dataset (X) using the model's transform() function. Embedding is output to a new dataset (X_estimator).
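
For illustration, sk-embed-pca and sk-embed-transform expand along these lines, with placeholder parameter values and X assumed:

    from sklearn.decomposition import PCA

    # sk-embed-pca (approximate expansion)
    estimator_pca = PCA(n_components=2, whiten=False)
    estimator_pca.fit(X)

    # sk-embed-transform: project data into the lower-dimensional space
    X_estimator = estimator_pca.transform(X)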

2.10 Anomaly detection

The following snippets are triggered by sk-anomaly and provide features for various types of anomaly detection models. Anomaly detection is the task of identifying rare items, events, or observations that differ significantly from the majority of the data. These models learn patterns in the data and can detect unusual instances that do not conform to expected behavior.

Snippet Placeholders Description
sk-anomaly-one-class-svm estimator_one_class_svm,
⚙ kernel,
⚙ gamma,
⚙ nu,
⚙ shrinking,
X
One-Class Support Vector Machine: Create and fit a OneClassSVM anomaly detection model (estimator_one_class_svm) with the supplied parameters (⚙) and training data (X).
sk-anomaly-one-class-svm-sgd estimator_one_class_svm_sgd,
⚙ kernel,
⚙ gamma,
⚙ n_components,
⚙ random_state,
⚙ nu,
⚙ fit_intercept,
⚙ max_iter,
⚙ tol,
⚙ shuffle,
⚙ learning_rate,
⚙ eta0,
X
One-Class SVM with Stochastic Gradient Descent: Create and fit an SGDOneClassSVM with Nystroem kernel anomaly detection model (estimator_one_class_svm_sgd) with the supplied parameters (⚙) and training data (X).
sk-anomaly-local-outlier-factor estimator_lof,
⚙ n_neighbors,
⚙ algorithm,
⚙ leaf_size,
⚙ metric,
⚙ contamination,
⚙ novelty,
X
Local Outlier Factor: Create and fit a LocalOutlierFactor anomaly detection model (estimator_lof) with the supplied parameters (⚙) and training data (X).
sk-anomaly-isolation-forest estimator_isolation_forest,
⚙ n_estimators,
⚙ max_samples,
⚙ contamination,
⚙ max_features,
⚙ bootstrap,
⚙ n_jobs,
⚙ random_state,
X
Isolation Forest: Create and fit an IsolationForest anomaly detection model (estimator_isolation_forest) with the supplied parameters (⚙) and training data (X).
sk-anomaly-elliptic-envelope estimator_elliptic_envelope,
⚙ store_precision,
⚙ assume_centered,
⚙ support_fraction,
⚙ contamination,
⚙ random_state,
X
Elliptic Envelope (Robust Covariance): Create and fit an EllipticEnvelope anomaly detection model (estimator_elliptic_envelope) with the supplied parameters (⚙) and training data (X).
sk-anomaly-dbscan estimator_dbscan,
⚙ eps,
⚙ min_samples,
⚙ metric,
⚙ algorithm,
⚙ leaf_size,
X
DBSCAN: Create and fit a DBSCAN anomaly detection model (estimator_dbscan) with the supplied parameters (⚙) and training data (X).
sk-anomaly-predict estimator,
X
Anomaly prediction: Apply an anomaly detection model (estimator) to an input dataset (X) to perform outlier classification using the model's predict() function. Outlier class is output to a new dataset (X_estimator_class) where -1 indicates outliers, and +1 indicates inliers.
sk-anomaly-score estimator,
X
Anomaly score: Apply an anomaly detection model (estimator) to an input dataset (X) to evaluate the outlier score of each sample using the model's decision_function() function. Outlier score is output to a new dataset (X_estimator_score) where negative scores indicate outliers, and positive scores indicate inliers.
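
A sketch of a typical anomaly detection flow using these snippets, with placeholder parameter values and X assumed:

    from sklearn.ensemble import IsolationForest

    # sk-anomaly-isolation-forest (approximate expansion)
    estimator_isolation_forest = IsolationForest(
        n_estimators=100, contamination="auto", random_state=42
    )
    estimator_isolation_forest.fit(X)

    # sk-anomaly-predict: -1 marks outliers, +1 marks inliers
    X_estimator_class = estimator_isolation_forest.predict(X)

    # sk-anomaly-score: negative scores indicate outliers
    X_estimator_score = estimator_isolation_forest.decision_function(X)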

2.11 Model inspection

The following snippets are triggered by sk-inspect and provide features for inspecting and understanding fitted models. Model inspection tools help data scientists gain insights into how their models make predictions, which features are most important, and how changes in input features affect model outputs. These techniques are essential for explaining model behavior, debugging models, and ensuring that models are behaving as expected.

Snippet Placeholders Description
sk-inspect-partial_dependence estimator,
X,
⚙ features,
⚙ percentiles,
⚙ grid_resolution,
⚙ kind
Partial dependence: Compute the partial dependence of a model (estimator) against an input feature dataset (X) using the partial_dependence function with the supplied parameters (⚙). Output partial dependence curves are returned as a dictionary (estimator_partial).
sk-inspect-permutation_importance estimator,
X, y,
⚙ scoring,
⚙ n_repeats,
⚙ random_state,
⚙ sample_weight,
⚙ max_samples
Permutation importance: Compute the permutation importance of a model (estimator) against an input dataset (X,y) using the permutation_importance function with the supplied parameters (⚙). Output feature importance metrics are returned as a dictionary (estimator_permutation).
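
For illustration, sk-inspect-permutation_importance expands roughly as follows; parameter values are placeholders, and a fitted estimator with held-out X_test, y_test is assumed:

    from sklearn.inspection import permutation_importance

    # Permutation importance of a fitted model on held-out data
    estimator_permutation = permutation_importance(
        estimator_linear, X_test, y_test,
        scoring="r2", n_repeats=10, random_state=42,
    )

    # Mean decrease in score when each feature is permuted
    print(estimator_permutation["importances_mean"])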

2.12 Model storage

The following snippets are triggered by sk-io and provide features for reading and writing scikit-learn models. The pickle module is used to read and write scikit-learn models because it provides the required binary file I/O capabilities.

Note: Reading pickle binary files can execute arbitrary code, so only read models from trusted sources.

Snippet Placeholders Description
sk-io-read-pickle file,
estimator
Read (or deserialize) an existing model (estimator) from a binary (.pickle) format file (file). Report the model type and any fitted parameters.
sk-io-write-pickle file,
estimator
Write (or serialize) a model (estimator) to a binary (.pickle) format file (file).
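
A sketch of both storage snippets; the file path is a placeholder, and estimator_linear is assumed to be a fitted model:

    import pickle

    file = "estimator_linear.pickle"  # placeholder path

    # sk-io-write-pickle: serialize the model to disk
    with open(file, "wb") as f:
        pickle.dump(estimator_linear, f)

    # sk-io-read-pickle: restore the model; only unpickle trusted files
    with open(file, "rb") as f:
        estimator = pickle.load(f)
    print(type(estimator), estimator.get_params())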

2.13 Argument snippets

Many scikit-learn function arguments take their values from extensive option lists. The following snippets are triggered by sk-args and provide features that let you select valid argument values from lists of available options.

Snippet Placeholders Description
sk-args-random_state random_state Set the random_state argument for reproducibility in randomized algorithms. This argument is the integer seed number used to initialize random number generation; a randomly chosen value is provided by default.
sk-args-alphas logspace or linspace,
start, stop, num
Set the alphas argument (a logarithmic or linear sequence of regularization parameters) for cross-validation in RidgeCV, LassoCV, and ElasticNetCV regression.
sk-args-func func Set the func argument to a FunctionTransformer from a list of common transformations. For use in regression models with transformed features (X), see sk-regress-linear-transform.
sk-args-func-inverse func, inverse_func Set the func and inverse_func arguments to a FunctionTransformer from a list of common forward and inverse transformation function pairs. For use in regression models with transformed target (y), see sk-regress-linear-transform-target.
sk-args-spline-extrapolation extrapolation Set the SplineTransformer extrapolation behavior beyond the minimum and maximum of the training data, see sk-regress-linear-spline.
sk-args-kernel kernel Set the kernel type for kernel density estimation from an option list. You can provide this argument when creating a KernelDensity model, see sk-density-kernel.
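
For illustration, the kind of alphas sequence that sk-args-alphas selects might be used as follows; the range is a placeholder, and X_train, y_train are assumed:

    import numpy as np
    from sklearn.linear_model import RidgeCV

    # Logarithmic sequence of regularization strengths: 1e-3 ... 1e3
    alphas = np.logspace(-3, 3, num=13)

    # Cross-validated ridge regression over the alpha grid
    estimator_ridgecv = RidgeCV(alphas=alphas, cv=5)
    estimator_ridgecv.fit(X_train, y_train)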

3. Release Notes

3.1 Scikit-learn version

The snippets provided by this extension were developed using scikit-learn version 1.7, but should also produce working Python code for most earlier and later versions.

3.2 Editor Support for Snippets

Snippets for producing Python code, including those provided by this extension, are supported in the Python file (.py) editor and in the notebook (.ipynb) editor.

3.3 Snippets and IntelliSense

When triggered, the default behaviour of IntelliSense is to show snippets along with other context-dependent suggestions. This may result in a long list of suggestions in the IntelliSense pop-up, particularly if the snippet trigger provided by this extension (sk) also matches other symbols in your editor.

It's easy to modify this behaviour using your Visual Studio Code settings. To access the relevant settings go to Preferences > Settings and type snippets in the Search settings field as shown below:

Snippets and IntelliSense settings

You can control whether snippets are shown with other suggestions and how they are sorted using the Editor: Snippet Suggestions dropdown. Choose one of the options to control how snippet suggestions are shown in the IntelliSense popup:

Option IntelliSense
top Show snippet suggestions on top of other suggestions.
bottom Show snippet suggestions below other suggestions.
inline Show snippet suggestions with other suggestions (default).
none Do not show snippet suggestions.

You can also use the Editor > Suggest: Show Snippets checkbox to enable or disable snippets in IntelliSense suggestions. When snippets are disabled in IntelliSense they are still accessible through the Command Palette Insert Snippet command.

4. Reference

4.1 Scikit-learn documentation

For more information about scikit-learn including extensive documentation and examples visit https://scikit-learn.org/stable/index.html

4.2 Scikit-learn snippet tree

Snippet prefix triggers are organized in a hierarchical tree structure rooted at sk as shown in the figure below. The snippet hierarchy is designed to ease the user's cognitive load when developing models with this large and complex machine learning package. The branches at the top of the tree outline the main steps in a machine learning workflow, branches at lower levels outline a taxonomy of algorithms for specific tasks, whereas leaf nodes represent particular algorithms. The process of inserting a snippet amounts to navigating the tree and selecting the desired leaf node by either of the methods described in Section 1.3.

sk
├── sk-setup
├── sk-read
│   ├── sk-read-csv
│   ├── sk-read-excel
│   ├── sk-read-feather
│   └── sk-read-parquet
├── sk-prep
│   ├── sk-prep-target-features
│   ├── sk-prep-target-features-secondary
│   ├── sk-prep-train_test_split
│   ├── sk-prep-features
│   └── sk-prep-features-secondary
├── sk-regress
│   ├── sk-regress-linear
│   │   ├── sk-regress-linear
│   │   ├── sk-regress-linear-transform-target
│   │   ├── sk-regress-linear-transform
│   │   ├── sk-regress-linear-polynomial
│   │   ├── sk-regress-linear-spline
│   │   ├── sk-regress-linear-pcr
│   │   ├── sk-regress-linear-pls
│   │   ├── sk-regress-linear-ridge
│   │   ├── sk-regress-linear-ridgecv
│   │   ├── sk-regress-linear-lasso
│   │   ├── sk-regress-linear-lassocv
│   │   ├── sk-regress-linear-elasticnet
│   │   └── sk-regress-linear-elasticnetcv
│   ├── sk-regress-ensemble
│   │   ├── sk-regress-ensemble-random-forest
│   │   ├── sk-regress-ensemble-extra-trees
│   │   ├── sk-regress-ensemble-gradient-boosting
│   │   ├── sk-regress-ensemble-hist-gradient-boosting
│   │   └── sk-regress-ensemble-stacking
│   └── sk-regress-dummy
├── sk-cluster
│   ├── sk-cluster-kmeans
│   ├── sk-cluster-kmeans-minibatch
│   ├── sk-cluster-meanshift
│   ├── sk-cluster-dbscan
│   ├── sk-cluster-hdbscan
│   └── sk-cluster-predict
├── sk-density
│   ├── sk-density-kernel
│   ├── sk-density-gaussian-mixture
│   ├── sk-density-sample-kernel
│   ├── sk-density-sample-gaussian-mixture
│   ├── sk-density-score-samples
│   └── sk-density-score
├── sk-embed
│   ├── sk-embed-pca
│   ├── sk-embed-kpca
│   ├── sk-embed-lle
│   ├── sk-embed-isomap
│   ├── sk-embed-mds
│   ├── sk-embed-spectral
│   ├── sk-embed-tsne
│   └── sk-embed-transform
├── sk-anomaly
│   ├── sk-anomaly-one-class-svm
│   ├── sk-anomaly-one-class-svm-sgd
│   ├── sk-anomaly-local-outlier-factor
│   ├── sk-anomaly-isolation-forest
│   ├── sk-anomaly-elliptic-envelope
│   ├── sk-anomaly-dbscan
│   ├── sk-anomaly-predict
│   └── sk-anomaly-score
├── sk-inspect
│   ├── sk-inspect-partial_dependence
│   └── sk-inspect-permutation_importance
├── sk-io
│   ├── sk-io-read-pickle
│   └── sk-io-write-pickle
└── sk-args
    ├── sk-args-random_state
    ├── sk-args-alphas
    ├── sk-args-func
    ├── sk-args-func-inverse
    ├── sk-args-spline-extrapolation
    └── sk-args-kernel

Copyright © 2024-2025 Analytic Signal Limited, all rights reserved
