Data Science Snippets for VSCode

This VSCode extension provides a collection of useful code snippets for data science tasks. These snippets cover a wide range of common operations including data manipulation, visualization, and machine learning using popular libraries like Pandas, NumPy, Matplotlib, Seaborn, and Scikit-learn.

Features

Import Libraries

Prefix: importlibs
Description: Import common data science libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Load CSV with Pandas

Prefix: loadcsv
Description: Load a CSV file into a Pandas DataFrame

df = pd.read_csv('filename.csv')
df.drop(columns=['Index'], inplace=True)  # drop the 'Index' column; remove this line if the file has none
print(df.head(), '\n')
print(df.isnull().sum())

Convert Data Types

Prefix: convert_dtypes
Description: Convert data types of DataFrame columns

df['column'] = df['column'].astype('data_type')
print(df.dtypes)
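
As a minimal sketch of how the placeholders might be filled in, the example below converts two columns of a small made-up DataFrame. The column names `price` and `city` and the data itself are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data: numbers stored as strings, plus a repeated label column
df = pd.DataFrame({'price': ['10', '20', '30'], 'city': ['Rio', 'Lima', 'Rio']})

df['price'] = df['price'].astype('int64')    # string -> 64-bit integer
df['city'] = df['city'].astype('category')   # string -> categorical
print(df.dtypes)
```

Converting repeated string labels to `category` usually reduces memory use, while `astype('int64')` raises an error if any value cannot be parsed as an integer.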

Remove Duplicates

Prefix: remove_duplicates
Description: Remove duplicate rows from the DataFrame

df = df.drop_duplicates()
print(df.head())
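
By default `drop_duplicates()` removes only rows that match in every column. The sketch below, on made-up data with hypothetical column names `id` and `value`, contrasts that default with deduplicating on a single key column through the `subset` and `keep` parameters.

```python
import pandas as pd

# Hypothetical data: one exact duplicate row and one repeated id with different values
df = pd.DataFrame({'id': [1, 1, 2, 2], 'value': ['a', 'a', 'b', 'c']})

exact = df.drop_duplicates()                            # drops only fully identical rows
by_id = df.drop_duplicates(subset=['id'], keep='last')  # one row per id, keeping the last seen
print(len(exact), len(by_id))
```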

Basic Plot with Matplotlib

Prefix: plotbasic
Description: Create a basic plot with Matplotlib

plt.figure(figsize=(10, 6))
plt.plot(x, y)
plt.title('Title')
plt.xlabel('X-axis label')
plt.ylabel('Y-axis label')
plt.show()

Subplots with Matplotlib

Prefix: subplots
Description: Create subplots with Matplotlib

fig, axes = plt.subplots(nrows, ncols, figsize=(15, 10))
axes[0, 0].plot(x1, y1)
axes[0, 0].set_title('Title 1')
axes[0, 1].plot(x2, y2)
axes[0, 1].set_title('Title 2')
# Add more subplots as needed
plt.tight_layout()
plt.show()

Correlation Heatmap with Seaborn

Prefix: heatmap
Description: Create a correlation heatmap with Seaborn

plt.figure(figsize=(10, 8))
numeric_df = df.select_dtypes(include=[np.number])
sns.heatmap(numeric_df.corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()

Histogram with Seaborn

Prefix: histplot
Description: Create a histogram with Seaborn

plt.figure(figsize=(10, 6))
sns.histplot(data=df, x='column', bins=30, kde=True)
plt.title('Histogram')
plt.xlabel('Column')
plt.ylabel('Frequency')
plt.show()

Box Plot with Seaborn

Prefix: boxplot
Description: Create a box plot with Seaborn

plt.figure(figsize=(10, 6))
sns.boxplot(data=df, x='category', y='values')
plt.title('Box Plot')
plt.xlabel('Category')
plt.ylabel('Values')
plt.show()

Pair Plot with Seaborn

Prefix: pairplot
Description: Create a pair plot with Seaborn

sns.pairplot(df)
plt.show()

Violin Plot with Seaborn

Prefix: violinplot
Description: Create a violin plot with Seaborn

plt.figure(figsize=(10, 6))
sns.violinplot(x='category', y='values', data=df)
plt.title('Violin Plot')
plt.show()

Heatmap with Formatting

Prefix: formatted_heatmap
Description: Create a formatted correlation heatmap with Seaborn

plt.figure(figsize=(10, 8))
heatmap = sns.heatmap(df.select_dtypes(include=[np.number]).corr(), annot=True, fmt='.2f', cmap='coolwarm')
heatmap.set_title('Correlation Heatmap', fontdict={'fontsize': 18}, pad=16)
plt.show()

One-Hot Encoding with Scikit-learn

Prefix: onehotencoder
Description: Apply one-hot encoding to a categorical column using Scikit-learn

from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(sparse_output=False)
encoded_data = encoder.fit_transform(df[['category_column']])
encoded_df = pd.DataFrame(encoded_data, columns=encoder.get_feature_names_out(['category_column']), index=df.index)  # reuse df's index so concat aligns rows
df = pd.concat([df, encoded_df], axis=1).drop(columns=['category_column'])
print(df.head())

Train-Test Split with Scikit-learn

Prefix: train_test_split
Description: Split data into training and testing sets using Scikit-learn

from sklearn.model_selection import train_test_split

X = df.drop(columns=['target_column'])
y = df['target_column']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
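
For classification targets it can help to keep class proportions equal across the two sets. The sketch below uses made-up arrays and the `stratify` parameter of `train_test_split`; it is an optional variation, not part of the snippet itself.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with 2 features and a balanced binary target
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y  # preserve the 50/50 class ratio
)
print(X_train.shape, X_test.shape)
```

With `test_size=0.2` on 10 rows, 8 rows go to training and 2 to testing, and stratification places one sample of each class in the test set.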

Linear Regression with Scikit-learn

Prefix: linear_regression
Description: Perform linear regression using Scikit-learn

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
plt.scatter(y_test, predictions)
plt.xlabel('True Values')
plt.ylabel('Predictions')
plt.title('Linear Regression')
plt.show()

Logistic Regression with Scikit-learn

Prefix: logistic_regression
Description: Perform logistic regression using Scikit-learn

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
conf_matrix = confusion_matrix(y_test, predictions)
class_report = classification_report(y_test, predictions)

print('Accuracy:', accuracy)
print('Confusion Matrix:\n', conf_matrix)
print('Classification Report:\n', class_report)

K-Nearest Neighbors with Scikit-learn

Prefix: knn
Description: Perform K-Nearest Neighbors classification using Scikit-learn

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
conf_matrix = confusion_matrix(y_test, predictions)
class_report = classification_report(y_test, predictions)

print('Accuracy:', accuracy)
print('Confusion Matrix:\n', conf_matrix)
print('Classification Report:\n', class_report)

Support Vector Classifier with Scikit-learn

Prefix: svc
Description: Perform Support Vector Classification using Scikit-learn

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

model = SVC(kernel='linear')
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
conf_matrix = confusion_matrix(y_test, predictions)
class_report = classification_report(y_test, predictions)

print('Accuracy:', accuracy)
print('Confusion Matrix:\n', conf_matrix)
print('Classification Report:\n', class_report)

Decision Tree Classifier with Scikit-learn

Prefix: decision_tree
Description: Perform Decision Tree Classification using Scikit-learn

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
conf_matrix = confusion_matrix(y_test, predictions)
class_report = classification_report(y_test, predictions)

print('Accuracy:', accuracy)
print('Confusion Matrix:\n', conf_matrix)
print('Classification Report:\n', class_report)

Random Forest Classifier with Scikit-learn

Prefix: random_forest
Description: Perform Random Forest Classification using Scikit-learn

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
conf_matrix = confusion_matrix(y_test, predictions)
class_report = classification_report(y_test, predictions)

print('Accuracy:', accuracy)
print('Confusion Matrix:\n', conf_matrix)
print('Classification Report:\n', class_report)

K-Means Clustering with Scikit-learn

Prefix: kmeans
Description: Perform K-Means Clustering using Scikit-learn

from sklearn.cluster import KMeans

model = KMeans(n_clusters=3, random_state=42)
model.fit(X)
labels = model.predict(X)
df['Cluster'] = labels
print(df.head())
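
A small sketch of what the cluster labels look like, using six made-up points that form two obvious groups. The data and the explicit `n_init=10` are assumptions for illustration; `fit_predict` combines the `fit` and `predict` calls from the snippet.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: three points near the origin and three near (10, 10)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [10.0, 10.0], [10.1, 10.2], [10.2, 10.1]])

model = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = model.fit_predict(X)  # fit the model and return one cluster id per row
print(labels)
```

The numeric cluster ids are arbitrary: which group is called 0 and which is called 1 can change between runs or seeds, so only the grouping itself is meaningful.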

Impute Missing Values

Prefix: impute_missing
Description: Impute missing values using SimpleImputer

from sklearn.impute import SimpleImputer

imputer = SimpleImputer(strategy='mean')
df['column'] = imputer.fit_transform(df[['column']])
print(df.head())
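
A worked sketch of mean imputation on a made-up column (the name `age` is hypothetical). The mean of the observed values 20 and 40 is 30, so the missing entry is filled with 30.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data with one missing value
df = pd.DataFrame({'age': [20.0, np.nan, 40.0]})

imputer = SimpleImputer(strategy='mean')
# fit_transform returns a 2-D array; ravel() flattens it back to one column
df['age'] = imputer.fit_transform(df[['age']]).ravel()
print(df['age'].tolist())  # [20.0, 30.0, 40.0]
```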

Confusion Matrix Plot

Prefix: confusion_matrix_plot
Description: Plot a confusion matrix

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

cm = confusion_matrix(y_test, predictions)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()
plt.show()

ROC Curve Plot

Prefix: roc_curve_plot
Description: Plot an ROC curve

from sklearn.metrics import roc_curve, roc_auc_score

y_scores = model.predict_proba(X_test)[:, 1]  # predicted probability of the positive class
fpr, tpr, _ = roc_curve(y_test, y_scores)
plt.figure(figsize=(10, 6))
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc_score(y_test, y_scores))
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
plt.show()
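
One detail worth illustrating: the area under the ROC curve is normally computed from continuous scores (such as predicted probabilities) rather than from hard class labels, and the two can give different numbers. The sketch below uses made-up labels and scores, with an assumed decision threshold of 0.5.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical ground truth and predicted probabilities for the positive class
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.45, 0.8]
y_pred = [int(s >= 0.5) for s in y_scores]  # hard labels at a 0.5 threshold

auc_from_scores = roc_auc_score(y_true, y_scores)  # uses the full ranking of scores
auc_from_labels = roc_auc_score(y_true, y_pred)    # loses ranking information
print(auc_from_scores, auc_from_labels)
```

Here every positive sample is ranked above every negative one, so the score-based AUC is 1.0, while thresholding first misclassifies one positive and lowers the label-based value to 0.75.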

Cross-Validation Score

Prefix: cross_val_score
Description: Calculate cross-validation scores

from sklearn.model_selection import cross_val_score

scores = cross_val_score(model, X, y, cv=5)
print('Cross-validation scores:', scores)
print('Mean score:', scores.mean())
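
A self-contained sketch of the same call, assuming the Iris dataset bundled with scikit-learn and a logistic regression model as stand-ins for `X`, `y`, and `model`.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Assumed stand-ins for the snippet's placeholders
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# cv=5 trains and evaluates the model on five different train/validation splits
scores = cross_val_score(model, X, y, cv=5)
print('Cross-validation scores:', scores)
print('Mean score:', scores.mean())
```

For classifiers the default score is accuracy, so each entry in `scores` is the fraction of correctly classified samples in one held-out fold.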

Standard Scaling with Scikit-learn

Prefix: standard_scaling
Description: Apply standard scaling to a column

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
df_scaled = scaler.fit_transform(df[['column']])
df['column_scaled'] = df_scaled
print(df.head())
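
Standard scaling subtracts the column mean and divides by the column standard deviation. The sketch below checks this against a manual computation on made-up values; note that `StandardScaler` uses the population standard deviation (`ddof=0`), which is also NumPy's default.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-column data
values = np.array([[1.0], [2.0], [3.0], [4.0]])

scaled = StandardScaler().fit_transform(values)
manual = (values - values.mean()) / values.std()  # z = (x - mean) / std, with ddof=0

print(np.allclose(scaled, manual))
```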

Min-Max Scaling with Scikit-learn

Prefix: minmax_scaling
Description: Apply min-max scaling to a column

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
df_scaled = scaler.fit_transform(df[['column']])
df['column_scaled'] = df_scaled
print(df.head())
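
Min-max scaling maps the smallest value to 0 and the largest to 1 using (x - min) / (max - min). A worked sketch on made-up values:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical single-column data
values = np.array([[10.0], [20.0], [30.0]])

scaled = MinMaxScaler().fit_transform(values)
print(scaled.ravel().tolist())  # [0.0, 0.5, 1.0]
```

Because the result depends on the observed minimum and maximum, a single extreme outlier can compress all other values into a narrow range.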

Normalize Data

Prefix: normalize_data
Description: Normalize data using MinMaxScaler

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
df[['column']] = scaler.fit_transform(df[['column']])
print(df.head())

Standardize Data

Prefix: standardize_data
Description: Standardize data using StandardScaler

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
df[['column']] = scaler.fit_transform(df[['column']])
print(df.head())

Usage

Open a Python file in VSCode. Type the prefix of the snippet (e.g., importlibs) and select the snippet from the IntelliSense suggestions. Customize the placeholder values as needed.

Contributors

Leonardo A. B. Noman

DSnippets

Leonardo Antunes Barreto Noman