Data Science Snippets for VSCode
This VSCode extension provides a collection of useful code snippets for data science tasks. These snippets cover a wide range of common operations including data manipulation, visualization, and machine learning using popular libraries like Pandas, NumPy, Matplotlib, Seaborn, and Scikit-learn.
Import Libraries
Prefix: importlibs
Description: Import common data science libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Load CSV with Pandas
Prefix: loadcsv
Description: Load a CSV file into a Pandas DataFrame
df = pd.read_csv('filename.csv')
print(df.head(), '\n')
Convert Data Types
Prefix: convert_dtypes
Description: Convert data types of DataFrame columns
df['column'] = df['column'].astype('data_type')
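As an illustration of filling in the 'column' and 'data_type' placeholders, the sketch below converts the columns of a small made-up DataFrame. The column names and values are hypothetical and not part of the extension.

```python
import pandas as pd

# Hypothetical data: every column starts out holding strings.
df = pd.DataFrame({
    'age': ['25', '31', '47'],
    'price': ['9.99', '14.50', '3.25'],
    'city': ['Lima', 'Oslo', 'Lima'],
})

# Replace 'column' and 'data_type' with concrete values.
df['age'] = df['age'].astype('int64')
df['price'] = df['price'].astype('float64')
df['city'] = df['city'].astype('category')

print(df.dtypes)
```

Note that astype raises a ValueError when a value cannot be converted, so it is worth checking for stray text in numeric columns first.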
Remove Duplicates
Prefix: remove_duplicates
Description: Remove duplicate rows from the DataFrame
df = df.drop_duplicates()
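The snippet drops rows that are identical in every column. If duplicates should be judged on selected columns only, drop_duplicates also accepts the subset and keep parameters. A minimal sketch on made-up data (the 'id' and 'score' columns are hypothetical):

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are identical; rows 2 and 3 share an 'id'.
df = pd.DataFrame({
    'id': [1, 1, 2, 2],
    'score': [10, 10, 20, 25],
})

# Default behaviour: drop rows that match an earlier row in every column.
exact = df.drop_duplicates()

# Compare only 'id' and keep the last occurrence of each value.
by_id = df.drop_duplicates(subset=['id'], keep='last')

print(exact)
print(by_id)
```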
Basic Plot with Matplotlib
Prefix: plotbasic
Description: Create a basic plot with Matplotlib
plt.figure(figsize=(10, 6))
plt.plot(x, y)
plt.xlabel('X-axis label')
plt.ylabel('Y-axis label')
Subplots with Matplotlib
Prefix: subplots
Description: Create subplots with Matplotlib
fig, axes = plt.subplots(nrows, ncols, figsize=(15, 10))
axes[0, 0].plot(x1, y1)
axes[0, 0].set_title('Title 1')
axes[0, 1].plot(x2, y2)
axes[0, 1].set_title('Title 2')
# Add more subplots as needed
Correlation Heatmap with Seaborn
Prefix: heatmap
Description: Create a correlation heatmap with Seaborn
plt.figure(figsize=(10, 8))
numeric_df = df.select_dtypes(include=[np.number])
sns.heatmap(numeric_df.corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
Histogram with Seaborn
Prefix: histplot
Description: Create a histogram with Seaborn
plt.figure(figsize=(10, 6))
sns.histplot(data=df, x='column', bins=30, kde=True)
Box Plot with Seaborn
Prefix: boxplot
Description: Create a box plot with Seaborn
plt.figure(figsize=(10, 6))
sns.boxplot(data=df, x='category', y='values')
plt.title('Box Plot')
Pair Plot with Seaborn
Prefix: pairplot
Description: Create a pair plot with Seaborn
sns.pairplot(df)
Violin Plot with Seaborn
Prefix: violinplot
Description: Create a violin plot with Seaborn
plt.figure(figsize=(10, 6))
sns.violinplot(x='category', y='values', data=df)
plt.title('Violin Plot')
Formatted Correlation Heatmap with Seaborn
Prefix: formatted_heatmap
Description: Create a formatted correlation heatmap with Seaborn
plt.figure(figsize=(10, 8))
heatmap = sns.heatmap(df.corr(), annot=True, fmt='.2f', cmap='coolwarm')
heatmap.set_title('Correlation Heatmap', fontdict={'fontsize': 18}, pad=16)
One-Hot Encoding with Scikit-learn
Prefix: onehotencoder
Description: Apply one-hot encoding to a categorical column using Scikit-learn
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(sparse_output=False)
encoded_data = encoder.fit_transform(df[['category_column']])
encoded_df = pd.DataFrame(encoded_data, columns=encoder.get_feature_names_out(['category_column']))
df = pd.concat([df, encoded_df], axis=1).drop(columns=['category_column'])
Train-Test Split with Scikit-learn
Prefix: train_test_split
Description: Split data into training and testing sets using Scikit-learn
from sklearn.model_selection import train_test_split
X = df.drop(columns=['target_column'])
y = df['target_column']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
Linear Regression with Scikit-learn
Prefix: linear_regression
Description: Perform linear regression using Scikit-learn
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
plt.scatter(y_test, predictions)
plt.xlabel('True Values')
plt.title('Linear Regression')
Logistic Regression with Scikit-learn
Prefix: logistic_regression
Description: Perform logistic regression using Scikit-learn
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
conf_matrix = confusion_matrix(y_test, predictions)
class_report = classification_report(y_test, predictions)
print('Accuracy:', accuracy)
print('Confusion Matrix:\n', conf_matrix)
print('Classification Report:\n', class_report)
K-Nearest Neighbors with Scikit-learn
Prefix: knn
Description: Perform K-Nearest Neighbors classification using Scikit-learn
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
conf_matrix = confusion_matrix(y_test, predictions)
class_report = classification_report(y_test, predictions)
print('Accuracy:', accuracy)
print('Confusion Matrix:\n', conf_matrix)
print('Classification Report:\n', class_report)
Support Vector Classifier with Scikit-learn
Prefix: svc
Description: Perform Support Vector Classification using Scikit-learn
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
model = SVC(kernel='linear')
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
conf_matrix = confusion_matrix(y_test, predictions)
class_report = classification_report(y_test, predictions)
print('Accuracy:', accuracy)
print('Confusion Matrix:\n', conf_matrix)
print('Classification Report:\n', class_report)
Decision Tree Classifier with Scikit-learn
Prefix: decision_tree
Description: Perform Decision Tree Classification using Scikit-learn
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
conf_matrix = confusion_matrix(y_test, predictions)
class_report = classification_report(y_test, predictions)
print('Accuracy:', accuracy)
print('Confusion Matrix:\n', conf_matrix)
print('Classification Report:\n', class_report)
Random Forest Classifier with Scikit-learn
Prefix: random_forest
Description: Perform Random Forest Classification using Scikit-learn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
conf_matrix = confusion_matrix(y_test, predictions)
class_report = classification_report(y_test, predictions)
print('Accuracy:', accuracy)
print('Confusion Matrix:\n', conf_matrix)
print('Classification Report:\n', class_report)
K-Means Clustering with Scikit-learn
Prefix: kmeans
Description: Perform K-Means Clustering using Scikit-learn
from sklearn.cluster import KMeans
model = KMeans(n_clusters=3, random_state=42)
model.fit(X)
labels = model.predict(X)
df['Cluster'] = labels
Impute Missing Values
Prefix: impute_missing
Description: Impute missing values using SimpleImputer
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
df['column'] = imputer.fit_transform(df[['column']])
Confusion Matrix Plot
Prefix: confusion_matrix_plot
Description: Plot a confusion matrix
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
cm = confusion_matrix(y_test, predictions)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()
ROC Curve Plot
Prefix: roc_curve_plot
Description: Plot an ROC curve
from sklearn.metrics import roc_curve, roc_auc_score
y_scores = model.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, y_scores)
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc_score(y_test, y_scores))
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
Cross-Validation Score
Prefix: cross_val_score
Description: Calculate cross-validation scores
from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5)
print('Cross-validation scores:', scores)
print('Mean score:', scores.mean())
Standard Scaling with Scikit-learn
Prefix: standard_scaling
Description: Apply standard scaling to a column
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df[['column']])
df['column_scaled'] = df_scaled
Min-Max Scaling with Scikit-learn
Prefix: minmax_scaling
Description: Apply min-max scaling to a column
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df_scaled = scaler.fit_transform(df[['column']])
df['column_scaled'] = df_scaled
Normalize Data
Prefix: normalize_data
Description: Normalize data using MinMaxScaler
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df[['column']] = scaler.fit_transform(df[['column']])
Standardize Data
Prefix: standardize_data
Description: Standardize data using StandardScaler
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df[['column']] = scaler.fit_transform(df[['column']])
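For reference, the two scalers used in the scaling snippets above can be reproduced by hand. The sketch below, on a made-up single feature, compares the manual formulas with the scikit-learn results: standard scaling subtracts the mean and divides by the population standard deviation, and min-max scaling (with its default range) maps the minimum to 0 and the maximum to 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical single feature, shaped (n_samples, 1) like df[['column']].
x = np.array([[2.0], [4.0], [6.0], [8.0]])

# Standard scaling: subtract the mean, divide by the population std (ddof=0).
manual_standard = (x - x.mean(axis=0)) / x.std(axis=0)

# Min-max scaling: map the minimum to 0 and the maximum to 1.
manual_minmax = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

# Both comparisons should agree with the scikit-learn scalers.
print(np.allclose(manual_standard, StandardScaler().fit_transform(x)))
print(np.allclose(manual_minmax, MinMaxScaler().fit_transform(x)))
```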
Open a Python file in VSCode.
Type the prefix of the snippet (e.g., importlibs) and select the snippet from the IntelliSense suggestions.
Customize the placeholder values as needed.
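To give a feel for the result, here is roughly what a file might look like after expanding the train_test_split and logistic_regression snippets and customizing their placeholders. This is only a sketch: the synthetic dataset and the column names 'f1' to 'f4' and 'label' are assumptions made for the example, not something the extension generates.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real dataset (hypothetical column names).
features, target = make_classification(n_samples=200, n_features=4, random_state=42)
df = pd.DataFrame(features, columns=['f1', 'f2', 'f3', 'f4'])
df['label'] = target

# train_test_split snippet with 'target_column' replaced by 'label'.
X = df.drop(columns=['label'])
y = df['label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# logistic_regression snippet, fitted on the training split.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print('Accuracy:', accuracy)
```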
Leonardo A. B. Noman