Unsupervised Learning with Scikit‑Learn
Unsupervised learning algorithms find hidden patterns in data without using labeled outcomes. This tutorial covers two fundamental techniques: K‑Means clustering and Principal Component Analysis (PCA).
K‑Means Clustering
K‑Means partitions the data into k clusters by assigning each point to the nearest centroid and iteratively updating the centroids to minimize the within‑cluster sum of squared distances (called inertia in scikit‑learn).
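Concretely, with clusters $C_1, \dots, C_k$ and centroids $\mu_1, \dots, \mu_k$, the objective being minimized is

$$\min_{C_1, \dots, C_k} \; \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2,$$

where $\mu_i$ is the mean of the points assigned to cluster $C_i$.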
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Generate 300 synthetic 2-D points drawn from 4 well-separated blobs
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Fit K-Means with k=4; n_init=10 runs several random initializations and keeps the best
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
kmeans.fit(X)
labels = kmeans.labels_

# Color each point by its assigned cluster
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.show()
Run the code locally to see the resulting scatter plot of the four clusters.
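Once fitted, the model can also label points it has never seen. The snippet below is a minimal sketch; the values in new_points are made up purely for illustration:

import numpy as np

# Hypothetical new observations (coordinates chosen only for illustration)
new_points = np.array([[0.0, 2.0], [-6.0, -7.0]])

# predict() returns the index of the nearest learned centroid for each row
print(kmeans.predict(new_points))

# cluster_centers_ holds the learned centroid coordinates, one row per cluster
print(kmeans.cluster_centers_)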
Principal Component Analysis (PCA)
PCA reduces dimensionality by projecting data onto orthogonal components that capture maximum variance.
from sklearn.decomposition import PCA
import numpy as np

# 100 samples with 5 features each
X = np.random.rand(100, 5)

# Project onto the 2 orthogonal directions of maximum variance
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
# Fraction of total variance captured by each retained component
print("Explained variance:", pca.explained_variance_ratio_)