Unsupervised Learning
Unsupervised learning is a branch of machine learning that deals with uncovering hidden patterns in data without the need for labeled outcomes. Unlike supervised learning, where models are trained on input-output pairs, unsupervised algorithms explore the intrinsic structure of the data.
Core Techniques
Clustering
Clustering groups similar data points together. Popular algorithms include K‑means, hierarchical clustering, and DBSCAN.
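from sklearn.cluster import KMeans
import numpy as np
# Six 2-D points forming two obvious groups (x = 1 and x = 10)
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])
# n_init is set explicitly because its default differs across scikit-learn versions
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
# One cluster index per row of X
print(kmeans.labels_)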
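K-means is only one of the algorithms named above. As a rough sketch, the same toy data can be passed to hierarchical (agglomerative) clustering and to DBSCAN. The `eps` and `min_samples` values below are illustrative assumptions chosen to suit this tiny dataset, not recommended defaults.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering

# Same toy data as the K-means example: two groups, at x = 1 and x = 10
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Hierarchical clustering: repeatedly merge the closest groups until 2 remain
agglo_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)

# DBSCAN: density-based, so no cluster count is given up front.
# eps (neighborhood radius) and min_samples are assumed values for this data.
dbscan_labels = DBSCAN(eps=3, min_samples=2).fit_predict(X)

print("Agglomerative:", agglo_labels)
print("DBSCAN:", dbscan_labels)
```

Unlike K-means and agglomerative clustering, DBSCAN can also label points as noise (`-1`) when they fall in no dense region, which none of these six points do.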
Dimensionality Reduction
Techniques such as Principal Component Analysis (PCA) project data onto fewer dimensions while preserving as much of the variance as possible.
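from sklearn.decomposition import PCA
import pandas as pd
# Load the data; PCA needs numeric columns with no missing values
df = pd.read_csv('data.csv')
numeric = df.select_dtypes(include='number').dropna()
# Project onto the two directions of greatest variance
pca = PCA(n_components=2)
reduced = pca.fit_transform(numeric)
# First five rows in the reduced 2-D space
print(reduced[:5])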
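How much variance a projection actually preserves can be checked through the fitted model's `explained_variance_ratio_` attribute. The sketch below uses synthetic data (an assumption for illustration, since the contents of `data.csv` are not known here) in which five features are driven mostly by two underlying directions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data (illustrative assumption): 200 samples, 5 features,
# generated from 2 latent directions plus a little noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X_demo = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

pca = PCA(n_components=2).fit(X_demo)

# Fraction of the total variance captured by each retained component
print(pca.explained_variance_ratio_)
print("Total retained:", pca.explained_variance_ratio_.sum())
```

On real data the retained fraction is usually lower, and a common approach is to increase `n_components` until the total reaches an acceptable level for the task.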
Anomaly Detection
Identifying outliers can be crucial for fraud detection or system monitoring.
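from sklearn.ensemble import IsolationForest
# X is the array defined in the clustering example above
# contamination is the expected fraction of outliers in the data
model = IsolationForest(contamination=0.01, random_state=0)
model.fit(X)
# predict returns -1 for anomalies and 1 for normal points
outliers = model.predict(X) == -1
print('Anomalies:', X[outliers])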
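With only six evenly spread points, the example above has little to flag. As a hedged illustration, the sketch below builds a synthetic dataset (an assumption, not data from this article) with five deliberately planted outliers and uses `decision_function` to compare anomaly scores, where lower scores mean more anomalous.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (illustrative assumption): a dense blob plus 5 planted outliers
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
planted = np.array([[8, 8], [-8, 8], [8, -8], [-8, -8], [0, 10]], dtype=float)
X_demo = np.vstack([inliers, planted])

# contamination is set to the known outlier fraction of this synthetic set
model = IsolationForest(contamination=5 / 205, random_state=0).fit(X_demo)

scores = model.decision_function(X_demo)   # lower = more anomalous
flags = model.predict(X_demo) == -1        # True where a row is flagged

print("Flagged rows:", np.flatnonzero(flags))
print("Mean score, planted:", scores[-5:].mean())
print("Mean score, rest:", scores[:-5].mean())
```

In practice the true outlier fraction is rarely known, so `contamination` is typically a tuning choice rather than a measured quantity.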
When to Use Unsupervised Learning
- When labeled data is scarce or expensive.
- For exploratory data analysis to discover hidden structures.
- To preprocess data for downstream supervised tasks (e.g., feature engineering).
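The last point can be sketched as a pipeline in which an unsupervised step feeds a supervised model. The example below is one possible arrangement, assuming scikit-learn's bundled Iris dataset and a logistic regression classifier, neither of which comes from this article.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Iris is used only as a convenient labeled dataset (illustrative assumption)
X_iris, y_iris = load_iris(return_X_y=True)

# Scale the features, reduce 4 features to 2 with PCA (unsupervised),
# then classify on the reduced features (supervised)
clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(max_iter=1000),
)

# 5-fold cross-validated accuracy of the whole pipeline
scores = cross_val_score(clf, X_iris, y_iris, cv=5)
print("Mean accuracy:", scores.mean())
```

Wrapping the steps in a pipeline means the scaler and PCA are refit on each training fold, so no information from the held-out fold leaks into the preprocessing.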
Further Reading
Explore related topics to deepen your understanding: