Scikit‑Learn Tutorials

What is Supervised Learning?

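Supervised learning algorithms learn a mapping from input features to a known output using a labeled dataset, in which every example comes with the correct answer. When the output is a discrete class (for example, a flower species), the task is classification; when it is a continuous value (for example, a price), the task is regression.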
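A minimal sketch of the two task types, assuming scikit-learn is installed. Both kinds of estimator follow the same `fit`/`predict` pattern; only the kind of label differs. The toy data and the choice of `KNeighborsClassifier` and `LinearRegression` are illustrative assumptions, not part of this tutorial.

```python
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Illustrative toy data (an assumption): one input feature per sample.
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]

# Classification: each label is a discrete class, 0 or 1.
y_class = [0, 0, 0, 1, 1, 1]
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
class_pred = clf.predict([[2.5], [11.5]])
print(class_pred)  # [0 1]

# Regression: each target is a continuous value, here y = 2x + 1.
y_reg = [2 * row[0] + 1 for row in X]
reg = LinearRegression().fit(X, y_reg)
reg_pred = reg.predict([[5.0]])
print(reg_pred)  # approximately 11.0
```

Swapping one estimator for another rarely changes the surrounding code, which is why the rest of this tutorial can focus on data handling rather than model-specific APIs.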

Key Concepts

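  • Training set: Data used to fit the model.
  • Test set: Held-out data, never used during training, that estimates performance on unseen examples.
  • Overfitting: The model captures noise in the training data instead of the underlying pattern, so it scores well on training data but poorly on new data.
  • Cross‑validation: A technique that repeatedly splits the data into training and validation folds to estimate how well a model generalizes.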
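The sketch below, assuming scikit-learn and its bundled Iris data, makes these terms concrete: it holds out a test set, compares the training and test accuracy of an unrestricted decision tree (a large gap between the two is a typical sign of overfitting), and runs 5-fold cross-validation on the training set. The decision tree, `max_depth=3`, and `cv=5` are illustrative choices, not taken from this tutorial.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Training set / test set: hold out 30% of the samples for final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# An unrestricted tree can grow until it fits the training data almost perfectly.
deep_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_acc = deep_tree.score(X_train, y_train)
test_acc = deep_tree.score(X_test, y_test)
# A large train/test gap suggests overfitting.
print(f"train accuracy={train_acc:.3f}, test accuracy={test_acc:.3f}")

# Cross-validation: 5-fold accuracy estimate using the training set only,
# so the test set stays untouched for the final check.
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=0)
cv_scores = cross_val_score(shallow_tree, X_train, y_train, cv=5)
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```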

Example: Iris Classification

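We will train a logistic regression classifier on the classic Iris dataset to predict the species of an iris flower (setosa, versicolor, or virginica) from four measurements: sepal length, sepal width, petal length, and petal width.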
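Before training, it can help to inspect what `load_iris` returns. This short sketch, assuming scikit-learn and NumPy are installed, prints the array shape, the feature and class names, and how many samples belong to each class.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# 150 samples, each described by 4 numeric features.
print(iris.data.shape)      # (150, 4)
print(iris.feature_names)   # sepal/petal length and width, in cm

# The target is an integer class index (0, 1 or 2) that maps to these names.
print(iris.target_names)    # ['setosa' 'versicolor' 'virginica']

# Count samples per class; the dataset is perfectly balanced.
class_counts = np.bincount(iris.target)
print(class_counts)         # [50 50 50]
```

Because the classes are balanced, plain accuracy is a reasonable first metric for this dataset.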

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Split: hold out 30% for testing; stratify=y keeps class proportions the same in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

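# Scale: fit the scaler on the training set only, then apply the same transform to the test set to avoid data leakage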
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train (max_iter is raised from the default of 100 to give the solver more room to converge)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict on the held-out test set and report accuracy
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
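The example above fits the scaler by hand. A common alternative, sketched here under the assumption that a single 5-fold estimate is what you want, wraps `StandardScaler` and `LogisticRegression` in a pipeline with `make_pipeline`, so that scaling is refit inside each cross-validation fold and no validation-fold statistics leak into training. The `cv=5` setting is an illustrative choice, not part of this tutorial.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The pipeline runs StandardScaler, then LogisticRegression; fitting the
# pipeline fits both steps on the same training data.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))

# cross_val_score clones and refits the whole pipeline on each training fold,
# so the scaler never sees the matching validation fold.
scores = cross_val_score(pipe, X, y, cv=5)
print("Fold accuracies:", scores)
print(f"Mean accuracy: {scores.mean():.3f}")
```

Keeping preprocessing inside the pipeline also means a single `pipe.fit(X_train, y_train)` followed by `pipe.predict(X_test)` reproduces the manual scale-then-train steps.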

Try It Yourself