Getting Started with scikit‑learn

scikit-learn is an open-source Python library for machine learning, built on NumPy and SciPy. It provides simple and efficient tools for tasks such as classification, regression, clustering, and preprocessing. This tutorial walks you through installing the library, loading a dataset, training a model, and making predictions.

1. Installation

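Make sure you have Python 3.8 or later installed (recent scikit-learn releases require a newer Python, so check the release notes for the version you install). Then run: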

pip install scikit-learn numpy pandas matplotlib
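To confirm the installation worked, a quick check along these lines can help. It only imports the package and prints the installed version; the version you see depends on your environment.

```python
import sklearn

# Print the installed scikit-learn version to confirm the import works
print("scikit-learn version:", sklearn.__version__)
```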

2. The Classic Iris Example

We’ll train a KNeighborsClassifier to classify iris flowers into one of three species (setosa, versicolor, virginica) from four measurements per flower.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

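# Hold out 30% of the samples for testing; random_state fixes the shuffle
# so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Create a classifier that votes among the 3 nearest training samples,
# then fit it on the training set
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predict species for the held-out samples and report the fraction correct
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))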
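A single accuracy number does not show which species get confused with each other. As a rough sketch, the same split and model can be evaluated with confusion_matrix and classification_report from sklearn.metrics. The block repeats the setup from above so it runs on its own; passing labels explicitly is a choice made here to keep the matrix 3x3 regardless of the split.

```python
from sklearn import datasets
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Same data, split, and model as in the example above
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Rows are true species, columns are predicted species
labels = [0, 1, 2]
cm = confusion_matrix(y_test, y_pred, labels=labels)
print(cm)

# Per-species precision, recall, and F1 score
print(classification_report(
    y_test, y_pred, labels=labels, target_names=iris.target_names))
```

Off-diagonal entries in the matrix count test flowers assigned to the wrong species.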

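3. Run the Example Online

Enter four feature values (sepal length, sepal width, petal length, and petal width, all in centimeters) and see the predicted iris species. The model is trained on all four features, so a prediction needs all four.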
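The same kind of prediction can also be reproduced locally. The sketch below retrains the model from section 2 and classifies one flower. The sample measurements are illustrative values chosen here, not part of the original tutorial, and the model expects all four iris features in the order sepal length, sepal width, petal length, petal width.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Retrain the model from section 2 so this block runs on its own
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Hypothetical input: sepal length, sepal width, petal length, petal width (cm)
sample = [[5.1, 3.5, 1.4, 0.2]]

# predict returns a class index; target_names maps it to a species name
predicted_index = knn.predict(sample)[0]
species = iris.target_names[predicted_index]
print("Predicted species:", species)
```

predict expects a 2D array-like with one row per flower, which is why the single sample is wrapped in an outer list.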

4. Next Steps