Getting Started with scikit‑learn

scikit‑learn is a powerful, open‑source Python library for machine learning. It provides simple and efficient tools for data mining, data analysis, and modeling. This tutorial walks you through installing the library, loading a dataset, training a model, and making predictions.

1. Installation

Make sure you have Python 3.8+ installed. Then run:

pip install scikit-learn numpy pandas matplotlib
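To check that the installation worked, you can import the library and print its version. This is a minimal sanity check. It relies only on the standard `__version__` attribute and does not depend on any specific release.

```python
import sklearn

# Print the installed scikit-learn version to confirm the package imports correctly
print("scikit-learn version:", sklearn.__version__)
```

If the import fails, the package was most likely installed into a different Python environment than the one you are running.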

2. The Classic Iris Example

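We’ll train a k-nearest neighbors classifier (KNeighborsClassifier) to classify iris flowers. The Iris dataset ships with scikit-learn and contains 150 flowers from three species, each described by four measurements: sepal length, sepal width, petal length, and petal width.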

import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split into train/test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Create and train model
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predict and evaluate
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
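To build intuition for what KNeighborsClassifier does, here is a minimal from-scratch sketch. It assumes plain Euclidean distance and an unweighted majority vote, which correspond to the classifier's default settings, although tie-breaking may differ from scikit-learn's. The helper `knn_predict` is hypothetical and is not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


def knn_predict(X_train, y_train, X_query, k=3):
    """Hypothetical helper: label each query row by majority vote of its k nearest training rows."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    diffs = X_query[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Indices of the k closest training samples for each query row
    nearest = np.argsort(dists, axis=1)[:, :k]
    neighbor_labels = y_train[nearest]  # shape (n_query, k)
    # Majority vote; bincount + argmax breaks ties toward the smaller class label
    return np.array([np.bincount(row).argmax() for row in neighbor_labels])


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

y_pred_manual = knn_predict(X_train, y_train, X_test, k=3)

# Accuracy is the fraction of correct predictions, the same quantity accuracy_score reports
accuracy = np.mean(y_pred_manual == y_test)
print("Manual k-NN accuracy:", accuracy)
```

Note that "training" a k-NN model mostly means storing the training data. The real work happens at prediction time, when distances to every stored sample are computed. For larger datasets, scikit-learn can use tree-based neighbor searches instead of this brute-force approach.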

3. Run the example online

Enter four feature values (sepal length, sepal width, petal length, and petal width, in centimeters) and see the predicted iris species.
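You can do the same thing locally. The model was trained on all four iris measurements, so a new flower must be described by four values in the same order. The following is a minimal sketch. It retrains the classifier on the full dataset so the block is self-contained. The sample values are illustrative and happen to match the first row of the dataset.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(iris.data, iris.target)

# Illustrative measurements in cm: sepal length, sepal width, petal length, petal width
sample = [[5.1, 3.5, 1.4, 0.2]]

# predict expects a 2D input (one row per flower) and returns integer class labels
predicted_class = knn.predict(sample)[0]
print("Predicted species:", iris.target_names[predicted_class])
```

`iris.target_names` maps the integer labels returned by `predict` back to species names.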
