Getting Started with scikit‑learn
scikit‑learn is an open‑source Python library for machine learning. It provides simple and efficient tools for tasks such as classification, regression, and clustering. This tutorial walks you through installing the library, loading a dataset, training a model, and making predictions.
1. Installation
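Make sure you have Python 3.8 or later installed. Recent scikit-learn releases may require a newer Python version, so check the official installation guide if pip reports an incompatibility. Then run: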
pip install scikit-learn numpy pandas matplotlib
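To confirm the installation worked, a quick check along these lines can help. It is only a minimal sketch: it imports the package and prints the installed version, and the exact version string depends on your environment.

```python
# Minimal installation check: import the package and report its version.
import sklearn

print("scikit-learn version:", sklearn.__version__)
```

If the import fails, the package was most likely installed into a different Python environment than the one running the script.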
2. The Classic Iris Example
We’ll train a KNeighborsClassifier to classify iris flowers.
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Split into train/test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
# Create and train model
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
# Predict and evaluate
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
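The example above fixes n_neighbors at 3. As a rough illustration of how that choice affects the result, the sketch below repeats the same split and scores a few values of k. The candidate values (1, 3, 5, 7) and the names model and scores are assumptions made for this illustration, not part of the original example.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Same data and split as in the example above
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

# Assumed candidate values for k, chosen only for illustration
scores = {}
for k in (1, 3, 5, 7):
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    scores[k] = accuracy_score(y_test, model.predict(X_test))
    print(f"k={k}: accuracy={scores[k]:.3f}")
```

Scores from a single train/test split on a dataset this small can vary with the split, so treat the comparison as indicative rather than conclusive.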
3. Run the example online
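Enter four feature values (sepal length, sepal width, petal length, and petal width, all in centimeters) and see the predicted iris species. The classifier from section 2 was trained on all four features, so a prediction needs all four.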
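The same prediction can be reproduced locally. The sketch below retrains the classifier from section 2 and predicts the species for one set of measurements. The classifier was fit on all four iris features, so the input needs four values. The sample values and the names sample, predicted_class, and species are hypothetical and chosen only for illustration.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Retrain the classifier exactly as in section 2
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Hypothetical measurements in cm:
# sepal length, sepal width, petal length, petal width
sample = [[5.1, 3.5, 1.4, 0.2]]

# predict() returns a class index; target_names maps it to a species name
predicted_class = knn.predict(sample)[0]
species = iris.target_names[predicted_class]
print("Predicted species:", species)
```

predict() expects a 2D input with one row per sample, which is why the single measurement is wrapped in an outer list.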