MSDN Community

Your hub for AI & Machine Learning Development

Introduction to Scikit-learn

Welcome to this introductory tutorial on Scikit-learn, one of the most popular and powerful Python libraries for machine learning. Scikit-learn provides simple and efficient tools for data analysis and machine learning, built upon NumPy, SciPy, and Matplotlib.

What is Scikit-learn?

Scikit-learn is an open-source machine learning library for Python. It provides a wide range of classification, regression, clustering, and dimensionality reduction algorithms behind a consistent interface.
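
As a rough illustration of these algorithm families, the sketch below fits one estimator of each kind on a small synthetic dataset. The data and variable names (X, y_class, y_reg) are made up for this example and are not part of the tutorial's dataset; the specific estimators are just one possible pick from each family.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, LogisticRegression

# Tiny synthetic dataset (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y_class = (X[:, 0] > 0).astype(int)  # binary labels for classification
y_reg = 2.0 * X[:, 0] + 1.0          # continuous target for regression

# Supervised estimators: fit(X, y), then predict(X)
class_labels = LogisticRegression().fit(X, y_class).predict(X)
reg_values = LinearRegression().fit(X, y_reg).predict(X)

# Unsupervised estimators: fit on X alone
cluster_ids = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
X_2d = PCA(n_components=2).fit_transform(X)

print(class_labels.shape, reg_values.shape, cluster_ids.shape, X_2d.shape)
# (60,) (60,) (60,) (60, 2)
```

The point of the sketch is that every estimator follows the same fit / predict / transform pattern, so swapping one algorithm for another usually changes only a line or two.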

Why Use Scikit-learn?

Scikit-learn is widely used due to its:

Getting Started

Before you can use Scikit-learn, you need a working Python installation. Scikit-learn depends on NumPy and SciPy; pip installs them automatically if they are missing, and distributions like Anaconda usually include them already. You can install Scikit-learn using pip:

pip install scikit-learn
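
To check that the installation worked, one simple option is to import the package and print its version. Note that the import name is sklearn, not scikit-learn.

```python
import sklearn

# If this import succeeds, the package is installed in the active environment
print(sklearn.__version__)
```

The version matters in practice because datasets and functions are occasionally deprecated or removed between releases.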

Your First Scikit-learn Model

Let's build a very simple linear regression model. We'll use a small, built-in dataset for demonstration.

1. Import Necessary Libraries

First, import the required modules:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this example uses load_diabetes, another bundled regression dataset.
from sklearn.datasets import load_diabetes

2. Load and Prepare Data

We'll load a sample dataset. For this example, we'll use the Diabetes dataset that ships with Scikit-learn. Older tutorials often used the Boston Housing dataset here, but it was deprecated in version 1.0 and removed in version 1.2.

# Load dataset
# In a real scenario, you'd load your own data here
diabetes = load_diabetes()
X = diabetes.data    # feature matrix, one row per sample
y = diabetes.target  # continuous target values

# Split data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
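
To see what the split does, here is a minimal sketch on a toy array. The names X_demo and y_demo and their values are hypothetical and chosen only so the effect is easy to inspect.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, 2 features; row i is [2i, 2i + 1]
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2)

# Each row stays paired with its own target after shuffling
print(np.array_equal(X_tr[:, 0] // 2, y_tr))  # True
```

Fixing random_state makes the shuffle reproducible, so repeated runs produce the same training and test rows.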
            

3. Create and Train the Model

Instantiate the linear regression model and train it on the training data:

# Create a Linear Regression model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)
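
After fitting, a linear model exposes what it learned through the coef_ and intercept_ attributes. The sketch below uses hypothetical noise-free data generated from a known formula, so the recovered values can be compared with the true ones; demo_model and the data are illustrative and separate from the tutorial's model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data generated from y = 3*x1 - 2*x2 + 5
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 5.0

demo_model = LinearRegression().fit(X_demo, y_demo)

print(demo_model.coef_)       # approximately [ 3. -2.]
print(demo_model.intercept_)  # approximately 5.0
```

On real data the coefficients will not match any clean formula, but inspecting them is still a quick sanity check on what the model has learned.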
            

4. Make Predictions and Evaluate

Now, use the trained model to make predictions on the test set and evaluate its performance.

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model (e.g., using R-squared score)
score = model.score(X_test, y_test)
print(f"R-squared score: {score:.2f}")
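
For a regressor, score returns the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The sketch below recomputes it by hand on hypothetical synthetic data and compares the result with score and with sklearn.metrics.r2_score; it also reports the mean squared error as a second metric. All names ending in _demo, _tr, and _te are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical noisy linear data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
demo_model = LinearRegression().fit(X_tr, y_tr)
y_hat = demo_model.predict(X_te)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_te - y_hat) ** 2)
ss_tot = np.sum((y_te - y_te.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(np.isclose(r2_manual, demo_model.score(X_te, y_te)))  # True
print(np.isclose(r2_manual, r2_score(y_te, y_hat)))         # True
print(f"MSE: {mean_squared_error(y_te, y_hat):.3f}")
```

An R^2 of 1.0 means perfect predictions, 0.0 means the model does no better than always predicting the mean of the test targets, and negative values are possible for models that do worse than that.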
            

Next Steps

This is just a glimpse of what Scikit-learn can do. Subsequent tutorials dive deeper, starting with data preprocessing.

Ready to explore more advanced machine learning concepts with Scikit-learn?

Next: Data Preprocessing