Python for Data Science & Machine Learning

Your Essential Setup Guide

Welcome to Your Python Data Science & ML Journey

This guide will walk you through setting up your development environment for Python-based data science and machine learning tasks. We'll cover essential tools and libraries to get you started quickly and efficiently.

Mastering these tools is crucial for data exploration, visualization, model building, and deployment.

Prerequisites

Before we begin, ensure you have the following basics:

Step 1: Install Python

We recommend installing Python from the official installer or your platform's package manager, as described below for each operating system.

Windows

  1. Visit the official Python website: python.org/downloads/

  2. Download the latest stable Python 3 installer (e.g., Python 3.11 or 3.12).

  3. Run the installer. Crucially, check the box labeled "Add Python X.Y to PATH" during installation (newer installers label it "Add python.exe to PATH"). This makes Python accessible from your command prompt.

  4. Open Command Prompt and verify the installation by typing:

    python --version

    You should see the installed Python version.

macOS

  1. macOS may include an older version of Python, and recent releases provide python3 only through Apple's Command Line Tools. Installing a current version yourself is highly recommended.

  2. You can download the latest macOS installer from python.org/downloads/ and run it.

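  3. Alternatively, use a package manager like Homebrew. The first command installs Homebrew itself (skip it if Homebrew is already installed); the second installs Python:

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    brew install python3

  4. Open Terminal and verify by typing: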

    python3 --version

    You should see the installed Python version. On macOS, the command is typically python3 rather than python.

Linux

  1. Most Linux distributions come with Python 3 pre-installed. Check your version:

    python3 --version

  2. If you need to install or upgrade, use your distribution's package manager:

    sudo apt update && sudo apt install python3 python3-pip python3-venv  # For Debian/Ubuntu
    sudo dnf install python3 python3-pip                                  # For Fedora (venv ships with python3)
    sudo yum install python3 python3-pip                                  # For CentOS/RHEL (older)

  3. Verify the installation by running python3 --version again.
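
If you prefer to check the version from inside Python itself, here is a minimal sketch. The MIN_VERSION threshold and the function name python_is_recent are assumptions chosen for illustration, not requirements stated by this guide; adjust them to whatever your libraries need.

```python
import sys

# Assumed minimum version for illustration only; adjust to your needs.
MIN_VERSION = (3, 9)


def python_is_recent(min_version=MIN_VERSION, current=None):
    """Return True if `current` (default: the running interpreter) is at least `min_version`."""
    if current is None:
        current = sys.version_info[:2]
    return tuple(current) >= tuple(min_version)


if __name__ == "__main__":
    # Show the full version string of the interpreter running this script.
    print("Python", sys.version.split()[0])
    print("Recent enough:", python_is_recent())
```

Save it as a file (for example check_python.py, a hypothetical name) and run it with the same python or python3 command you verified above.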

Step 2: Ensure Pip is Installed

pip is the package installer for Python. It's usually included with Python installations from version 3.4 onwards.

Open your terminal or command prompt and run:

pip --version

If pip is not found, try pip3 (common on Linux and macOS) or python -m pip --version. If that also fails, install pip through your package manager or check that your Python installation completed correctly.

To upgrade pip (use python3 in place of python on macOS/Linux if needed):

python -m pip install --upgrade pip
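
A common source of confusion is a pip command that belongs to a different Python than the one you are running. The sketch below asks the current interpreter for its own pip by running it as a module; the function name pip_version_line is a hypothetical helper, not part of pip.

```python
import subprocess
import sys


def pip_version_line():
    """Return the output of `<this python> -m pip --version`, or None if pip is unavailable."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    line = pip_version_line()
    print(line if line else "pip is not available for this interpreter")
```

The printed line includes the path pip was loaded from, which tells you which Python installation it installs packages into.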

Step 3: Set Up Virtual Environments

Virtual environments are essential for isolating project dependencies. This prevents conflicts between different projects that might require different versions of the same library.

Using venv (built-in)

  1. Navigate to your project directory in the terminal.

  2. Create a virtual environment (use python3 instead of python on macOS/Linux if python is not found):

    python -m venv .venv

    This creates a folder named .venv (you can choose another name) in your current directory.

  3. Activate the virtual environment:

    • Windows (Command Prompt/PowerShell):

        .venv\Scripts\activate

    • macOS/Linux (Bash/Zsh):

        source .venv/bin/activate

    Your terminal prompt will change to indicate the active environment (e.g., (.venv) your_prompt>).

  4. To deactivate, simply type:

    deactivate
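
To confirm from within Python that an environment created by venv is active, you can compare two interpreter paths. This is a minimal sketch; in_virtual_env is a hypothetical helper name, and the check assumes the environment was created with venv (or a recent virtualenv).

```python
import sys


def in_virtual_env():
    """Return True when the interpreter is running inside a venv-style environment.

    Inside a virtual environment, sys.prefix points at the environment folder,
    while sys.base_prefix still points at the Python installation it was created from.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtual_env())
    print("Environment location:", sys.prefix)
```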

It is good practice to record your project's dependencies in a requirements.txt file:

pip freeze > requirements.txt

To install the dependencies listed in it (for example, in a fresh environment):

pip install -r requirements.txt
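
To get a feel for what a requirements file contains, the sketch below builds similar name==version lines using the standard library's importlib.metadata. It is an approximation for illustration, not a replacement for pip freeze (which also handles editable installs and other special cases), and pinned_requirements is a hypothetical helper name.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' strings for the packages visible to this interpreter."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries whose metadata has no usable name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    for line in pinned_requirements():
        print(line)
```

Run inside an activated virtual environment, the output lists only that environment's packages, which is why freezing from a clean environment keeps requirements.txt short.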

Step 4: Install Core Data Science & ML Libraries

With your environment set up, you can now install the fundamental libraries. Ensure your virtual environment is activated before running these commands.

Use pip to install them:

pip install numpy pandas matplotlib scikit-learn jupyterlab

To pin a specific version, use the name==version form (for example, numpy==1.26.4). To install additional libraries, add them to the command or to your requirements.txt file.
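
Once the install finishes, you may want to confirm that each library is present in the active environment. The sketch below looks up installed versions by their pip package names; CORE_PACKAGES and installed_versions are hypothetical names introduced here for illustration.

```python
from importlib import metadata

# The pip package names from the install command above.
CORE_PACKAGES = ["numpy", "pandas", "matplotlib", "scikit-learn", "jupyterlab"]


def installed_versions(packages=CORE_PACKAGES):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:<14}{version or 'NOT INSTALLED'}")
```

Any package reported as missing usually means the install ran in a different environment than the one currently active.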

Step 5: Launch JupyterLab

JupyterLab is your primary tool for interactive data exploration and model development.

In your activated virtual environment, run:

jupyter lab

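This will open JupyterLab in your web browser, providing an interface to create notebooks, manage files, and run code interactively. If the browser does not open automatically, copy the URL printed in the terminal into your browser.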
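
If JupyterLab starts but cannot import the libraries you installed, the jupyter command may be coming from a different Python installation. The following sketch checks whether the launcher found on your PATH lives inside the current environment; both function names are hypothetical helpers written for this guide.

```python
import shutil
import sys
from pathlib import Path


def jupyter_launcher():
    """Return the path of the `jupyter` command found on PATH, or None if there is none."""
    found = shutil.which("jupyter")
    return Path(found) if found else None


def launcher_in_current_env(launcher):
    """Return True if `launcher` sits somewhere under the current environment (sys.prefix)."""
    if launcher is None:
        return False
    prefix = Path(sys.prefix).resolve()
    launcher_dir = Path(launcher).parent.resolve()
    return launcher_dir == prefix or prefix in launcher_dir.parents


if __name__ == "__main__":
    launcher = jupyter_launcher()
    print("jupyter command:", launcher)
    print("Belongs to this environment:", launcher_in_current_env(launcher))
```

If the second line prints False while your virtual environment is active, reinstalling jupyterlab inside that environment is usually the simplest fix.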

Optional: Advanced Tools & Libraries

As you progress, you might want to explore these powerful additions:

Next Steps

You now have a robust Python environment for data science and machine learning! Start experimenting:

Continue to explore the vast ecosystem of Python libraries and resources available.