Getting Started with Python for Data Science and ML

Welcome to the practical guide for setting up your Python environment for data science and machine learning. This section will walk you through the essential steps to get you coding in no time.

1. Install Python

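The first step is to ensure you have Python installed. We recommend a recent stable version. Brand-new releases sometimes take a while to gain support from the major data science libraries, so check that the libraries you plan to use support it.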

  • Download: Visit the official Python website (python.org) to download the installer for your operating system (Windows, macOS, Linux).
  • Installation: During installation on Windows, make sure to check the "Add Python X.X to PATH" option (recent installers label it "Add python.exe to PATH"). On macOS and Linux, a system Python is often pre-installed, but it may be outdated, so installing a newer version is recommended.
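  • Verification: Open your terminal or command prompt and type:
    python --version
    This should display the installed Python version. On macOS and Linux, if `python` is not found, try `python3` (and use it to create the virtual environment in the next step).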
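The same check can be done from inside Python, which is handy for scripts that need a minimum version. This is a minimal sketch; the helper name `check_python_version` and the `(3, 9)` threshold are illustrative assumptions, not requirements stated by this guide.

```python
from __future__ import annotations

import sys


def check_python_version(minimum: tuple[int, int] = (3, 9)) -> bool:
    """Return True if the running interpreter is at least `minimum` (major, minor).

    The (3, 9) default is an arbitrary example threshold.
    """
    # sys.version_info[:2] is a plain (major, minor) tuple, compared element-wise
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print(sys.version)  # full version string of the running interpreter
    print("Meets minimum:", check_python_version())
```

Tuple comparison keeps the check correct across minor versions (for example, 3.10 compares as newer than 3.9, which a naive string comparison would get wrong).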

2. Set Up a Virtual Environment

Virtual environments are crucial for managing project dependencies and avoiding conflicts between different projects.

  • Create: Navigate to your project directory in the terminal and run:
    python -m venv .venv
    This creates a virtual environment named `.venv` in your current directory.
  • Activate:
    • Windows:
      .venv\Scripts\activate
    • macOS/Linux:
      source .venv/bin/activate
    Once activated, you'll see the environment name in parentheses in your terminal prompt (e.g., `(.venv) C:\your_project>`). Run `deactivate` to leave the environment.
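To confirm from Python itself that a script or notebook is running inside the virtual environment rather than the system interpreter, you can compare `sys.prefix` with `sys.base_prefix`: inside an environment created with `venv`, the first points at the environment directory while the second still points at the base installation. A minimal sketch using only the standard library; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix is the environment directory (e.g. .venv),
    # while sys.base_prefix remains the base Python installation.
    # Outside a venv, the two are equal.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Inside virtual environment:", in_virtualenv())
    print("Interpreter:", sys.executable)  # path of the python actually running
```

If this prints `False` after activation, the command being run is probably not the environment's interpreter; `sys.executable` shows which one it is.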

3. Install Essential Libraries

Once your virtual environment is active, you can install the core libraries for data science and machine learning using pip.

pip install numpy pandas matplotlib scikit-learn jupyter

  • NumPy: For numerical operations and array manipulation.
  • Pandas: For data manipulation and analysis (DataFrames).
  • Matplotlib: For creating static, animated, and interactive visualizations.
  • Scikit-learn: For classical machine learning, including classification, regression, clustering, preprocessing, and model evaluation.
  • Jupyter: For interactive coding environments (Jupyter Notebook and JupyterLab).
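After installing, you can check which of these packages are present in the active environment, and at which versions, with the standard-library `importlib.metadata` module (Python 3.8+). It looks packages up by their pip distribution name (`scikit-learn`), which can differ from the import name (`sklearn`). A minimal sketch; `installed_versions` is a hypothetical helper.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip in the install command above
PACKAGES = ["numpy", "pandas", "matplotlib", "scikit-learn", "jupyter"]


def installed_versions(packages: list[str]) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if missing."""
    result: dict[str, str | None] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # Not installed in the currently active environment
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:<14} {ver or 'NOT INSTALLED'}")
```

Running it with the virtual environment active reports what pip installed there; a `NOT INSTALLED` entry usually means the install ran in a different environment.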

4. Launch Jupyter Notebook/Lab

Jupyter provides an interactive way to explore data and build models.

With your virtual environment activated, run:

jupyter notebook

Or for JupyterLab:

jupyter lab

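This starts a local Jupyter server and opens it in a new browser tab, where you can create notebooks and start coding. Keep the terminal open while you work; press Ctrl+C in it (and confirm if prompted) to stop the server.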
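As a quick smoke test that the libraries installed above work together, a first notebook cell might build a tiny dataset with NumPy and pandas and fit a scikit-learn linear regression to it. A minimal sketch; the data is made up (y = 2x + 1 with no noise), so the fitted slope and intercept should come out as 2 and 1.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data for illustration: y = 2x + 1
df = pd.DataFrame({"x": np.arange(5, dtype=float)})
df["y"] = 2 * df["x"] + 1

# scikit-learn expects a 2-D feature matrix, so select x as a one-column DataFrame
model = LinearRegression()
model.fit(df[["x"]], df["y"])

print(df)
print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}")
```

If any import fails here, the notebook kernel is likely not using the virtual environment where the libraries were installed.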

5. Next Steps

Congratulations! Your Python environment for data science and ML is now set up. Explore the key libraries and start working through our tutorials to build your first models.
