Python Data Science & ML Setup

Welcome to Your Python Data Science & ML Journey

This guide will walk you through setting up your development environment for Python-based data science and machine learning tasks. We'll cover essential tools and libraries to get you started quickly and efficiently.

Mastering these tools is crucial for data exploration, visualization, model building, and deployment.

Prerequisites

Before we begin, ensure you have the following basics:

A working computer (Windows, macOS, or Linux).
Basic familiarity with your operating system's command line or terminal.
An internet connection to download packages.

Step 1: Install Python

We recommend installing Python from the official installer or your platform's package manager, as described below for each operating system.

Windows

Visit the official Python website: python.org/downloads/
Download the latest stable Python 3 installer (e.g., Python 3.11 or 3.12).
Run the installer. Crucially, check the box labeled "Add Python X.Y to PATH" (recent installers label it "Add python.exe to PATH") during installation. This makes Python accessible from your command prompt.
Open Command Prompt and verify the installation by typing:
```
python --version
```
You should see the installed Python version.

macOS

macOS may come with an older version of Python pre-installed, or with none at all on recent releases. Installing a current version is highly recommended.
You can download the latest installer from python.org/downloads/ and follow steps similar to those for Windows.

Alternatively, use a package manager like Homebrew. Install Homebrew itself first if you do not already have it, then install Python:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

brew install python3

Open Terminal and verify by typing:
```
python3 --version
```
You should see the installed Python version. On macOS, use python3 (and pip3) wherever this guide shows python (and pip), unless a virtual environment is active.

Linux

Most Linux distributions come with Python 3 pre-installed. Check your version:
```
python3 --version
```

If you need to install or upgrade, use your distribution's package manager:

sudo apt update && sudo apt install python3 python3-pip python3-venv  # For Debian/Ubuntu

sudo dnf install python3 python3-pip               # For Fedora (venv is included with python3)

sudo yum install python3 python3-pip               # For CentOS/RHEL (older; venv is included with python3)

Verify the installation by running python3 --version again.
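The same check can be made from inside Python, which is handy at the top of a script or notebook. The following is a minimal sketch; `MIN_VERSION` and `python_is_recent_enough` are illustrative names, and the 3.9 floor is an assumption rather than a requirement stated in this guide.

```python
import sys

# Assumed minimum version for illustration; adjust to what your libraries require.
MIN_VERSION = (3, 9)

def python_is_recent_enough(minimum=MIN_VERSION):
    """Return True if the running interpreter is at least `minimum` (major, minor)."""
    return sys.version_info[:2] >= tuple(minimum)

print("Running Python", sys.version.split()[0])
print("Meets minimum:", python_is_recent_enough())
```

Comparing tuples keeps the check correct for two-digit minor versions, where comparing version strings would not be.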

Step 2: Ensure Pip is Installed

pip is the package installer for Python. It's usually included with Python installations from version 3.4 onwards.

Open your terminal or command prompt and run:

pip --version

If pip is not found, try pip3 (common on macOS and Linux) or python -m pip --version. If neither works, install pip separately or revisit your Python installation.

To upgrade pip:

python -m pip install --upgrade pip
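When several Python installations coexist, it helps to confirm which interpreter a given pip belongs to. The sketch below uses only the standard library; `pip_version` is a hypothetical helper name, not a pip API.

```python
import importlib.metadata
import sys

def pip_version():
    """Return the installed pip version string, or None if pip is not installed."""
    try:
        return importlib.metadata.version("pip")
    except importlib.metadata.PackageNotFoundError:
        return None

found = pip_version()
if found is None:
    print("pip is not installed for", sys.executable)
else:
    print(f"pip {found} is installed for {sys.executable}")
```

Run it with the same interpreter you plan to use for your project, so the result describes that installation.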

Step 3: Set Up Virtual Environments

Virtual environments are essential for isolating project dependencies. This prevents conflicts between different projects that might require different versions of the same library.

Using `venv` (built-in)

Navigate to your project directory in the terminal.
Create a virtual environment:
```
python -m venv .venv
```
This creates a folder named .venv (you can choose another name) in your current directory. On macOS/Linux, use python3 if the python command is not available.
Activate the virtual environment:
- Windows (Command Prompt/PowerShell): `.venv\Scripts\activate`
- macOS/Linux (Bash/Zsh): `source .venv/bin/activate`
Your terminal prompt will change to indicate the active environment (e.g., (.venv) your_prompt>).
To deactivate, simply type:
```
deactivate
```
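If you are unsure whether an environment is active, you can ask the interpreter directly. This is a small sketch for environments created with `venv`; the function name is illustrative, and other environment managers (conda, for example) may not be detected this way.

```python
import sys

def in_virtual_environment():
    """Return True when the interpreter is running inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtual_environment())
```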

It's good practice to record your project's dependencies in a requirements.txt file:

pip freeze > requirements.txt

And to install dependencies from it:

pip install -r requirements.txt
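To see roughly what ends up in that file, you can list the installed packages from Python. The sketch below only approximates `pip freeze`: pip's real output handles editable installs and URL-based requirements differently, and `installed_requirements` is a hypothetical helper.

```python
import importlib.metadata

def installed_requirements():
    """Return sorted 'name==version' strings for installed distributions."""
    lines = set()
    for dist in importlib.metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip distributions with broken or missing metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)

# Show the first few entries for the current environment.
for line in installed_requirements()[:10]:
    print(line)
```

In a freshly created virtual environment the list is short, which is the point of isolating each project.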

Step 4: Install Core Data Science & ML Libraries

With your environment set up, you can now install the fundamental libraries. Ensure your virtual environment is activated before running these commands.

Use pip to install them:

pip install numpy pandas matplotlib scikit-learn jupyterlab

NumPy: For numerical operations and array manipulation.
Pandas: For data manipulation and analysis (DataFrames).
Matplotlib: For creating static, interactive, and animated visualizations.
Scikit-learn: A comprehensive library for machine learning algorithms.
JupyterLab: An interactive development environment for notebooks, code, and data.

To install specific versions or additional libraries, you can add them to the command or your requirements.txt file.
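After installing, a quick check confirms that each library is visible to the active interpreter. The sketch below looks modules up without importing them; `CORE_LIBRARIES` and `missing_libraries` are illustrative names.

```python
import importlib.util

# Maps the pip package name to the module name used in `import` statements.
# Note that scikit-learn is imported as `sklearn`.
CORE_LIBRARIES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "jupyterlab": "jupyterlab",
}

def missing_libraries(libraries=CORE_LIBRARIES):
    """Return the pip names of libraries whose modules cannot be found."""
    return [
        pip_name
        for pip_name, module_name in libraries.items()
        if importlib.util.find_spec(module_name) is None
    ]

missing = missing_libraries()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All core libraries are installed.")
```

If a library shows up as missing right after installation, the usual cause is that pip ran against a different interpreter than the one running this check.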

Step 5: Launch JupyterLab

JupyterLab is your primary tool for interactive data exploration and model development.

In your activated virtual environment, run:

jupyter lab

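This starts a local server and opens JupyterLab in your web browser, providing an interface to create notebooks, manage files, and run code interactively. If the browser does not open automatically, copy the URL printed in the terminal.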
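A common stumbling block is launching a `jupyter` command that belongs to a different Python installation than the active environment. The following sketch, with a hypothetical helper name, compares the launcher's location with the running interpreter's location; it is a heuristic, not an official Jupyter check.

```python
import shutil
import sys
from pathlib import Path

def jupyter_from_active_env():
    """Return True/False if `jupyter` is on PATH, or None if it is not found."""
    found = shutil.which("jupyter")
    if found is None:
        return None
    # In a venv, the launcher and the interpreter sit in the same folder
    # (bin on macOS/Linux, Scripts on Windows).
    return Path(found).absolute().parent == Path(sys.executable).absolute().parent

status = jupyter_from_active_env()
if status is None:
    print("No jupyter command found on PATH.")
elif status:
    print("jupyter belongs to the active interpreter's environment.")
else:
    print("jupyter comes from a different location:", shutil.which("jupyter"))
```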

Optional: Advanced Tools & Libraries

As you progress, you might want to explore these powerful additions:

Deep Learning Frameworks:

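Install TensorFlow or PyTorch for deep learning tasks. The commands below install the default builds; GPU support may require a platform-specific command from each project's installation guide.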
```
pip install tensorflow
```
```
pip install torch torchvision torchaudio
```
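Once a framework is installed, you can check its version and whether it sees a GPU. This is a sketch with an illustrative function name; it skips any framework that is not installed, and the PyTorch check covers CUDA-style devices only, so other accelerators are not reported.

```python
import importlib.util

def deep_learning_status():
    """Report installed versions and GPU visibility for PyTorch and TensorFlow."""
    status = {}
    if importlib.util.find_spec("torch") is not None:
        import torch
        status["torch"] = {
            "version": torch.__version__,
            "gpu": torch.cuda.is_available(),
        }
    if importlib.util.find_spec("tensorflow") is not None:
        import tensorflow as tf
        status["tensorflow"] = {
            "version": tf.__version__,
            "gpu": len(tf.config.list_physical_devices("GPU")) > 0,
        }
    return status

print(deep_learning_status() or "Neither TensorFlow nor PyTorch is installed.")
```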
Data Visualization:

Seaborn for more aesthetic statistical plots.
```
pip install seaborn
```
Plotly for interactive, web-based visualizations.
```
pip install plotly
```
Data Handling:

Dask for parallel computing with larger-than-memory datasets.
```
pip install "dask[complete]"
```
Development Environments (IDEs):

Consider full-featured IDEs like VS Code with Python extensions or PyCharm for more robust project management.

Next Steps

You now have a robust Python environment for data science and machine learning! Start experimenting:

Create a new notebook in JupyterLab.
Import your installed libraries (e.g., import pandas as pd).
Load some sample data and begin your analysis.
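As a first notebook cell, the steps above might look like the sketch below. It uses synthetic data rather than a real dataset, the names `REQUIRED` and `smoke_test` are illustrative, and it returns None instead of failing if the core libraries are not installed yet.

```python
import importlib.util

REQUIRED = ("numpy", "pandas", "sklearn")

def smoke_test():
    """Fit a line to synthetic data; return None if a library is missing."""
    if any(importlib.util.find_spec(name) is None for name in REQUIRED):
        return None

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # Synthetic data following y = 2x + 1 exactly, so the fit is easy to check.
    x = np.arange(10, dtype=float)
    df = pd.DataFrame({"x": x, "y": 2.0 * x + 1.0})

    model = LinearRegression().fit(df[["x"]], df["y"])
    return {
        "rows": len(df),
        "slope": float(model.coef_[0]),
        "intercept": float(model.intercept_),
    }

result = smoke_test()
print(result if result is not None else "Install the core libraries first.")
```

Because the data are generated from a known line, the fitted slope and intercept should come out very close to 2 and 1, which makes this a convenient end-to-end check of NumPy, Pandas, and Scikit-learn together.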

Continue to explore the vast ecosystem of Python libraries and resources available.