Foundations of Python for Data Science and ML

Master the core Python skills for impactful data analysis and machine learning.

Welcome!

This module provides the fundamental Python knowledge you need to start working in data science and machine learning. We'll cover essential programming concepts and data handling techniques, and introduce two core libraries: NumPy and Pandas.

Python Basics

Understand the building blocks of Python programming.

  • Variables and Data Types (integers, floats, strings, booleans)
  • Operators (arithmetic, comparison, logical)
  • Control Flow (if-elif-else statements, for loops, while loops)
  • Functions: defining and calling
  • Error Handling (try-except blocks)

Example: A Simple Function

def greet(name):
    """Print a greeting for the given name."""
    print(f"Hello, {name}!")

greet("World")
# Output: Hello, World!
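
The topics listed above also include control flow and error handling, which the greeting example does not touch. The following is a minimal sketch of those ideas; the function names `classify` and `safe_divide` are hypothetical and chosen only for illustration.

```python
def classify(value):
    """Label a number as negative, zero, or positive."""
    if value < 0:
        return "negative"
    elif value == 0:
        return "zero"
    else:
        return "positive"


def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Dividing by zero raises ZeroDivisionError; handle it here
        # instead of letting the program stop.
        return None


# for loop: visit each item of a sequence in turn
labels = []
for v in (-2, 0, 3):
    labels.append(classify(v))
print(labels)
# Output: ['negative', 'zero', 'positive']

# while loop: repeat as long as the condition holds
countdown = 3
while countdown > 0:
    countdown -= 1
print(countdown)
# Output: 0

print(safe_divide(10, 4))
# Output: 2.5
print(safe_divide(1, 0))
# Output: None
```

Returning None on failure is only one possible design; depending on the caller, re-raising the error or returning a default value may be more appropriate.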

Core Data Structures

Efficiently organize and manipulate data.

  • Lists: ordered, mutable sequences
  • Tuples: ordered, immutable sequences
  • Dictionaries: key-value pairs
  • Sets: unordered collections of unique elements

Example: Working with a List

numbers = [10, 20, 30, 40, 50]
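numbers.append(60)  # append adds an item to the end of the list
print(numbers[2])   # indexing is zero-based, so index 2 is the third item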
# Output: 30
print(numbers)
# Output: [10, 20, 30, 40, 50, 60]
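
The other three structures listed above can be sketched in the same way. The variable names (`point`, `word_counts`, `seen`) and the sample values are illustrative assumptions, not part of the course material.

```python
# Tuples: ordered like lists, but they cannot be changed after creation.
point = (3, 4)
x, y = point  # unpack the two elements into separate names
try:
    point[0] = 10
except TypeError:
    print("tuples are immutable")
# Output: tuples are immutable

# Dictionaries: look up a value by its key.
word_counts = {}
for word in ["cat", "dog", "cat"]:
    # get() returns 0 when the key has not been seen yet
    word_counts[word] = word_counts.get(word, 0) + 1
print(word_counts)
# Output: {'cat': 2, 'dog': 1}

# Sets: duplicates are dropped automatically.
seen = {1, 2, 2, 3}
print(len(seen))
# Output: 3
print(2 in seen)
# Output: True
```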

NumPy: Numerical Python

The cornerstone for numerical computation in Python.

  • Introduction to NumPy arrays (ndarrays)
  • Array creation and manipulation
  • Vectorized operations for speed
  • Array indexing and slicing
  • Basic mathematical and statistical functions

Example: NumPy Array Operations

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a + b)
# Output: [5 7 9]

print(a * 2)
# Output: [2 4 6]
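
Indexing, slicing, and the basic statistical functions listed above work the same way on multi-dimensional arrays. The sketch below assumes a small 3x4 array named `m`, built with `np.arange` purely for illustration.

```python
import numpy as np

# A 3x4 array holding the integers 0..11, filled row by row.
m = np.arange(12).reshape(3, 4)

# Indexing: row 1, column 2 (both zero-based).
print(m[1, 2])
# Output: 6

# Slicing: every row, first column only.
first_column = m[:, 0]
print(first_column.tolist())
# Output: [0, 4, 8]

# Boolean indexing: keep only the elements greater than 8.
large = m[m > 8]
print(large.tolist())
# Output: [9, 10, 11]

# Statistics over the whole array, and a sum down each column (axis=0).
print(m.mean())
# Output: 5.5
column_sums = m.sum(axis=0)
print(column_sums.tolist())
# Output: [12, 15, 18, 21]
```

The `tolist()` calls are only there to make the printed output unambiguous; in practice the arrays are usually kept as ndarrays so that later operations stay vectorized.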

Pandas: Data Manipulation and Analysis

Essential for data wrangling, cleaning, and exploration.

  • Introduction to Series and DataFrames
  • Reading data from various file formats (CSV, Excel)
  • Data selection, filtering, and sorting
  • Handling missing data (NaN)
  • Data aggregation and grouping (groupby)

Example: Basic DataFrame Usage

import pandas as pd

data = {'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']}
df = pd.DataFrame(data)

# head() returns the first five rows by default; this frame has only three
print(df.head())
# Output:
#    col1 col2
# 0     1    A
# 1     2    B
# 2     3    C

print(df['col1'].mean())
# Output: 2.0
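
Filtering, sorting, missing data, and grouping from the list above can be combined in a few lines. This is a minimal sketch under stated assumptions: the `team` and `score` columns and their values are made up for illustration, and filling missing values with the column mean is only one of several reasonable strategies.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    "team": ["A", "A", "B", "B"],
    "score": [10.0, np.nan, 6.0, 8.0],
})

# Count missing values in each column.
missing = scores["score"].isna().sum()
print(int(missing))
# Output: 1

# Fill the missing score with the column mean (mean() skips NaN by default).
filled = scores.assign(score=scores["score"].fillna(scores["score"].mean()))
print(filled["score"].tolist())
# Output: [10.0, 8.0, 6.0, 8.0]

# Filter rows with a boolean condition, then sort from highest to lowest.
high = filled[filled["score"] > 7].sort_values("score", ascending=False)
print(high["score"].tolist())
# Output: [10.0, 8.0, 8.0]

# Group by team and average the scores within each group.
by_team = filled.groupby("team")["score"].mean()
print(by_team.to_dict())
# Output: {'A': 9.0, 'B': 7.0}
```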

What's Next?

With these foundational skills, you're ready to dive deeper into: