NumPy Basics - Data Science with Python

What is NumPy?

NumPy, which stands for Numerical Python, is a fundamental library for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a broad collection of high-level mathematical functions that operate on them.

It's the backbone for many other data science libraries like Pandas, SciPy, and Scikit-learn. Understanding NumPy is crucial for anyone diving into data analysis, machine learning, or scientific research with Python.

Installation

If you don't have NumPy installed, you can easily install it using pip:

pip install numpy
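
To confirm that the installation worked, one quick check is to import the library and print its version. This is a minimal sketch; the variable name installed_version is only illustrative, and the version string you see depends on your environment.

```python
import numpy as np

# np.__version__ holds the installed NumPy version as a string
installed_version = np.__version__
print(installed_version)
```

If the import fails with ModuleNotFoundError, pip most likely installed NumPy into a different Python environment than the one running the script.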

The NumPy Array

The core of NumPy is its ndarray object, an N-dimensional array whose elements all share a single data type. It resembles Python's built-in list, but because the values are stored in a typed block of memory, numerical operations are much more efficient and mathematical functions can be applied to the whole array at once.
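
The difference from a list shows up as soon as you apply an operator. The sketch below uses made-up sample values (the names prices and prices_arr are hypothetical) to contrast how * behaves on a list and on an array.

```python
import numpy as np

prices = [1.5, 2.0, 3.25]        # plain Python list (illustrative sample data)
prices_arr = np.array(prices)    # the same values as an ndarray

# Multiplying a list repeats it; multiplying an array scales every element.
repeated = prices * 2
scaled = prices_arr * 2

print(repeated)  # [1.5, 2.0, 3.25, 1.5, 2.0, 3.25]
print(scaled)    # [3.  4.  6.5]

# The list equivalent of the array operation needs an explicit comprehension.
scaled_list = [p * 2 for p in prices]
print(scaled_list)  # [3.0, 4.0, 6.5]
```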

Creating Arrays

You can create NumPy arrays from Python lists:

import numpy as np

# Create a 1D array
a = np.array([1, 2, 3, 4, 5])
print(a)
# Output: [1 2 3 4 5]

# Create a 2D array
b = np.array([[1, 2, 3], [4, 5, 6]])
print(b)
# Output:
# [[1 2 3]
#  [4 5 6]]

Array Attributes

NumPy arrays have useful attributes:

ndim: Number of dimensions.
shape: Tuple of integers indicating the size of the array in each dimension.
size: Total number of elements in the array.
dtype: Data type of the array elements.

print(f"Shape of array b: {b.shape}") # Output: Shape of array b: (2, 3)
print(f"Number of dimensions: {b.ndim}") # Output: Number of dimensions: 2
print(f"Total elements: {b.size}")     # Output: Total elements: 6
print(f"Data type: {b.dtype}")       # Output: Data type: int64 (on most platforms; the default integer type is platform dependent)
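
The dtype is usually inferred from the input values, but it can also be requested explicitly or converted later. The following sketch illustrates this with small example arrays; the variable names are illustrative only.

```python
import numpy as np

ints = np.array([1, 2, 3])                       # integer dtype inferred from the values
floats = np.array([1, 2, 3], dtype=np.float64)   # dtype requested explicitly
mixed = np.array([1, 2.5, 3])                    # integers are upcast to float

# astype returns a converted copy; the original array is unchanged
as_int32 = floats.astype(np.int32)

print(ints.dtype.kind)                 # i (some integer type)
print(floats.dtype)                    # float64
print(mixed.dtype)                     # float64
print(as_int32.dtype)                  # int32
print(floats.itemsize, floats.nbytes)  # 8 24 (bytes per element, total bytes)
```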

Array Initialization

NumPy provides functions to create arrays with specific initial values:

np.zeros(): Array filled with zeros.
np.ones(): Array filled with ones.
np.full(): Array filled with a specified value.
np.arange(): Array of evenly stepped values from start up to, but not including, stop.
np.linspace(): A given number of evenly spaced values over an interval, including both endpoints by default.

zeros_array = np.zeros((2, 3))
print(zeros_array)
# Output:
# [[0. 0. 0.]
#  [0. 0. 0.]]

ones_array = np.ones((3, 2))
print(ones_array)
# Output:
# [[1. 1.]
#  [1. 1.]
#  [1. 1.]]

full_array = np.full((2, 2), 7)
print(full_array)
# Output:
# [[7 7]
#  [7 7]]

range_array = np.arange(0, 10, 2) # Start, Stop, Step
print(range_array)
# Output: [0 2 4 6 8]

linspace_array = np.linspace(0, 1, 5) # Start, Stop, Number of samples
print(linspace_array)
# Output: [0.   0.25 0.5  0.75 1.  ]
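
These functions accept a few optional arguments that are worth knowing. The sketch below shows the dtype argument, how np.full picks its dtype from the fill value, the endpoint flag of np.linspace, and a negative step in np.arange. Variable names are illustrative.

```python
import numpy as np

# np.zeros and np.ones produce floats unless a dtype is given
int_zeros = np.zeros((2, 3), dtype=int)

# np.full takes its dtype from the fill value: 7.0 gives a float array
float_sevens = np.full((2, 2), 7.0)

# endpoint=False leaves out the stop value, like np.arange does
half_open = np.linspace(0, 1, 5, endpoint=False)

# A negative step counts down; the stop value 0 is still excluded
countdown = np.arange(5, 0, -1)

print(int_zeros.dtype.kind)   # i
print(float_sevens.dtype)     # float64
print(half_open)              # [0.  0.2 0.4 0.6 0.8]
print(countdown)              # [5 4 3 2 1]
```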

Array Indexing and Slicing

Accessing elements and sub-arrays is similar to Python lists but extended for multiple dimensions.

One-Dimensional Arrays

arr1d = np.array([10, 20, 30, 40, 50])
print(arr1d[0])   # Output: 10
print(arr1d[1:4]) # Output: [20 30 40] (slices are exclusive of the end index)
print(arr1d[-1])  # Output: 50 (last element)

Multi-Dimensional Arrays

Use comma-separated indices for each dimension.

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Access element at row 1, column 2 (0-indexed)
print(arr2d[1, 2]) # Output: 6

# Access the first row
print(arr2d[0, :]) # Output: [1 2 3]

# Access the second column
print(arr2d[:, 1]) # Output: [2 5 8]

# Get a sub-array (rows 0 to 1, columns 1 to 2)
print(arr2d[0:2, 1:3])
# Output:
# [[2 3]
#  [5 6]]
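
One point that differs from Python lists: a basic slice of an array is a view onto the same data, not a copy. The sketch below, using an illustrative array named grid, shows the effect and how .copy() avoids it.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

window = grid[0:2, 1:3]   # basic slice: a view that shares memory with grid
window[0, 0] = 99         # this also changes grid[0, 1]
print(grid[0, 1])         # 99

independent = grid[0:2, 1:3].copy()   # explicit copy with its own memory
independent[0, 0] = -1
print(grid[0, 1])                     # 99 (unchanged by the copy)

print(np.shares_memory(grid, window))       # True
print(np.shares_memory(grid, independent))  # False
```

Call .copy() whenever you need to modify a sub-array without touching the original.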

Boolean Indexing

You can select elements based on conditions.

arr = np.array([1, 2, 3, 4, 5, 6])
condition = arr > 3
print(arr[condition]) # Output: [4 5 6]
print(arr[arr % 2 == 0]) # Output: [2 4 6] (even numbers)
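
Conditions can also be combined, and a boolean mask can be used on the left side of an assignment. The sketch below uses illustrative names; note that the element-wise operators are &, | and ~ rather than Python's and, or and not, and that each condition needs its own parentheses.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])

# Combine conditions with & (and) and | (or); parentheses are required
mid_even = values[(values > 2) & (values % 2 == 0)]
outside = values[(values < 2) | (values > 5)]
print(mid_even)  # [4 6]
print(outside)   # [1 6]

# Assign through a mask: cap every value above 4 at 4
capped = values.copy()
capped[capped > 4] = 4
print(capped)    # [1 2 3 4 4 4]

# Count how many elements satisfy a condition
count_above_3 = np.count_nonzero(values > 3)
print(count_above_3)  # 3
```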

Basic Operations

Arithmetic operators on NumPy arrays act element by element, so whole-array computations run efficiently without explicit Python loops.

Arithmetic Operations

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a + b) # Element-wise addition: [5 7 9]
print(a - b) # Element-wise subtraction: [-3 -3 -3]
print(a * b) # Element-wise multiplication: [ 4 10 18]
print(a / b) # Element-wise division: [0.25 0.4  0.5 ]

# Scalar operations
print(a + 5) # [6 7 8]
print(a * 2) # [2 4 6]
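
The remaining Python operators follow the same element-wise rule. A brief sketch with the same two example arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

squares = a ** 2       # element-wise power
quotients = b // a     # element-wise floor division
remainders = b % a     # element-wise remainder
is_small = a < 2       # comparisons return a boolean array

print(squares)         # [1 4 9]
print(quotients)       # [4 2 2]
print(remainders)      # [0 1 0]
print(is_small)        # [ True False False]

# True division (/) produces floats even when both inputs are integers
print((a / b).dtype)   # float64
```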

Universal Functions (ufuncs)

NumPy provides many universal functions that operate element-wise on arrays.

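np.sqrt(): Square root.
np.exp(): Exponential (e raised to each element).
np.sin(): Sine, with angles in radians.
np.cos(): Cosine, with angles in radians.
np.log(): Natural logarithm.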

arr = np.array([1, 4, 9])
print(np.sqrt(arr)) # [1. 2. 3.]
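
The other functions in the list work the same way. The sketch below applies them to small illustrative arrays and compares results with np.allclose, since floating-point results are rarely exact.

```python
import numpy as np

angles = np.array([0.0, np.pi / 2, np.pi])   # angles in radians
sines = np.sin(angles)      # approximately [0, 1, 0]
cosines = np.cos(angles)    # approximately [1, 0, -1]

# sin^2 + cos^2 equals 1 for every element, up to rounding error
print(np.allclose(sines ** 2 + cosines ** 2, 1.0))  # True

values = np.array([1.0, 2.0, 3.0])
roundtrip = np.log(np.exp(values))   # log undoes exp, element by element
print(np.allclose(roundtrip, values))  # True
```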

Aggregation Functions

These functions reduce an array to a single value or, when an axis argument is given, to one value per row or column.

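np.sum(): Sum of the elements.
np.mean(): Arithmetic mean.
np.median(): Median.
np.std(): Standard deviation (population formula by default).
np.min(): Smallest element.
np.max(): Largest element.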

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(np.sum(arr))      # Sum of all elements: 21
print(np.sum(arr, axis=0)) # Sum of each column (collapses axis 0): [5 7 9]
print(np.sum(arr, axis=1)) # Sum of each row (collapses axis 1): [ 6 15]
print(np.mean(arr, axis=1)) # Mean of each row: [2. 5.]
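
The other aggregation functions accept the same axis argument. The sketch below uses an illustrative array named scores and also shows keepdims, which keeps the collapsed axis with length 1 so the result still lines up with the original array.

```python
import numpy as np

scores = np.array([[1, 2, 3], [4, 5, 6]])

print(np.min(scores))           # 1
print(np.max(scores, axis=0))   # largest value in each column: [4 5 6]
print(np.median(scores))        # 3.5

# Standard deviation of each row (about 0.816 for both rows here)
row_std = np.std(scores, axis=1)

# keepdims=True returns shape (2, 1) instead of (2,)
row_totals = np.sum(scores, axis=1, keepdims=True)
print(row_totals.shape)         # (2, 1)

# Most aggregations are also available as array methods
print(scores.mean())            # 3.5
```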

Reshaping Arrays

You can change the shape of an array without changing its data.

arr = np.arange(1, 7) # [1 2 3 4 5 6]
print(arr.reshape((2, 3)))
# Output:
# [[1 2 3]
#  [4 5 6]]

# If you specify one dimension as -1, NumPy will infer it
print(arr.reshape((3, -1)))
# Output:
# [[1 2]
#  [3 4]
#  [5 6]]

Important: The new shape must hold exactly the same number of elements as the original array. A 6-element array can become (2, 3), (3, 2), (1, 6) or (6, 1), but not (4, 2).
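
The sketch below shows what happens when the element counts do not match, and how reshape(-1) flattens an array back to one dimension. The variable names are illustrative.

```python
import numpy as np

arr = np.arange(1, 7)   # 6 elements: [1 2 3 4 5 6]

# A (4, 2) shape needs 8 elements, so NumPy raises ValueError
reshape_failed = False
try:
    arr.reshape((4, 2))
except ValueError as exc:
    reshape_failed = True
    print(f"Reshape failed: {exc}")

grid = arr.reshape((2, -1))   # -1 is inferred as 3
print(grid.shape)             # (2, 3)

flat = grid.reshape(-1)       # a single -1 flattens to one dimension
print(flat)                   # [1 2 3 4 5 6]
```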

Broadcasting

Broadcasting is the mechanism that lets NumPy perform arithmetic on arrays of different shapes. Common cases are adding a scalar to an array and adding a 1D array to a 2D array. Shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1.

row_vector = np.array([1, 2, 3])
matrix = np.array([[10], [20], [30]])

# matrix has shape (3, 1); row_vector has shape (3,), which is treated as (1, 3).
# The shapes differ but are compatible, so NumPy stretches both to (3, 3):
# matrix is repeated across the columns and row_vector across the rows.
print(matrix + row_vector)
# Output:
# [[11 12 13]
#  [21 22 23]
#  [31 32 33]]
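
A typical use of broadcasting is subtracting per-column statistics from a 2D array. The following sketch uses made-up data (the names data, col_means and row_offsets are illustrative) and also shows a shape combination that cannot be broadcast.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])   # shape (2, 3)

col_means = data.mean(axis=0)        # shape (3,): [2.5 3.5 4.5]
centered = data - col_means          # (2, 3) - (3,) gives (2, 3)
print(centered)
# [[-1.5 -1.5 -1.5]
#  [ 1.5  1.5  1.5]]

row_offsets = np.array([[10.0], [20.0]])   # shape (2, 1)
shifted = data + row_offsets               # (2, 3) + (2, 1) gives (2, 3)

# (2, 3) and (2,) are incompatible: the last dimensions are 3 and 2
broadcast_failed = False
try:
    data + np.array([1.0, 2.0])
except ValueError:
    broadcast_failed = True
print(broadcast_failed)  # True
```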

Next Steps

This covers the very basics of NumPy. To further enhance your data science skills:

Explore more advanced array manipulation techniques.
Learn about Pandas, which is built on top of NumPy for data manipulation and analysis.
Dive into SciPy for more advanced scientific and technical computing.