Your essential guide to numerical operations in Python.
NumPy, which stands for Numerical Python, is a fundamental library for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
It's the backbone of many other data science libraries, such as pandas, SciPy, and scikit-learn. Understanding NumPy is crucial for anyone diving into data analysis, machine learning, or scientific research with Python.
If you don't have NumPy installed, you can easily install it using pip:
pip install numpy
The core of NumPy is its ndarray object, which is a powerful N-dimensional array. It's similar to Python's built-in list but much more efficient for numerical operations, and it supports more advanced mathematical functions.
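To make the contrast with lists concrete, here is a minimal sketch showing that the same operator behaves differently on a list and on an array. The variable names are illustrative and do not come from the text above.

```python
import numpy as np

values_list = [1, 2, 3]
values_array = np.array(values_list)

# Multiplying a list repeats it; multiplying an array scales each element.
repeated = values_list * 2
scaled = values_array * 2

print(repeated)  # [1, 2, 3, 1, 2, 3]
print(scaled)    # [2 4 6]
```

The array version is what most numerical code wants, which is one reason to convert lists to arrays before doing arithmetic.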
You can create NumPy arrays from Python lists:
import numpy as np
# Create a 1D array
a = np.array([1, 2, 3, 4, 5])
print(a)
# Output: [1 2 3 4 5]
# Create a 2D array
b = np.array([[1, 2, 3], [4, 5, 6]])
print(b)
# Output:
# [[1 2 3]
# [4 5 6]]
NumPy arrays have useful attributes:
ndim: Number of dimensions.
shape: Tuple of integers indicating the size of the array in each dimension.
size: Total number of elements in the array.
dtype: Data type of the array elements.
print(f"Shape of array b: {b.shape}") # Output: Shape of array b: (2, 3)
print(f"Number of dimensions: {b.ndim}") # Output: Number of dimensions: 2
print(f"Total elements: {b.size}") # Output: Total elements: 6
print(f"Data type: {b.dtype}") # Output: Data type: int64 (platform-dependent; int32 on some systems)
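Since dtype depends on how the array was built, a short sketch of how NumPy chooses it may help. The array names below are made up for illustration. The exact integer width (int32 or int64) varies by platform, so the example does not print it.

```python
import numpy as np

ints = np.array([1, 2, 3])                        # integer dtype inferred from the values
floats = np.array([1, 2.5, 3])                    # one float promotes the whole array to float64
explicit = np.array([1, 2, 3], dtype=np.float32)  # dtype requested explicitly
converted = ints.astype(np.float64)               # astype returns a new array with the new dtype

print(floats.dtype)    # float64
print(explicit.dtype)  # float32
print(converted)       # [1. 2. 3.]
```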
NumPy provides functions to create arrays with specific initial values:
np.zeros(): Array filled with zeros.
np.ones(): Array filled with ones.
np.full(): Array filled with a specified value.
np.arange(): Array with values in a range.
np.linspace(): Evenly spaced values over an interval.
zeros_array = np.zeros((2, 3))
print(zeros_array)
# Output:
# [[0. 0. 0.]
# [0. 0. 0.]]
ones_array = np.ones((3, 2))
print(ones_array)
# Output:
# [[1. 1.]
# [1. 1.]
# [1. 1.]]
full_array = np.full((2, 2), 7)
print(full_array)
# Output:
# [[7 7]
# [7 7]]
range_array = np.arange(0, 10, 2) # Start, Stop, Step
print(range_array)
# Output: [0 2 4 6 8]
linspace_array = np.linspace(0, 1, 5) # Start, Stop, Number of samples
print(linspace_array)
# Output: [0. 0.25 0.5 0.75 1. ]
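The difference between np.arange() and np.linspace() is easy to miss, so here is a small comparison sketch. It assumes only the two functions shown above plus the endpoint keyword of np.linspace(); the variable names are illustrative.

```python
import numpy as np

# arange takes a step and excludes the stop value.
by_step = np.arange(0, 1, 0.25)

# linspace takes a sample count and includes the stop value by default.
by_count = np.linspace(0, 1, 5)

# endpoint=False makes linspace exclude the stop value as well.
half_open = np.linspace(0, 1, 4, endpoint=False)

print(len(by_step))   # 4
print(len(by_count))  # 5
```

As a rule of thumb, np.linspace() is the safer choice when you need an exact number of points, because the length of a floating-point np.arange() result can be sensitive to rounding.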
Accessing elements and sub-arrays is similar to Python lists but extended for multiple dimensions.
arr1d = np.array([10, 20, 30, 40, 50])
print(arr1d[0]) # Output: 10
print(arr1d[1:4]) # Output: [20 30 40] (slices are exclusive of the end index)
print(arr1d[-1]) # Output: 50 (last element)
Use comma-separated indices for each dimension.
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Access element at row 1, column 2 (0-indexed)
print(arr2d[1, 2]) # Output: 6
# Access the first row
print(arr2d[0, :]) # Output: [1 2 3]
# Access the second column
print(arr2d[:, 1]) # Output: [2 5 8]
# Get a sub-array (rows 0 to 1, columns 1 to 2)
print(arr2d[0:2, 1:3])
# Output:
# [[2 3]
# [5 6]]
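One behavior worth knowing about the slices above: a basic slice is a view onto the original array, not a copy. The sketch below illustrates this; the names view and independent are chosen for the example.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

view = arr2d[0:2, 1:3]                # basic slice: shares memory with arr2d
independent = arr2d[0:2, 1:3].copy()  # explicit copy: owns its data

view[0, 0] = 99         # also changes arr2d[0, 1]
independent[1, 1] = -1  # arr2d is unaffected

print(arr2d[0, 1])  # 99
print(arr2d[1, 2])  # 6
```

Call .copy() whenever you need to modify a sub-array without touching the source.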
You can select elements based on conditions.
arr = np.array([1, 2, 3, 4, 5, 6])
condition = arr > 3
print(arr[condition]) # Output: [4 5 6]
print(arr[arr % 2 == 0]) # Output: [2 4 6] (even numbers)
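Conditions can also be combined, and a mask can be used for assignment. The following is a minimal sketch of both; the variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Combine conditions with & (and) and | (or); the parentheses are required.
middle = arr[(arr > 2) & (arr < 6)]
edges = arr[(arr < 2) | (arr > 5)]

# A mask can also appear on the left-hand side to assign in place.
clipped = arr.copy()
clipped[clipped > 4] = 4

print(middle)   # [3 4 5]
print(edges)    # [1 6]
print(clipped)  # [1 2 3 4 4 4]
```

Python's and / or keywords do not work element-wise on arrays, which is why the bitwise operators are used here.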
NumPy applies arithmetic operators element-wise, without explicit Python loops, which makes mathematical computations both concise and efficient.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b) # Element-wise addition: [5 7 9]
print(a - b) # Element-wise subtraction: [-3 -3 -3]
print(a * b) # Element-wise multiplication: [ 4 10 18]
print(a / b) # Element-wise division: [0.25 0.4 0.5 ]
# Scalar operations
print(a + 5) # [6 7 8]
print(a * 2) # [2 4 6]
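Array and scalar operations compose into whole formulas. As an illustration, the sketch below converts temperatures from Celsius to Fahrenheit. The example data is invented for this purpose, and the list comprehension is included only for comparison.

```python
import numpy as np

celsius = np.array([0.0, 25.0, 100.0])

# One vectorized expression: each scalar operation is applied element by element.
fahrenheit = celsius * 9 / 5 + 32

# The equivalent pure-Python loop, shown only for comparison.
fahrenheit_loop = [c * 9 / 5 + 32 for c in celsius.tolist()]

print(fahrenheit)  # [ 32.  77. 212.]
```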
NumPy provides many universal functions that operate element-wise on arrays.
np.sqrt()
np.exp()
np.sin()
np.cos()
np.log()
arr = np.array([1, 4, 9])
print(np.sqrt(arr)) # [1. 2. 3.]
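The other functions listed above are called the same way. Here is a short sketch using np.sin(), np.cos(), np.exp(), and np.log(). Floating-point results are compared with np.allclose() rather than printed, because values such as sin(pi) come out as tiny nonzero numbers instead of exactly 0.

```python
import numpy as np

angles = np.array([0.0, np.pi / 2, np.pi])

sines = np.sin(angles)    # approximately [0, 1, 0]
cosines = np.cos(angles)  # approximately [1, 0, -1]

# exp and log are inverses, so this round trip recovers the input.
values = np.array([1.0, 2.0, 3.0])
round_trip = np.log(np.exp(values))

print(np.allclose(sines, [0, 1, 0]))    # True
print(np.allclose(round_trip, values))  # True
```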
These functions reduce an array to a single value, or to one value per row or column when you pass an axis argument.
np.sum()
np.mean()
np.median()
np.std()
np.min()
np.max()
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(np.sum(arr)) # Sum of all elements: 21
print(np.sum(arr, axis=0)) # Sum of each column (collapses axis 0): [5 7 9]
print(np.sum(arr, axis=1)) # Sum of each row (collapses axis 1): [ 6 15]
print(np.mean(arr, axis=1)) # Mean of each row: [2. 5.]
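The remaining aggregation functions follow the same axis convention. The sketch below reuses the same 2x3 array. The keepdims argument is an extra that the text above does not cover; it is included because it is often useful alongside axis.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(np.min(arr))          # 1
print(np.max(arr, axis=0))  # [4 5 6]  (largest value in each column)
print(np.max(arr, axis=1))  # [3 6]    (largest value in each row)
print(np.median(arr))       # 3.5
print(np.std(arr))          # population standard deviation of all six values

# keepdims=True keeps the reduced axis with length 1.
row_means = np.mean(arr, axis=1, keepdims=True)
print(row_means.shape)      # (2, 1)
```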
You can change the shape of an array without changing its data.
arr = np.arange(1, 7) # [1 2 3 4 5 6]
print(arr.reshape((2, 3)))
# Output:
# [[1 2 3]
# [4 5 6]]
# If you specify one dimension as -1, NumPy will infer it
print(arr.reshape((3, -1)))
# Output:
# [[1 2]
# [3 4]
# [5 6]]
Important: The new shape must be compatible with the original number of elements, meaning the product of the new dimensions must equal the array's size.
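To see what "compatible" means in practice, here is a small sketch of a reshape that fails. The flag reshape_failed is a name made up for the example, and flatten() is shown as one way to get back to 1D.

```python
import numpy as np

arr = np.arange(1, 7)  # 6 elements

try:
    arr.reshape((4, 2))  # 4 * 2 = 8, which does not match 6
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(reshape_failed)  # True

# flatten() goes the other way and returns a 1D copy.
flat = arr.reshape((2, 3)).flatten()
print(flat)  # [1 2 3 4 5 6]
```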
Broadcasting is a powerful mechanism that allows NumPy to perform arithmetic on arrays of different shapes, for example adding a scalar to an array or adding a 1D array to a 2D array.
row_vector = np.array([1, 2, 3])
matrix = np.array([[10], [20], [30]])
# matrix has shape (3, 1) and row_vector has shape (3,), so their shapes differ.
# NumPy treats row_vector as shape (1, 3), then stretches both operands
# to a common (3, 3) shape before adding them element-wise.
print(matrix + row_vector)
# Output:
# [[11 12 13]
# [21 22 23]
# [31 32 33]]
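A common practical use of broadcasting is subtracting per-column statistics from a 2D array. The sketch below uses invented sample data and also shows a pair of shapes that cannot be broadcast; broadcast_failed is an illustrative name.

```python
import numpy as np

data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])  # shape (3, 2)

col_means = data.mean(axis=0)   # shape (2,)
centered = data - col_means     # (3, 2) - (2,) broadcasts to (3, 2)

print(centered.mean(axis=0))    # [0. 0.]

# Shapes are compared from the trailing dimension: 2 vs 3 do not match
# and neither is 1, so this addition raises ValueError.
try:
    data + np.array([1.0, 2.0, 3.0])
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

print(broadcast_failed)  # True
```

Two dimensions are compatible when they are equal or when one of them is 1, which is why the (3, 1) and (3,) pair in the earlier example works while (3, 2) and (3,) does not.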
This covers the very basics of NumPy. To further enhance your data science skills: