Pandas Series: A Fundamental Data Structure
Welcome to the section on Pandas Series. A Series is a one-dimensional labeled array capable of holding any type of data (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively called the index.
What is a Pandas Series?
A Pandas Series is comparable to a single column of a table: it stores values like a one-dimensional NumPy array and looks them up by label like a dictionary. It's a fundamental data structure in the Pandas library, providing a powerful and flexible way to work with one-dimensional data. Each element in a Series has an associated index label, which allows for easy data retrieval and manipulation.
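As a rough illustration of that array-plus-dictionary behavior, the sketch below builds a small Series and reads it both ways. The city names and temperature values are made up for the example.

```python
import pandas as pd

# Hypothetical temperature readings keyed by city (illustrative data only)
temps = pd.Series([21.5, 18.0, 25.3], index=["Lisbon", "Oslo", "Cairo"])

print(temps.index.tolist())  # the labels
print(temps.to_numpy())      # the underlying values as a NumPy array
print(temps["Oslo"])         # dictionary-style lookup by label
```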
Creating a Series
You can create a Series from various data structures, including Python lists, NumPy arrays, and dictionaries.
From a Python List
When you create a Series from a list, Pandas automatically assigns a default integer index starting from 0.
import pandas as pd
import numpy as np
# Creating a Series from a list
data_list = [10, 20, 30, 40, 50]
s_list = pd.Series(data_list)
print(s_list)
From a NumPy Array
As with lists, a Series created from a NumPy array gets a default integer index starting from 0.
# Creating a Series from a NumPy array
data_numpy = np.array([1.1, 2.2, 3.3, 4.4, 5.5])
s_numpy = pd.Series(data_numpy)
print(s_numpy)
From a Dictionary
When creating a Series from a dictionary, the dictionary keys are used as the Series index.
# Creating a Series from a dictionary
data_dict = {'a': 100, 'b': 200, 'c': 300, 'd': 400}
s_dict = pd.Series(data_dict)
print(s_dict)
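A dictionary can also be combined with an explicit index argument. The sketch below, using a hypothetical prices dictionary, suggests how the index then selects and orders the keys, and how a label with no matching key ends up as NaN.

```python
import pandas as pd

# Hypothetical data for illustration
prices = {"a": 100, "b": 200, "c": 300}

# Request labels in a chosen order; "z" is not a key in the dictionary
s = pd.Series(prices, index=["c", "a", "z"])
print(s)
```

Because NaN is a floating-point value, the resulting Series is stored as float64 rather than as integers.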
Customizing the Index
You can provide your own index when creating a Series.
# Creating a Series with a custom index
data = [1, 2, 3, 4, 5]
index_labels = ['X', 'Y', 'Z', 'W', 'V']
s_custom_index = pd.Series(data, index=index_labels)
print(s_custom_index)
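One detail worth checking when supplying labels: for list data, the index must have the same length as the data. The following sketch (variable names are illustrative) shows the error raised on a mismatch, and that labels can be replaced after creation.

```python
import pandas as pd

values = [1, 2, 3]

try:
    pd.Series(values, index=["X", "Y"])  # two labels for three values
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("ValueError:", exc)

# Labels can also be replaced after creation, provided the lengths agree
s = pd.Series(values)
s.index = ["X", "Y", "Z"]
print(s)
```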
Accessing Series Elements
You can access elements in a Series using their index label or their integer position.
Using Index Labels
print(s_dict['b'])
print(s_custom_index['Z'])
Using Integer Positions (iloc)
The iloc accessor is used for integer-location based indexing.
print(s_list.iloc[2]) # Accessing the element at integer position 2 (the third element)
print(s_custom_index.iloc[0]) # Accessing the first element
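The difference between labels and positions matters most when the index labels are themselves integers. The sketch below, using made-up data, contrasts .loc (label-based) with .iloc (position-based), including how each treats slices.

```python
import pandas as pd

# Integer labels that do not match the positions 0, 1, 2
s = pd.Series(["low", "mid", "high"], index=[10, 20, 30])

by_label = s.loc[20]      # looks up the label 20
by_position = s.iloc[0]   # looks up the first element

# Label slices include both endpoints; position slices exclude the stop
label_slice = s.loc[10:20]
pos_slice = s.iloc[0:1]

print(by_label, by_position)
print(len(label_slice), len(pos_slice))
```

Using .loc and .iloc explicitly, rather than plain square brackets, avoids ambiguity about which kind of lookup is intended.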
Series Attributes and Methods
Pandas Series come with a rich set of attributes and methods for data analysis.
Common Attributes
.index: Returns the index of the Series.
.values: Returns the data as a NumPy array.
.dtype: Returns the data type of the Series.
.shape: Returns a tuple representing the dimensionality of the Series.
.name: Returns the name of the Series.
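A short sketch of these attributes on a small example Series follows. The name "scores" and the data are assumptions chosen for illustration.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"], name="scores")

print(s.index)   # the index labels
print(s.values)  # the data as a NumPy array
print(s.dtype)   # the data type of the elements
print(s.shape)   # a one-element tuple, since a Series has one dimension
print(s.name)    # the name given at creation
```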
Common Methods
.head(n): Returns the first n elements.
.tail(n): Returns the last n elements.
.describe(): Generates descriptive statistics (count, mean, std, min, max, etc.).
.mean(): Computes the mean.
.sum(): Computes the sum.
.value_counts(): Returns a Series containing counts of unique values.
Let's look at an example using .describe():
# Using .describe() on a Series of numbers
print(s_list.describe())
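The other methods listed above can be tried the same way. This is a minimal sketch with made-up data; the variable names are illustrative.

```python
import pandas as pd

nums = pd.Series([10, 20, 30, 40, 50])
print(nums.head(2).tolist())  # first two elements
print(nums.tail(2).tolist())  # last two elements
print(nums.mean())            # arithmetic mean
print(nums.sum())             # total

# value_counts() is most useful on data with repeated values
colors = pd.Series(["red", "blue", "red", "green", "red", "blue"])
counts = colors.value_counts()
print(counts)
```

By default, value_counts() sorts its result from the most frequent value to the least frequent.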
Operations on Series
You can perform various arithmetic operations on Series, and Pandas will align the data based on the index.
Element-wise Operations
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])
# Addition
print("s1 + s2:")
print(s1 + s2)
# Multiplication
print("\ns1 * 2:")
print(s1 * 2)
When adding s1 and s2, labels that appear in only one of the two Series ('a' and 'd') result in NaN (Not a Number), because there is no matching value to add.
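If NaN is not the desired outcome for unmatched labels, one option is the .add() method with its fill_value parameter, which substitutes a value for whichever side is missing. The sketch below redefines the same two Series so that it runs on its own.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

plain = s1 + s2                     # "a" and "d" have no partner and become NaN
filled = s1.add(s2, fill_value=0)   # a missing side is treated as 0

print(plain)
print(filled)
```

Whether 0 is a sensible fill value depends on the data; it is used here only as an example.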
Filtering Series
You can filter Series based on conditions applied to its values.
# Filtering elements greater than 25
print("Elements greater than 25 in s_list:")
print(s_list[s_list > 25])
# Filtering using index labels
print("\nElements with index 'a' or 'c' in s_dict:")
print(s_dict.loc[['a', 'c']])
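Conditions can also be combined. The sketch below uses the element-wise operators & (and), | (or), and ~ (not), plus .isin() for membership tests; each comparison needs its own parentheses because of operator precedence. The data is illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

between = s[(s > 15) & (s < 45)]      # both conditions must hold
either = s[(s < 15) | (s > 45)]       # at least one condition holds
members = s[s.isin([20, 50])]         # values found in the given list
non_members = s[~s.isin([20, 50])]    # everything else

print(between.tolist())
print(either.tolist())
print(members.tolist())
print(non_members.tolist())
```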
Handling Missing Data
Pandas uses NaN to represent missing numeric data. You can detect missing values with isnull() and remove them with dropna().
# Series with missing values
data_missing = [10, 20, np.nan, 40, 50]
s_missing = pd.Series(data_missing)
print("Original Series with NaN:")
print(s_missing)
print("\nChecking for NaN:")
print(s_missing.isnull())
print("\nDropping NaN values:")
print(s_missing.dropna())
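Dropping is not the only option. As a sketch of an alternative, .fillna() replaces missing values instead of removing them. The choice of replacement (0 or the mean here) is an assumption for illustration and depends on the data.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, np.nan, 40, 50])

n_missing = s.isna().sum()          # count of missing values
filled_zero = s.fillna(0)           # replace NaN with a constant
filled_mean = s.fillna(s.mean())    # mean() skips NaN by default

print(n_missing)
print(filled_zero.tolist())
print(filled_mean.tolist())
```

Unlike dropna(), fillna() keeps the Series at its original length, which can matter when it has to stay aligned with other data.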
Summary
The Pandas Series is a fundamental building block for data manipulation in Python. Understanding how to create, access, operate on, and filter Series is crucial for anyone working with data in the Python ecosystem.