This tutorial covers the loc and iloc indexers and the groupby method in pandas, a powerful data manipulation library for Python. We will go through examples and explanations to understand how each one works.
import pandas as pd
# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Score': [85, 90, 95, 80, 70]
}
df = pd.DataFrame(data)
print(df)
Output:
      Name  Age         City  Score
0    Alice   24     New York     85
1      Bob   27  Los Angeles     90
2  Charlie   22      Chicago     95
3    David   32      Houston     80
4      Eva   29      Phoenix     70
loc
The loc indexer is used to access a group of rows and columns by labels or a boolean array.
# Accessing a single row by label
print(df.loc[2])
Output:
Name     Charlie
Age           22
City     Chicago
Score         95
Name: 2, dtype: object
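The description of loc above also mentions boolean arrays, which the label-based examples do not show. A minimal sketch of that usage follows, together with a label slice. It rebuilds the same sample data so that it runs on its own; the variable names high_scores and middle_rows are chosen here for illustration only.

```python
import pandas as pd

# Same sample data as in the tutorial
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Score': [85, 90, 95, 80, 70],
})

# Boolean array: one True/False per row; loc keeps the rows marked True
high_scores = df.loc[df['Score'] >= 90]
print(high_scores)

# Label slice: with loc, both endpoints are included (labels 1, 2 and 3)
middle_rows = df.loc[1:3]
print(middle_rows)
```

Note that a label slice with loc includes its end label, unlike ordinary Python slicing.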
# Accessing multiple rows by labels
print(df.loc[[0, 2, 4]])
Output:
      Name  Age      City  Score
0    Alice   24  New York     85
2  Charlie   22   Chicago     95
4      Eva   29   Phoenix     70
Here, we select several rows at once by passing a list of labels to loc.
iloc
The iloc indexer is used to access a group of rows and columns by integer positions.
# Accessing a single row by integer position
print(df.iloc[3])
Output:
Name       David
Age           32
City     Houston
Score         80
Name: 3, dtype: object
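With the default index, labels and positions happen to be the same numbers, so loc and iloc can look interchangeable. The following sketch makes the difference visible by using the names as the index. The by_name frame and the other variable names are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Score': [85, 90, 95, 80, 70],
})

# Use the names as row labels instead of 0..4
by_name = df.set_index('Name')

# loc looks the row up by its label
by_label = by_name.loc['David']

# iloc looks the row up by its position (the fourth row has position 3)
by_position = by_name.iloc[3]

print(by_label)
print(by_position)

# Position slices exclude the end position, as in plain Python
first_two = by_name.iloc[0:2]
print(first_two)
```

Both lookups return the same row; only the way the row is addressed differs.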
# Accessing a specific cell
print(df.iloc[1, 2])
Output:
Los Angeles
Here, iloc[1, 2] returns the value at row position 1 and column position 2.
groupby
The groupby method is used to split the data into groups based on some criteria.
# Adding a 'Category' column
df['Category'] = ['A', 'B', 'A', 'B', 'A']
print(df)
Output:
      Name  Age         City  Score  Category
0    Alice   24     New York     85         A
1      Bob   27  Los Angeles     90         B
2  Charlie   22      Chicago     95         A
3    David   32      Houston     80         B
4      Eva   29      Phoenix     70         A
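To see the "split" step on its own before any aggregation, one option is to iterate over the grouped object. This is a small sketch using a reduced copy of the sample data; the group_names dictionary is introduced here only to collect the result.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Score': [85, 90, 95, 80, 70],
    'Category': ['A', 'B', 'A', 'B', 'A'],
})

# Iterating over a GroupBy object yields (group key, sub-DataFrame) pairs
group_names = {}
for category, group in df.groupby('Category'):
    group_names[category] = group['Name'].tolist()
    print(category, group_names[category])
```

Each sub-DataFrame holds only the rows that share one Category value, and aggregations such as mean or sum are then applied per group.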
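# Grouping by 'Category' and calculating the mean of the numeric columns
# (numeric_only=True skips the text columns 'Name' and 'City', which cannot be averaged)
grouped_df = df.groupby('Category').mean(numeric_only=True)
print(grouped_df)
Output:
           Age      Score
Category
A         25.0  83.333333
B         29.5  85.000000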
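The frame also contains text columns (Name and City) that have no meaningful mean. One way to keep the aggregation explicit is to select the numeric columns before calling mean. The sketch below compares that approach with the numeric_only=True argument; the variable names are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Score': [85, 90, 95, 80, 70],
    'Category': ['A', 'B', 'A', 'B', 'A'],
})

# Option 1: pick the numeric columns explicitly, then aggregate
means_selected = df.groupby('Category')[['Age', 'Score']].mean()

# Option 2: let pandas skip the non-numeric columns
means_numeric_only = df.groupby('Category').mean(numeric_only=True)

print(means_selected)
print(means_numeric_only)
```

Selecting the columns explicitly tends to be the safer choice when new columns may be added to the frame later.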
# Grouping by 'Category' and calculating the sum of 'Score'
grouped_sum_df = df.groupby('Category')['Score'].sum()
print(grouped_sum_df)
Output:
Category
A    250
B    170
Name: Score, dtype: int64
Here, we group the rows with groupby and then select the 'Score' column before applying sum.
Use loc to access the 'Name' and 'Score' columns for rows where 'Age' is greater than 25.
# Your Code
result = df.loc[df['Age'] > 25, ['Name', 'Score']]
print(result)
Output:
    Name  Score
1    Bob     90
3  David     80
4    Eva     70
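The same pattern extends to more than one condition. As a sketch, the filter below adds a second, hypothetical requirement (a Score of at least 80) that is not part of the exercise itself. Each condition needs its own parentheses, and the conditions are combined with & rather than the Python keyword and.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Score': [85, 90, 95, 80, 70],
})

# Element-wise AND of two boolean Series; the parentheses are required
mask = (df['Age'] > 25) & (df['Score'] >= 80)

# Rows where the mask is True, restricted to two columns
result = df.loc[mask, ['Name', 'Score']]
print(result)
```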
Use iloc to access the first three rows and the first two columns.
# Your Code
result = df.iloc[:3, :2]
print(result)
Output:
      Name  Age
0    Alice   24
1      Bob   27
2  Charlie   22
In this tutorial, we covered the basics of using loc, iloc, and groupby in pandas. These tools are essential for data manipulation and analysis in Python.