Pandas Tutorial: loc, iloc, and groupby

Introduction

This tutorial covers three core pandas tools: the loc and iloc indexers, which select rows and columns, and the groupby method, which aggregates data by group. pandas is a widely used data manipulation library for Python. Each section pairs a short example with its printed output.

Setting Up

import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Score': [85, 90, 95, 80, 70]
}
df = pd.DataFrame(data)
print(df)

Output:

      Name  Age         City  Score
0    Alice   24     New York     85
1      Bob   27  Los Angeles     90
2  Charlie   22      Chicago     95
3    David   32      Houston     80
4      Eva   29      Phoenix     70

Using loc

The loc indexer selects rows and columns by label (index labels and column names) or by a boolean array.

# Accessing a single row by label
print(df.loc[2])

Output:

Name     Charlie
Age            22
City      Chicago
Score          95
Name: 2, dtype: object

Question: How do you access multiple rows using loc?
# Accessing multiple rows by labels
print(df.loc[[0, 2, 4]])

Output:

      Name  Age      City  Score
0    Alice   24  New York     85
2  Charlie   22   Chicago     95
4      Eva   29   Phoenix     70

Answer: Pass a list of row labels to loc. The result is a DataFrame containing those rows in the order listed.
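loc also accepts label slices and boolean masks. The sketch below rebuilds the sample data and, as an illustrative choice, moves 'Name' into the index with set_index so that the labels visibly differ from integer positions. Note that a label slice such as 'Bob':'David' includes both endpoints, unlike a Python list slice. The variable names (people, middle, high_scorers) are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Score': [85, 90, 95, 80, 70],
})

# Illustrative: use the names as row labels so labels differ from positions
people = df.set_index('Name')

# Single cell: row label first, then column label
bob_age = people.loc['Bob', 'Age']

# Label slice: both endpoints ('Bob' and 'David') are included
middle = people.loc['Bob':'David', ['Age', 'Score']]

# Boolean mask: keep only rows where the condition is True
high_scorers = people.loc[people['Score'] >= 85, 'City']

print(bob_age)
print(middle)
print(high_scorers)
```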

Using iloc

The iloc indexer selects rows and columns by integer position (counting from 0), regardless of their labels.

# Accessing a single row by integer position
print(df.iloc[3])

Output:

Name      David
Age          32
City    Houston
Score        80
Name: 3, dtype: object

Question: How do you access a specific cell using iloc?
# Accessing a specific cell
print(df.iloc[1, 2])

Output:

Los Angeles
Answer: Pass the row position and the column position to iloc, separated by a comma. Here, row position 1 is Bob and column position 2 is 'City'.
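Two details of iloc are easy to trip over: slices exclude the stop position (as with Python lists), and positions ignore labels entirely. The self-contained sketch below shows both on the same sample data. The filtered frame older is a hypothetical helper, chosen because its labels (1, 3, 4) no longer match its positions (0, 1, 2).

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Score': [85, 90, 95, 80, 70],
})

# Position slices exclude the stop position: rows 0 and 1 only
first_two = df.iloc[0:2]

# Negative positions count from the end: last row, first column
last_row_name = df.iloc[-1, 0]

# Column positions slice the same way: columns 1 and 2 ('Age', 'City')
age_city = df.iloc[:, 1:3]

# After filtering, the original labels 1, 3, 4 are kept
older = df[df['Age'] > 25]
by_label = older.loc[1, 'Name']   # row labeled 1 -> Bob
by_position = older.iloc[1, 0]    # second remaining row -> David

print(first_two)
print(last_row_name)
print(age_city)
print(by_label, by_position)
```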

Using groupby

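The groupby method splits the rows into groups that share the same key (here, the value in a column). An aggregation such as mean or sum is then applied to each group, and the per-group results are combined into a single table.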
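This is often called the split-apply-combine pattern. As a rough illustration (not how pandas implements it internally), the sketch below performs the three steps by hand and compares the result with a single groupby call. It uses a trimmed copy of the sample data that already includes the 'Category' column created in Step 1; manual_totals is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Score': [85, 90, 95, 80, 70],
    'Category': ['A', 'B', 'A', 'B', 'A'],
})

# Split: iterating over a GroupBy yields (key, sub-DataFrame) pairs
manual_totals = {}
for key, group in df.groupby('Category'):
    # Apply: reduce each group's 'Score' column to one number
    manual_totals[key] = group['Score'].sum()

# Combine: gather the per-group results into one Series indexed by key
manual = pd.Series(manual_totals, name='Score')
manual.index.name = 'Category'

# The same computation as a single groupby call
builtin = df.groupby('Category')['Score'].sum()

print(manual)
print(manual.to_dict() == builtin.to_dict())
```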

Step 1: Creating a 'Category' Column

# Adding a 'Category' column
df['Category'] = ['A', 'B', 'A', 'B', 'A']
print(df)

Output:

      Name  Age         City  Score Category
0    Alice   24     New York     85        A
1      Bob   27  Los Angeles     90        B
2  Charlie   22      Chicago     95        A
3    David   32      Houston     80        B
4      Eva   29      Phoenix     70        A

Step 2: Grouping by 'Category' and Calculating Mean

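# Grouping by 'Category' and averaging the numeric columns
# (numeric_only=True skips the text columns 'Name' and 'City';
# pandas 2.0 and later raise a TypeError without it)
grouped_df = df.groupby('Category').mean(numeric_only=True)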
print(grouped_df)

Output:

           Age      Score
Category                 
A         25.0  83.333333
B         29.5  85.000000

Question: How do you group by 'Category' and calculate the sum of 'Score'?
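Because 'Name' and 'City' hold text, averaging every column of the grouped frame can fail in recent pandas versions. Selecting the numeric columns before aggregating avoids that, and agg computes several statistics in one call. A minimal sketch on the sample data (with the Step 1 'Category' column); means and stats are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Score': [85, 90, 95, 80, 70],
    'Category': ['A', 'B', 'A', 'B', 'A'],
})

# Select the numeric columns first so text columns are never aggregated
means = df.groupby('Category')[['Age', 'Score']].mean()

# Several statistics per column; the result has two levels of column labels
stats = df.groupby('Category')[['Age', 'Score']].agg(['mean', 'max'])

print(means.round(1))
print(stats)
```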
# Grouping by 'Category' and calculating the sum of 'Score'
grouped_sum_df = df.groupby('Category')['Score'].sum()
print(grouped_sum_df)

Output:

Category
A    250
B    170
Name: Score, dtype: int64

Answer: Call groupby('Category'), select the 'Score' column with ['Score'], and then call .sum(). The result is a Series indexed by category.
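When you need several differently named aggregates, or want the group key back as an ordinary column, agg accepts keyword arguments of the form new_name=(column, function) (named aggregation), and as_index=False keeps 'Category' as a column rather than the index. A sketch on the sample data; the output column names total_score, mean_age, and members are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 29],
    'Score': [85, 90, 95, 80, 70],
    'Category': ['A', 'B', 'A', 'B', 'A'],
})

# Named aggregation: each keyword becomes an output column
summary = df.groupby('Category', as_index=False).agg(
    total_score=('Score', 'sum'),   # sum of scores per category
    mean_age=('Age', 'mean'),       # average age per category
    members=('Name', 'count'),      # number of rows per category
)

print(summary)
```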

Additional Exercises

  1. Use loc to select the 'Name' and 'Score' columns for rows where 'Age' is greater than 25.

    # Solution
    result = df.loc[df['Age'] > 25, ['Name', 'Score']]
    print(result)

    Output:

        Name  Score
    1    Bob     90
    3  David     80
    4    Eva     70

  2. Use iloc to select the first three rows and the first two columns.

    # Solution
    result = df.iloc[:3, :2]
    print(result)

    Output:

          Name  Age
    0    Alice   24
    1      Bob   27
    2  Charlie   22

Conclusion

In this tutorial, we covered loc for label-based selection, iloc for position-based selection, and groupby for splitting data into groups and aggregating each one. These three tools underpin much of everyday data selection and summarization in pandas.