Matplotlib Tutorial

This tutorial covers the basics of plotting with Matplotlib in Python. Each of the nine sections below introduces one type of plot with a short example, followed by a review question and answer.

1. Scatter Plots

Scatter plots are used to observe the relationship between two numeric variables. Each point represents one observation, positioned by its x and y values.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [5, 7, 4, 6, 5]

plt.scatter(x, y, color='magenta', marker='o', label='Data points')
plt.title('Scatter Plot')
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='upper right')
plt.show()
        
[Figure: Scatter Plot]

Question: What does each point in a scatter plot represent?

Answer: Each point in a scatter plot represents one observation, positioned by its x and y values. The overall pattern of the points shows how the two variables are related.

2. Line Plots

Line plots are used to represent data points connected by straight lines. They are useful for showing trends over time.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

plt.plot(x, y, marker='o', linestyle='-', color='green', label='Line')
plt.title('Line Plot')
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='upper left')
plt.show()
        
[Figure: Line Plot]

Question: When are line plots particularly useful?

Answer: Line plots are particularly useful for showing trends over time or sequential data.

3. Bar Plots

Bar plots are used to represent data with rectangular bars. The length of each bar is proportional to the value it represents.

import matplotlib.pyplot as plt

x = ['A', 'B', 'C', 'D']
y = [3, 7, 5, 9]

plt.bar(x, y, color='purple', label='Values')
plt.title('Bar Plot')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.legend(loc='upper left')
plt.show()
        
[Figure: Bar Plot]

Question: What type of data is best visualized using bar plots?

Answer: Bar plots are best for visualizing categorical data where each category is represented by a bar.

4. Stacked Bar Plots

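Stacked bar plots place the segments for each category on top of one another, so the full height of a bar is the category total and each segment shows one component's contribution. In Matplotlib, the bottom argument of plt.bar sets the height at which a segment starts.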

import matplotlib.pyplot as plt

categories = ['A', 'B', 'C']
values1 = [4, 7, 1]
values2 = [2, 5, 6]

plt.bar(categories, values1, color='blue', label='Set 1')
plt.bar(categories, values2, bottom=values1, color='orange', label='Set 2')
plt.title('Stacked Bar Plot')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.legend(loc='upper left')
plt.show()
        
[Figure: Stacked Bar Plot]

Question: What is the advantage of using stacked bar plots?

Answer: Stacked bar plots allow us to see the total value and the individual contribution of each component.
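
The bottom argument used in the example above extends to more than two series: each new layer starts at the running total of the layers beneath it. The following is a minimal pure-Python sketch of that bookkeeping. It assumes a hypothetical third series, values3, that is not part of the original example, and stack_bottoms is an illustrative helper name, not a Matplotlib function.

```python
def stack_bottoms(series):
    """Return the starting heights for each layer and the final bar totals."""
    bottoms = []
    running = [0] * len(series[0])   # every stack starts at zero
    for values in series:
        bottoms.append(list(running))
        # Add this layer so the next one sits on top of it.
        running = [r + v for r, v in zip(running, values)]
    return bottoms, running          # running now holds the bar totals


values1 = [4, 7, 1]   # from the example above
values2 = [2, 5, 6]   # from the example above
values3 = [3, 1, 2]   # hypothetical third series, for illustration only

bottoms, totals = stack_bottoms([values1, values2, values3])
print(bottoms)  # [[0, 0, 0], [4, 7, 1], [6, 12, 7]]
print(totals)   # [9, 13, 9]
```

Each row of bottoms could then be passed as bottom= to the matching plt.bar call, in the same way the two-series example passes bottom=values1.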

5. Bar Plots Side by Side

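Side-by-side (grouped) bar plots place the bars for each data set next to one another within each category, which makes the sets easy to compare directly. Each set of bars is shifted left or right of the category position by a fraction of the bar width.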

import matplotlib.pyplot as plt
import numpy as np

categories = ['A', 'B', 'C']
values1 = [5, 7, 2]
values2 = [6, 8, 3]
width = 0.4

x = np.arange(len(categories))

plt.bar(x - width/2, values1, width, label='Set 1')
plt.bar(x + width/2, values2, width, label='Set 2')
plt.xticks(x, categories)
plt.title('Bar Plots Side by Side')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.legend(loc='upper left')
plt.show()
        
[Figure: Bar Plots Side by Side]

Question: When should you use side-by-side bar plots?

Answer: Side-by-side bar plots are useful when comparing multiple sets of data across the same categories.
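
The x - width/2 and x + width/2 shifts in the example above are the two-series case of a general rule: spread the bars evenly so that the group is centred on the category position. The sketch below shows one way to compute those shifts for any number of series. The helper name group_offsets and the three-series width of 0.25 are assumptions made for illustration.

```python
def group_offsets(n_series, width):
    """Offsets from the category position that centre n_series bars of the given width."""
    return [(j - (n_series - 1) / 2) * width for j in range(n_series)]


# Two series with width 0.4 reproduce the -width/2 and +width/2 shifts used above.
two = group_offsets(2, 0.4)

# Three series (hypothetical): the middle bar sits exactly on the category position.
three = group_offsets(3, 0.25)

print(two)
print(three)
```

With x = np.arange(len(categories)) as in the example, series j would be drawn at x + offsets[j]. Keeping n_series * width below 1 leaves a gap between neighbouring groups.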

6. Pie Charts

Pie charts represent data as slices of a pie, showing the relative proportions of different categories.

import matplotlib.pyplot as plt

labels = ['A', 'B', 'C', 'D']
sizes = [15, 30, 45, 10]

plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=140)
plt.title('Pie Chart')
plt.axis('equal')
plt.show()
        
[Figure: Pie Chart]

Question: What information do pie charts convey effectively?

Answer: Pie charts effectively convey the relative proportions of different categories within a whole.
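
The percentages that autopct prints are each slice's share of the total, so the sizes do not need to sum to 100. The sketch below reproduces that arithmetic in plain Python. The raw_counts list is hypothetical data chosen to have the same proportions as the example, and slice_percentages is an illustrative helper, not part of Matplotlib.

```python
def slice_percentages(sizes):
    """Return each value as a percentage of the total."""
    total = sum(sizes)
    return [100 * s / total for s in sizes]


sizes = [15, 30, 45, 10]       # from the example above (already sums to 100)
raw_counts = [3, 6, 9, 2]      # hypothetical counts with the same proportions

# Format with the same pattern passed to autopct in the example.
formatted = ['%1.1f%%' % p for p in slice_percentages(raw_counts)]

print(slice_percentages(sizes))  # [15.0, 30.0, 45.0, 10.0]
print(formatted)                 # ['15.0%', '30.0%', '45.0%', '10.0%']
```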

7. Histograms

Histograms are used to represent the distribution of a dataset. They show the frequency of data points within certain ranges.

import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 7]

plt.hist(data, bins=6, color='orange', edgecolor='black')
plt.title('Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
        
[Figure: Histogram]

Question: What do histograms help to visualize?

Answer: Histograms help to visualize the distribution of a dataset and the frequency of data points within specified ranges.
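
To make the "ranges" concrete, the sketch below counts the example data into six equal-width bins by hand. It is a simplified pure-Python imitation of the binning applied when bins is an integer, not Matplotlib's own code, and it assumes the data contains at least two distinct values. bin_counts is an illustrative helper name.

```python
def bin_counts(data, bins):
    """Split [min, max] into equal-width bins and count the values in each."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for value in data:
        # Bins are half-open [left, right), except the last, which also includes max.
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    return edges, counts


data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 7]  # from the example above
edges, counts = bin_counts(data, bins=6)

print(edges)   # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(counts)  # [1, 2, 3, 4, 2, 2]
```

The last bin holds both 6 and 7 because it is closed on the right, which is why the final bar in the histogram has a height of 2 rather than 1.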

8. Box Plots

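Box plots (or box-and-whisker plots) summarize the distribution of a dataset using five statistics: minimum, first quartile, median, third quartile, and maximum. The box spans the first to third quartile, with a line at the median. By default, Matplotlib extends each whisker to the most extreme data point within 1.5 times the interquartile range of the box and draws anything beyond that as an individual outlier point, so the whisker ends match the minimum and maximum only when the data has no outliers.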

import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 7]

plt.boxplot(data)
plt.title('Box Plot')
plt.ylabel('Value')
plt.show()
        
[Figure: Box Plot]

Question: What are the five summary statistics shown in a box plot?

Answer: The five summary statistics are minimum, first quartile, median, third quartile, and maximum.
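
The numbers behind a box plot can be computed directly. The sketch below uses the standard library's statistics.quantiles with method='inclusive', which interpolates linearly between data points, and applies the 1.5 x IQR whisker rule under the assumption that the plot uses the default whis=1.5. The extra value 15 is hypothetical and is added only to show how an outlier changes the picture. box_stats is an illustrative helper name.

```python
from statistics import quantiles


def box_stats(data, whis=1.5):
    """Quartiles, whisker ends, and outliers under the whis * IQR rule."""
    q1, median, q3 = quantiles(data, n=4, method='inclusive')
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        'q1': q1, 'median': median, 'q3': q3,
        'whisker_low': min(inside), 'whisker_high': max(inside),
        'outliers': [v for v in data if v < low_fence or v > high_fence],
    }


data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 7]  # from the example above
stats = box_stats(data)
print(stats)
# q1=3.0, median=4.0, q3=4.75; whiskers at 1 and 7; no outliers

with_outlier = box_stats(data + [15])  # 15 is a hypothetical extra value
print(with_outlier)
# q1=3.0, median=4.0, q3=5.0; whiskers at 1 and 7; outliers=[15]
```

For the tutorial's data the whiskers coincide with the minimum and maximum. Once 15 is added, the upper whisker stays at 7 and 15 would appear as a separate point.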

9. Pair Plots

Pair plots visualize the pairwise relationships between multiple variables in a dataset. Each off-diagonal panel is a scatter plot of two variables, and each panel on the diagonal shows the distribution of a single variable. This example uses the pairplot function from seaborn, a library built on Matplotlib, with a pandas DataFrame.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

data = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B'],
    'Feature1': [1, 4, 3, 5],
    'Feature2': [2, 3, 4, 1],
    'Feature3': [5, 2, 6, 3]
})

# Convert 'Category' to a categorical variable
data['Category'] = data['Category'].astype('category')

# Create a pair plot
sns.pairplot(data, hue='Category', palette='viridis')
plt.suptitle('Pair Plot', y=1.02)
plt.show()
        
[Figure: Pair Plot]

Question: What type of data is best visualized using pair plots?

Answer: Pair plots are best for visualizing relationships in multi-dimensional numerical data. They help identify correlations, patterns, and distributions among pairs of variables.
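
The layout of a pair plot follows directly from the list of numeric columns. The sketch below enumerates the panels for the three feature columns in the example, without drawing anything. It assumes the hue column ('Category') is excluded from the grid, and that each off-diagonal panel puts the column variable on the x-axis and the row variable on the y-axis.

```python
from itertools import combinations

features = ['Feature1', 'Feature2', 'Feature3']  # numeric columns from the example

# One panel per (row, column) position: row variable on y, column variable on x.
grid = [[(row, col) for col in features] for row in features]

# Diagonal panels pair a variable with itself and show its distribution.
diagonal = [grid[i][i] for i in range(len(features))]

# Each unordered pair appears twice off the diagonal, once with the axes swapped.
unique_pairs = list(combinations(features, 2))

print(len(features) ** 2)  # 9 panels in total
print(diagonal)
print(unique_pairs)
```

Three variables give a 3 x 3 grid: three distribution panels on the diagonal and six scatter panels covering the three distinct pairs.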