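This tutorial covers the basics of plotting with Matplotlib in Python. We will cover the following types of plots: scatter plots, line plots, bar plots (including stacked and side-by-side variants), pie charts, histograms, box plots, and pair plots (drawn with Seaborn, a plotting library built on top of Matplotlib).
Scatter plots show the relationship between two variables. Each point represents one observation, positioned by its x and y values.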
```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [5, 7, 4, 6, 5]

# Draw one magenta circle for each (x, y) pair
plt.scatter(x, y, color='magenta', marker='o', label='Data points')
plt.title('Scatter Plot')
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='upper right')
plt.show()
```
Question: What does each point in a scatter plot represent?
Answer: Each point in a scatter plot represents an observation, showing the relationship between the x and y variables.
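A trend line can make the relationship in a scatter plot easier to read. The sketch below fits a straight line by least squares with NumPy's `np.polyfit` and draws it over the same points; assuming a linear trend is only an illustrative choice, not something the data dictates.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 4, 6, 5])

# Fit y = slope * x + intercept by least squares (degree-1 polynomial)
slope, intercept = np.polyfit(x, y, 1)

fig, ax = plt.subplots()
ax.scatter(x, y, color='magenta', marker='o', label='Data points')
# Evaluate the fitted line at the same x positions and draw it dashed
ax.plot(x, slope * x + intercept, color='black', linestyle='--',
        label=f'Fit: y = {slope:.2f}x + {intercept:.2f}')
ax.set_title('Scatter Plot with Linear Trend')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.legend(loc='upper right')
# Call plt.show() to display the figure
```

For this data the fitted slope is -0.1, so the line is nearly flat: the points show at most a weak linear relationship between x and y.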
Line plots connect data points with straight line segments. They are useful for showing trends over time or across any other ordered sequence.
```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# Mark each point with a circle and join consecutive points with a solid line
plt.plot(x, y, marker='o', linestyle='-', color='green', label='Line')
plt.title('Line Plot')
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc='upper left')
plt.show()
```
Question: When are line plots particularly useful?
Answer: Line plots are particularly useful for showing trends over time or sequential data.
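Since line plots are often used for time series, here is a minimal sketch with dates on the x-axis. The monthly dates and values are hypothetical; the point is that Matplotlib accepts `datetime.date` objects directly and places them on a date axis.

```python
import datetime as dt
import matplotlib.pyplot as plt

# Hypothetical monthly readings, used only to illustrate a time axis
dates = [dt.date(2024, month, 1) for month in range(1, 7)]
values = [2, 3, 5, 7, 11, 13]

fig, ax = plt.subplots()
# Matplotlib converts the date objects to positions on a date axis
ax.plot(dates, values, marker='o', linestyle='-', color='green', label='Monthly value')
ax.set_title('Line Plot Over Time')
ax.set_xlabel('Month')
ax.set_ylabel('Value')
ax.legend(loc='upper left')
fig.autofmt_xdate()  # rotate the date tick labels so they do not overlap
# Call plt.show() to display the figure
```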
Bar plots are used to represent data with rectangular bars. The length of each bar is proportional to the value it represents.
```python
import matplotlib.pyplot as plt

x = ['A', 'B', 'C', 'D']
y = [3, 7, 5, 9]

# One bar per category; each bar's height equals its value
plt.bar(x, y, color='purple', label='Values')
plt.title('Bar Plot')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.legend(loc='upper left')
plt.show()
```
Question: What type of data is best visualized using bar plots?
Answer: Bar plots are best for visualizing categorical data where each category is represented by a bar.
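Because each bar's length encodes its value, it can help to print the value on the bar itself. A minimal sketch using `Axes.bar_label` (available in Matplotlib 3.4 and later) on the same data:

```python
import matplotlib.pyplot as plt

x = ['A', 'B', 'C', 'D']
y = [3, 7, 5, 9]

fig, ax = plt.subplots()
bars = ax.bar(x, y, color='purple', label='Values')
# Write each bar's value just above its top edge (requires Matplotlib >= 3.4)
labels = ax.bar_label(bars, padding=3)
ax.set_title('Bar Plot with Value Labels')
ax.set_xlabel('Categories')
ax.set_ylabel('Values')
ax.legend(loc='upper left')
# Call plt.show() to display the figure
```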
Stacked bar plots show cumulative values: each bar represents a total, divided into segments that show how much each component contributes to it.
```python
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C']
values1 = [4, 7, 1]
values2 = [2, 5, 6]

plt.bar(categories, values1, color='blue', label='Set 1')
# Start each 'Set 2' bar on top of the matching 'Set 1' bar
plt.bar(categories, values2, bottom=values1, color='orange', label='Set 2')
plt.title('Stacked Bar Plot')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.legend(loc='upper left')
plt.show()
```
Question: What is the advantage of using stacked bar plots?
Answer: Stacked bar plots allow us to see the total value and the individual contribution of each component.
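With more than two series, each new segment must start where all the previous ones end. A minimal sketch that keeps a running `bottom` array; the first two series are from the example above, and 'Set 3' is made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

categories = ['A', 'B', 'C']
# 'Set 1' and 'Set 2' come from the example above; 'Set 3' is hypothetical
series = {
    'Set 1': [4, 7, 1],
    'Set 2': [2, 5, 6],
    'Set 3': [3, 1, 2],
}

fig, ax = plt.subplots()
bottom = np.zeros(len(categories))  # current height of each stack
for label, values in series.items():
    ax.bar(categories, values, bottom=bottom, label=label)
    bottom += values  # the next segment starts on top of this one

ax.set_title('Stacked Bar Plot (three series)')
ax.set_xlabel('Categories')
ax.set_ylabel('Values')
ax.legend(loc='upper left')
# Call plt.show() to display the figure
```

After the loop, `bottom` holds the total height of each stack, which is exactly the total value the stacked bar displays.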
Side-by-side (grouped) bar plots place the bars for several data sets next to each other within each category, so the sets can be compared directly.
```python
import matplotlib.pyplot as plt
import numpy as np

categories = ['A', 'B', 'C']
values1 = [5, 7, 2]
values2 = [6, 8, 3]
width = 0.4
x = np.arange(len(categories))  # one tick position per category

# Shift each set half a bar width to the left or right of the tick
plt.bar(x - width/2, values1, width, label='Set 1')
plt.bar(x + width/2, values2, width, label='Set 2')
# Label the tick positions with the category names
plt.xticks(x, categories)
plt.title('Bar Plots Side by Side')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.legend(loc='upper left')
plt.show()
```
Question: When should you use side-by-side bar plots?
Answer: Side-by-side bar plots are useful when comparing multiple sets of data across the same categories.
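The `x - width/2` and `x + width/2` offsets generalize to any number of sets: each series is shifted so the whole group stays centred on the tick. A minimal sketch, where `group_offsets` is a hypothetical helper (not a Matplotlib function) and 'Set 3' is made-up data:

```python
import matplotlib.pyplot as plt
import numpy as np

def group_offsets(n_series, width):
    """Offset of each series' bars from the tick, centred on it (hypothetical helper)."""
    return [(i - (n_series - 1) / 2) * width for i in range(n_series)]

categories = ['A', 'B', 'C']
# 'Set 1' and 'Set 2' come from the example above; 'Set 3' is hypothetical
series = {
    'Set 1': [5, 7, 2],
    'Set 2': [6, 8, 3],
    'Set 3': [4, 6, 5],
}

width = 0.8 / len(series)  # the whole group spans 0.8 of the gap between ticks
x = np.arange(len(categories))
offsets = group_offsets(len(series), width)

fig, ax = plt.subplots()
for offset, (label, values) in zip(offsets, series.items()):
    ax.bar(x + offset, values, width, label=label)

ax.set_xticks(x)
ax.set_xticklabels(categories)
ax.set_title('Grouped Bar Plot (three series)')
ax.set_xlabel('Categories')
ax.set_ylabel('Values')
ax.legend(loc='upper left')
# Call plt.show() to display the figure
```

With two series and a width of 0.4, the helper returns offsets of -0.2 and +0.2, the same `x - width/2` and `x + width/2` used in the two-set example.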
Pie charts represent data as slices of a pie, showing the relative proportions of different categories.
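```python
import matplotlib.pyplot as plt

labels = ['A', 'B', 'C', 'D']
sizes = [15, 30, 45, 10]

# autopct prints each slice's percentage with one decimal place
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=140)
plt.title('Pie Chart')
plt.axis('equal')  # equal axis scaling keeps the pie circular
plt.show()
```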
Question: What information do pie charts convey effectively?
Answer: Pie charts effectively convey the relative proportions of different categories within a whole.
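The percentages on the slices come from dividing each value by the total, so with the default settings the sizes do not have to add up to 100. A minimal sketch of that arithmetic; `autopct_labels` is a hypothetical helper that reproduces the labels, not part of Matplotlib.

```python
def autopct_labels(sizes, fmt='%1.1f%%'):
    """Reproduce the labels plt.pie(..., autopct=fmt) draws (hypothetical helper)."""
    total = sum(sizes)
    # Each slice's share of the whole, expressed as a percentage
    return [fmt % (100 * size / total) for size in sizes]

print(autopct_labels([15, 30, 45, 10]))
print(autopct_labels([3, 1]))
```

The first call gives `['15.0%', '30.0%', '45.0%', '10.0%']` because the example sizes already sum to 100; the second gives `['75.0%', '25.0%']`.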
Histograms represent the distribution of a dataset. They divide the range of values into intervals called bins and show how many data points fall into each bin.
```python
import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 7]

# Split the data range into 6 equal-width bins and count the points in each
plt.hist(data, bins=6, color='orange', edgecolor='black')
plt.title('Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
```
Question: What do histograms help to visualize?
Answer: Histograms help to visualize the distribution of a dataset and the frequency of data points within specified ranges.
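To see the numbers behind the bars, NumPy's `np.histogram` computes the same counts and bin edges that `plt.hist(data, bins=6)` uses. A minimal sketch:

```python
import numpy as np

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 7]

# Six equal-width bins spanning min(data) to max(data), as with plt.hist(data, bins=6)
counts, edges = np.histogram(data, bins=6)

for i, count in enumerate(counts):
    closing = ']' if i == len(counts) - 1 else ')'  # only the last bin includes its right edge
    print(f'[{edges[i]:g}, {edges[i + 1]:g}{closing}: {count}')
```

For this data the bins have width 1 and the counts are 1, 2, 3, 4, 2, 2; the last bin holds both 6 and 7 because it is closed on the right.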
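Box plots (also called box-and-whisker plots) summarize the distribution of a dataset with five statistics: minimum, first quartile, median, third quartile, and maximum. The box spans the first to the third quartile, with a line at the median. By default, Matplotlib's whiskers stop at the most extreme data points within 1.5 times the interquartile range (IQR) of the box, and any points beyond that are drawn individually as outliers; passing `whis=(0, 100)` makes the whiskers cover the full range from minimum to maximum.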
```python
import matplotlib.pyplot as plt

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 7]

# Draw a single vertical box for the dataset
plt.boxplot(data)
plt.title('Box Plot')
plt.ylabel('Value')
plt.show()
```
Question: What are the five summary statistics shown in a box plot?
Answer: The five summary statistics are minimum, first quartile, median, third quartile, and maximum.
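To see where the box and whiskers come from, the sketch below computes the quartiles with NumPy's default (linear) percentile method and applies the 1.5 times IQR whisker rule that Matplotlib's `boxplot` uses by default.

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 7])

# Quartiles and median with NumPy's default linear interpolation
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box

# Whiskers reach the most extreme points still within 1.5 * IQR of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Points beyond the fences would be drawn individually as outliers
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f'Q1={q1}, median={median}, Q3={q3}')
print(f'whiskers: {whisker_low} to {whisker_high}, outliers: {outliers.tolist()}')
```

For this dataset Q1 is 3, the median is 4, and Q3 is 4.75. No point lies outside the fences, so here the whiskers do reach the minimum (1) and maximum (7).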
Pair plots visualize the pairwise relationships between multiple variables in a dataset. They are not built into Matplotlib, so the example uses Seaborn's `pairplot` function. Each off-diagonal panel is a scatter plot of two variables, and the panels along the diagonal show the distribution of each individual variable.
```python
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

data = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B'],
    'Feature1': [1, 4, 3, 5],
    'Feature2': [2, 3, 4, 1],
    'Feature3': [5, 2, 6, 3]
})

# Convert 'Category' to a categorical variable
data['Category'] = data['Category'].astype('category')

# Create a pair plot, coloring points by category
sns.pairplot(data, hue='Category', palette='viridis')
# Raise the title slightly so it does not overlap the top row of panels
plt.suptitle('Pair Plot', y=1.02)
plt.show()
```
Question: What type of data is best visualized using pair plots?
Answer: Pair plots are best for visualizing relationships in multi-dimensional numerical data. They help identify correlations, patterns, and distributions among pairs of variables.
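To show how a pair plot is put together, here is a minimal Matplotlib-only sketch of the same grid: histograms on the diagonal and per-category scatter plots elsewhere. It is an illustration of the layout, not how Seaborn implements `pairplot`; pandas also ships a ready-made `pandas.plotting.scatter_matrix` for a similar grid.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B'],
    'Feature1': [1, 4, 3, 5],
    'Feature2': [2, 3, 4, 1],
    'Feature3': [5, 2, 6, 3]
})
features = ['Feature1', 'Feature2', 'Feature3']
n = len(features)

fig, axes = plt.subplots(n, n, figsize=(8, 8))
for row, y_name in enumerate(features):
    for col, x_name in enumerate(features):
        ax = axes[row, col]
        if row == col:
            # Diagonal: distribution of a single variable
            ax.hist(data[x_name], bins=5, color='gray')
        else:
            # Off-diagonal: one scatter series per category
            for category, group in data.groupby('Category'):
                ax.scatter(group[x_name], group[y_name], label=category)
        if row == n - 1:
            ax.set_xlabel(x_name)
        if col == 0:
            ax.set_ylabel(y_name)

axes[0, 1].legend()
fig.suptitle('Pair Plot (Matplotlib only)')
# Call plt.show() to display the figure
```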