Matplotlib's plt.hist() function creates histograms, which visualize the distribution of numerical data.

What is a Histogram?

A histogram displays the frequency distribution of a dataset. The horizontal axis covers the data range, divided into consecutive bins (intervals), and the vertical axis shows the frequency. Each bin has a lower and an upper edge, and the height of its bar shows how many values fall within that range.
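To make the idea of bins concrete, here is a minimal sketch that counts values per bin with np.histogram. The sample values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])

# Four equal-width bins spanning the data range from 1 to 9
counts, edges = np.histogram(data, bins=4)

# Each bin includes its lower edge and excludes its upper edge,
# except the last bin, which also includes its upper edge
for lower, upper, count in zip(edges[:-1], edges[1:], counts):
    print(f"{lower:.0f} to {upper:.0f}: {count} value(s)")
```

The counts always add up to the number of data points, since every value lands in exactly one bin.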

Creating a Histogram in Matplotlib

Below is an example of how to create a histogram using Matplotlib:

Example: Basic Histogram with Fit Line

import numpy as np
import matplotlib.pyplot as plt

# Set seed for reproducibility
np.random.seed(10**7)

# Define mean and standard deviation
mu = 121   # Mean
sigma = 21 # Standard deviation

# Generate random data
x = mu + sigma * np.random.randn(1000)

# Define number of bins
num_bins = 100

# Create histogram
n, bins, patches = plt.hist(x, num_bins, density=True, color='red', alpha=0.7)

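# Evaluate the normal probability density function (PDF) at the bin edges,
# using the known mu and sigma (not values estimated from x)
y = (1 / (np.sqrt(2 * np.pi) * sigma)) * np.exp(-0.5 * ((bins - mu) / sigma) ** 2)

# Overlay the PDF as a dashed black line
plt.plot(bins, y, '--', color='black')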

# Add labels and title (with density=True the y-axis shows probability density)
plt.xlabel('Value')
plt.ylabel('Probability density')
plt.title('Matplotlib Histogram Example', fontweight="bold")

# Show plot
plt.show()

Explanation of the Code:

  • We use np.random.seed(10**7) to ensure reproducibility of results.
  • The mean (mu) and standard deviation (sigma) define the distribution of our dataset.
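  • np.random.randn(1000) generates 1000 random values from the standard normal distribution (mean 0, standard deviation 1). Multiplying by sigma and adding mu rescales them to the desired mean and standard deviation.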
  • The plt.hist() function creates the histogram:
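    • num_bins (100), passed as the second argument, sets the number of bins.
    • density=True normalizes the histogram so that the total area of the bars equals 1. Bar heights are then probability densities rather than raw counts, which puts them on the same scale as the PDF curve.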
    • color='red' and alpha=0.7 set the color and transparency.
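  • The Gaussian (normal) PDF with the same mu and sigma is evaluated at the bin edges and drawn over the histogram using plt.plot(). Because it uses the known parameters rather than values estimated from x, it is the theoretical curve, not a fit to the sample.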
  • Finally, we add axis labels and a title for clarity.
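
Two of the points above can be checked numerically. The sketch below uses np.histogram, which computes the same bar heights that plt.hist() draws, to confirm that density=True makes the bar areas sum to 1. It also shows one possible way to turn the overlay into an actual fit by estimating the parameters from the sample. The seed and the names mu_hat, sigma_hat, and pdf_fit are assumptions made for this illustration, not part of the example above.

```python
import numpy as np

# Arbitrary seed, chosen only so the sketch is repeatable
rng = np.random.default_rng(0)

mu, sigma = 121, 21
x = mu + sigma * rng.standard_normal(1000)

# With density=True, heights are probability densities
heights, edges = np.histogram(x, bins=100, density=True)

# Area of each bar is height * width; the areas should sum to 1
total_area = np.sum(heights * np.diff(edges))
print(f"Total bar area: {total_area:.6f}")

# Estimate the parameters from the sample instead of reusing mu and sigma
mu_hat = x.mean()
sigma_hat = x.std(ddof=1)  # sample standard deviation
print(f"Estimated mean: {mu_hat:.2f}, estimated std: {sigma_hat:.2f}")

# Normal PDF with the estimated parameters, evaluated at the bin edges
pdf_fit = np.exp(-0.5 * ((edges - mu_hat) / sigma_hat) ** 2) / (
    sigma_hat * np.sqrt(2 * np.pi)
)
```

Passing edges and pdf_fit to plt.plot() in place of bins and y would overlay the fitted curve instead of the theoretical one.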

Conclusion

Histograms are a fundamental tool for analyzing the distribution of data. Using Matplotlib, you can customize them by adjusting the number of bins, changing colors, and overlaying statistical elements such as fit lines.
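
As a rough illustration of how bin settings can vary, the sketch below compares a few choices using np.histogram_bin_edges. The bins argument of plt.hist() accepts the same kinds of values: an integer, a sequence of edges, or the name of a rule such as 'auto'. The seed and the particular list of choices are assumptions made for this example.

```python
import numpy as np

# Arbitrary seed, chosen only so the sketch is repeatable
rng = np.random.default_rng(0)
x = 121 + 21 * rng.standard_normal(1000)

# Compare fixed bin counts with NumPy's automatic bin-selection rules
bin_counts = {}
for choice in (10, 100, "auto", "fd", "sturges"):
    edges = np.histogram_bin_edges(x, bins=choice)
    bin_counts[choice] = len(edges) - 1  # n edges define n - 1 bins

for choice, count in bin_counts.items():
    print(f"bins={choice!r}: {count} bins")
```

Fewer bins give a smoother but coarser picture, while more bins show finer detail at the cost of noisier bar heights.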