Photo by Campaign Creators on Unsplash
Visualizing Data with Matplotlib: A Beginner’s Guide - Part 1 of 2
Matplotlib is a powerful Python library for creating plots for many purposes and data types. With its broad set of plotting functions and customization options, Matplotlib makes creating beautiful and informative visualizations easy.
Data visualization is a powerful tool for exploring, analyzing, and communicating data. By creating visual representations of data, you can easily identify patterns, trends, and relationships that might otherwise be difficult to see.
In part 1 of this series, we will explore the basics of using Matplotlib for data visualization, from installing the library to creating your first customized plot. In part 2, we will explore advanced techniques for customizing and enhancing your visualizations.
Getting Started with Matplotlib
Before creating visualizations with Matplotlib, we must install the library, import it into our coding environment, and choose a plotting style that suits our needs.
Installing Matplotlib
Matplotlib is not included in the standard Python distribution. To install it, we can use the pip command in the Terminal:
pip install matplotlib
Alternatively, to install Matplotlib using Anaconda, use the conda command in the Terminal:
conda install matplotlib
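To confirm that the installation worked, one quick check is to import the package and print its version. This is only a minimal sketch; the version number you see depends on your environment.

```python
import matplotlib

# If this import succeeds, Matplotlib is installed in the active environment.
# The printed version string depends on what pip or conda installed.
print(matplotlib.__version__)
```

If the import raises ModuleNotFoundError, the package was most likely installed into a different Python environment than the one running the code.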
Importing Matplotlib
Once we have installed Matplotlib, we can import it into our Python code using the import statement.
import matplotlib.pyplot as plt
Running the code will allow you to access the main plotting functions and customization options of Matplotlib using the plt prefix. For example, to create a simple line plot, we can use the plt.plot() function:
plt.plot([1, 2, 3, 4], [2, 4, 6, 8]) # plot x and y values
plt.show() # show the plot
This will produce a straight line passing through the points (1, 2), (2, 4), (3, 6), and (4, 8).
Setting up the Working Environment
How you set up your working environment depends on how you want to use Matplotlib. There are two main ways to use it: in scripts or in interactive sessions.
Scripts
If you want to use Matplotlib in scripts, simply write your code in a text editor or an Integrated Development Environment (IDE) and run it from the Terminal or within the IDE. This is useful for creating static plots that do not require user input or interaction.
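Scripts need no setup line beyond the import: plt.show() opens the figure in a separate window, and plt.savefig() writes it to a file. The following line is an IPython magic command rather than Python code, so it works only in Jupyter Notebook or IPython and raises a syntax error in a plain script:
%matplotlib inline
In a notebook, this will ensure that our plots are displayed as static images inline in the cell output and not in a separate window.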
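For a script that runs where no window can be opened (for example on a remote server), one common approach is to select the non-interactive Agg backend and save the figure to a file. The sketch below assumes a hypothetical helper named save_line_plot and a hypothetical file name line_plot.png in the system's temporary directory; neither comes from Matplotlib itself.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: renders to files, no window needed
import matplotlib.pyplot as plt


def save_line_plot(path):
    """Draw a simple line plot and write it to `path` as an image file."""
    plt.figure()
    plt.plot([1, 2, 3, 4], [2, 4, 6, 8])
    plt.savefig(path)
    plt.close()  # release the figure once it has been saved
    return path


# Hypothetical output location; any writable path works.
output_path = save_line_plot(os.path.join(tempfile.gettempdir(), "line_plot.png"))
print("Saved plot to", output_path)
```

When a display is available, the matplotlib.use("Agg") line can be dropped and plt.show() used in place of plt.savefig().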
Interactive Sessions
If we want to use Matplotlib in interactive sessions, we can use tools like Jupyter Notebook to execute code line by line and see the results immediately. This is useful for exploring data and creating dynamic plots responding to user input or interaction.
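To get interactive plots in the classic Jupyter Notebook, we can add the following magic command at the beginning of the notebook:
%matplotlib notebook
This will enable interactive features like zooming and panning. JupyterLab does not support this backend; there, the separately installed ipympl package provides similar features through %matplotlib widget.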
Choosing a Plotting Style
Matplotlib offers a variety of plotting styles that can change the appearance and aesthetics of your plots. To see the available styles, print the plt.style.available attribute:
print(plt.style.available)
This will return a list of style names that we can choose from. Some examples are:
'classic'
'seaborn-v0_8' (named 'seaborn' before Matplotlib 3.6)
'ggplot'
'dark_background'
'fivethirtyeight'
To apply a style to our plots, we can use the plt.style.use() function with the name of the style as an argument:
plt.style.use('seaborn-v0_8') # named 'seaborn' before Matplotlib 3.6
The style changes your plots' colors, fonts, grids, and other elements. You can also combine multiple styles by passing a list of style names as an argument:
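plt.style.use(['seaborn-v0_8', 'dark_background']) # the first style was named 'seaborn' before Matplotlib 3.6
This will apply both styles to our plots in order, so settings in a later style override the same settings in an earlier one. To reset to the default style, we can use the 'default' style name: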
plt.style.use('default')
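Because the list of available styles varies between Matplotlib versions, it can help to check for a style before using it. The sketch below also uses plt.style.context, which applies a style only inside a with block, so there is no need to reset afterwards. The choice of 'ggplot' as the preferred style is just an example.

```python
import matplotlib.pyplot as plt

# Use the preferred style only if this Matplotlib version ships it;
# otherwise fall back to the default style.
preferred = "ggplot"
style_name = preferred if preferred in plt.style.available else "default"

# The style is active only inside the with block, so figures created
# afterwards keep whatever settings were in place before.
with plt.style.context(style_name):
    fig = plt.figure()
    plt.plot([1, 2, 3, 4], [2, 4, 6, 8])
    plt.title("Styled with " + style_name)
```

Add plt.show() inside the with block to display the figure from a script.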
Basic Plotting with Matplotlib
A main feature of Matplotlib is the ability to create a wide range of plots for different purposes and data types.
Line Plots
A line plot is a simple and effective way to visualize the relationship between two variables or the change of a variable over time. A line plot consists of a series of points connected by a line. To create a line plot with Matplotlib, use the plt.plot() function:
x, y = [1, 2, 3, 4], [2, 4, 6, 8] # sample data
plt.plot(x, y) # plot x and y values
plt.show() # show the plot
The plt.plot() function takes two arguments: the x and y values to be plotted. These can be lists, NumPy arrays, or other array-like sequences of equal length. The function automatically creates a line plot with the default style and colors. You can pass additional arguments to customize the appearance of the plot, such as the color, marker, linestyle, and label. For example:
x, y = [1, 2, 3, 4], [2, 4, 6, 8]
plt.plot(x, y, color='red', marker='o', linestyle='--', label='line 1') # plot x and y values with custom style and label
plt.legend() # show the legend
plt.show() # show the plot
This will create a line plot with a red dashed line and circle markers, and add a legend with the label 'line 1'.
We can also create multiple line plots on the same figure by calling the plt.plot() function multiple times with different data sets and styles. For example:
a, b = [1, 5, 7, 6], [3, 7, 8, 9]
plt.plot(x, y, color='red', marker='o', linestyle='--', label='line 1')
plt.plot(a, b, color='blue', marker='x', linestyle='--', label='line 2')
plt.legend() # show the legend
plt.show() # show the plot
This will create two line plots with different colors and labels on the same figure.
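When there are several series to draw, the repeated plt.plot() calls can be written as a loop. The sketch below uses made-up monthly figures, and the names months and series are illustrative only.

```python
import matplotlib.pyplot as plt

# Made-up monthly figures, used only for illustration.
months = [1, 2, 3, 4, 5, 6]
series = {
    "product A": [10, 12, 15, 14, 18, 21],
    "product B": [8, 9, 9, 11, 12, 12],
}

fig = plt.figure()
for name, values in series.items():
    # Each plt.plot() call adds one more line to the current figure.
    plt.plot(months, values, marker="o", label=name)
plt.xlabel("Month")
plt.ylabel("Units sold")
plt.legend()  # the legend picks up the label given to each line
```

Add plt.show() at the end to display the figure from a script. Matplotlib assigns a different default color to each line, so no color argument is needed here.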
Scatter Plots
A scatter plot is another way to visualize the relationship between two variables or the distribution of a variable. A scatter plot consists of a series of points that represent the values of two variables. To create a scatter plot with Matplotlib, use the plt.scatter() function:
plt.scatter(x, y) # plot x and y values as points
plt.show() # show the plot
The plt.scatter() function takes two arguments: the x and y values to be plotted. These can be lists, NumPy arrays, or other array-like sequences of equal length. The function will automatically create a scatter plot with the default style and colors. We can also pass additional arguments to customize the appearance of the plot, such as the color, marker, size, alpha, and edgecolor. For example:
plt.scatter(x, y, color='red', marker='*', alpha=1, edgecolor='blue') # plot x and y values as stars with custom style
plt.show() # show the plot
This will create a scatter plot with red star markers that have blue edges and are opaque.
You can also use different colors or sizes to represent a third variable in your data set. For example, if you have a data set that contains the height, weight, and gender of some people, you can use different colors to represent their gender. Because Matplotlib expects color values rather than category labels, each label must first be mapped to a color:
colors = ['blue' if g == 'male' else 'red' for g in gender] # map each gender label to a color
plt.scatter(height, weight, c=colors) # height, weight, and gender are sequences of equal length
plt.show() # show the plot
This will create a scatter plot where male points are blue and female points are red.
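A complete version of this idea might look like the sketch below. The height, weight, and gender values are made up for illustration, and color_for is a hypothetical name for the label-to-color mapping. Drawing one plt.scatter() call per group is one way to also get a legend entry for each color.

```python
import matplotlib.pyplot as plt

# Made-up sample data, for illustration only.
height = [170, 160, 182, 165, 175]  # cm
weight = [70, 55, 85, 60, 78]       # kg
gender = ["male", "female", "male", "female", "male"]

# Hypothetical mapping from each category label to a color.
color_for = {"male": "blue", "female": "red"}

fig = plt.figure()
for label, color in color_for.items():
    # Keep only the points that belong to this group.
    xs = [h for h, g in zip(height, gender) if g == label]
    ys = [w for w, g in zip(weight, gender) if g == label]
    plt.scatter(xs, ys, color=color, label=label)
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")
plt.legend()
```

Add plt.show() at the end to display the figure from a script.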
Bar Plots
A bar plot is a useful way to compare the values of different categories or groups. A bar plot consists of a series of bars that represent the values of each category or group. To create a bar plot with Matplotlib, use the plt.bar() function:
c = ('Mark', 'Mike', 'Martin')
d = (10, 15, 20)
plt.bar(c,d) # plot c and d values as bars
plt.show() # show the plot
The plt.bar() function takes two arguments: the x and y values to be plotted. The x values are usually strings or categorical variables that represent the names or labels of each category or group. The y values are usually numerical variables that give the height of each bar. The function automatically creates a bar plot with the default style and colors. We can also pass additional arguments to customize the appearance of the plot, such as the width, color, edgecolor, and align. For example:
plt.bar(c, d, width=0.8, color='purple', edgecolor='black', align='center') # plot c and d values as bars with custom style
plt.show() # show the plot
This will create a bar plot with purple bars that have black edges and are centered on the x values.
We can also create horizontal bar plots by using the plt.barh() function instead of the plt.bar() function. This will place the categories on the y-axis and draw bars that are horizontal instead of vertical, so the bar thickness is set with height rather than width. For example:
plt.barh(c, d, height=0.8, color='pink', edgecolor='black', align='center') # plot c and d values as horizontal bars with custom style
plt.show() # show the plot
This will create a horizontal bar plot with pink bars that have black edges and are centered on the category positions (the c values) along the y-axis.
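Horizontal bar plots are often easier to read when the bars are sorted by value. The sketch below sorts the categories before plotting; the names names and scores are illustrative, and the values are the ones used above in a shuffled order.

```python
import matplotlib.pyplot as plt

names = ("Mike", "Martin", "Mark")
scores = (15, 20, 10)

# Sort the (name, score) pairs by score in ascending order.
# plt.barh() draws the first bar at the bottom, so the longest bar ends up on top.
pairs = sorted(zip(names, scores), key=lambda pair: pair[1])
sorted_names = [name for name, _ in pairs]
sorted_scores = [score for _, score in pairs]

fig = plt.figure()
bars = plt.barh(sorted_names, sorted_scores, color="pink", edgecolor="black")
```

Add plt.show() at the end to display the figure from a script.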
Histograms
A histogram is a special type of bar plot that shows the frequency distribution of a numerical variable. A histogram consists of a series of bins that represent intervals of the variable, and the height of each bar represents the number or proportion of observations that fall within that bin. To create a histogram with Matplotlib, we can use the plt.hist() function:
plt.hist(x) # plot x values as a histogram
plt.show() # show the plot
The plt.hist() function takes one argument: the x values to be plotted. The x values are usually a numerical variable that we want to analyze. The function will automatically create a histogram with the default style and colors. We can also pass additional arguments to customize the appearance of the plot, such as the number of bins, range, density, color, and edgecolor. For example:
plt.hist(x, bins=20, range=(0, 100), density=True, color='yellow', edgecolor='black') # plot x values as a histogram with custom style
plt.show() # show the plot
This will create a histogram with 20 equal-width bins spanning the range from 0 to 100. Because density=True, the bar heights are normalized so that the total area of the bars equals 1, instead of showing the absolute count in each bin. The bars will be yellow with black edges.
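The effect of density=True can be checked with the values that plt.hist() returns: the bar heights, the bin edges, and the drawn patches. The sketch below uses randomly generated sample values (not real data) and verifies that the bar areas add up to 1.

```python
import random

import matplotlib.pyplot as plt

random.seed(0)  # fixed seed so the made-up sample is reproducible
# 500 made-up values from a normal distribution, clipped to the 0-100 range.
values = [min(100, max(0, random.gauss(50, 15))) for _ in range(500)]

fig = plt.figure()
# plt.hist() returns the bar heights, the bin edges, and the drawn patches.
heights, edges, patches = plt.hist(
    values, bins=20, range=(0, 100), density=True,
    color="yellow", edgecolor="black",
)

# With density=True, height * bin width summed over all bins equals 1.
bin_width = edges[1] - edges[0]
total_area = sum(h * bin_width for h in heights)
print(round(total_area, 6))
```

With 20 bins there are 21 bin edges, since each bin needs a left and a right boundary.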
Pie Charts
A pie chart is another way to compare the values of different categories or groups. A pie chart consists of a circle that is divided into slices that represent the values or proportions of each category or group. To create a pie chart with Matplotlib, we can use the plt.pie() function:
x = [5, 10, 12]
plt.pie(x) # plot x values as a pie chart
plt.show() # show the plot
The plt.pie() function takes one argument: the x values to be plotted. The x values are usually numerical variables that represent the values or proportions of each category or group. The function automatically creates a pie chart with the default style and colors. We can also pass additional arguments to customize the appearance of the plot, such as the labels, colors, explode, autopct, shadow, and startangle. For example:
plt.pie(x, labels=['Mike', 'Martin', 'Mark'], colors=['red', 'green', 'blue'], explode=[0.1, 0, 0], autopct='%1.1f%%', shadow=True, startangle=90) # plot x values as a pie chart with custom style, one color per slice
plt.show() # show the plot
This will create a pie chart with the labels Mike, Martin, and Mark, and a different color for each slice. The slice for Mike will be slightly separated from the rest of the pie (explode), and the percentage of each slice will be shown on the chart with one decimal place (autopct). The pie chart will also have a shadow effect (shadow), and the first slice will start at 90 degrees, measured counterclockwise from the positive x-axis (startangle).
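The percentages that autopct prints are each value's share of the total. The sketch below computes those shares by hand for the same values and then draws the chart; the names total and shares are illustrative only.

```python
import matplotlib.pyplot as plt

x = [5, 10, 12]
labels = ["Mike", "Martin", "Mark"]

# Each slice's share is its value divided by the sum of all values.
total = sum(x)
shares = [100 * value / total for value in x]
for label, share in zip(labels, shares):
    print(f"{label}: {share:.1f}%")

fig = plt.figure()
# With autopct set, plt.pie() returns the wedges, the label texts,
# and the percentage texts drawn on the slices.
wedges, texts, autotexts = plt.pie(x, labels=labels, autopct="%1.1f%%")
```

Add plt.show() at the end to display the figure from a script. The values do not need to sum to 100, because plt.pie() normalizes them itself.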
Conclusion
In this article, we have introduced you to the basics of using Matplotlib for data visualization. We have covered everything from installing the library to creating your first customized plots. In part 2 of the series, we will explore advanced techniques for customizing and enhancing your visualizations. I hope this guide has given you the tools and knowledge you need to visualize your data with Matplotlib effectively.
Thank you for reading this article. I hope you enjoyed it and learned something new. Happy plotting! 😊
Stay tuned for part 2.