Bar Charts and Data Sorting with Matplotlib

Create horizontal bar charts from grouped pandas DataFrames; annotate bars using loops and enumerate in Matplotlib.

Dive into data visualization using Matplotlib and learn how to create clear, readable bar charts. Discover practical looping techniques in Python to effectively label and enhance your visuals.

Key Insights

  • Utilize Matplotlib's barh function to create horizontal bar charts, making category labels easier to read compared to vertical bars.
  • Implement enumerate in Python loops to access both index positions and items within a list, enabling precise labeling of data visualization elements.
  • Enhance readability and aesthetics of charts by adjusting properties such as color, bar order, text alignment, and axis limits in Matplotlib visualizations.

Okay, time for a little data visualization with Matplotlib. Charts display data in an X-Y coordinate system. You have bar charts consisting of side-by-side vertical bars or stacked horizontal bars.

You can get the bars to run sideways or straight up. The y-axis is typically for your numeric values, and your X-axis is for categories or, in the case of a line chart, for a time series showing the progression of time. So what we're going to do is make a bar chart from the students DataFrame that has been grouped—just bars.

We'll see how the length of the bars represents how many items are in each category. plt.bar takes X- and y-values and draws vertical bars; plt.barh takes the same two lists and draws horizontal bars. Horizontal bars are the better choice here because each bar has a label, and longer labels are hard to read when they sit beneath vertical bars, right? Sideways, they read just like this; you would have the names exactly as you see them here.

Instead of numbers, or in addition to numbers, you just have a bar—nice and long—with a number after it, perhaps. Okay, so we're going to get the count column into a list. That is going to be our y-values.

We'll say counts_list equals edu_group_df['count'] and listify it, because it would be a Series otherwise. That would be okay too; we can work with a Series. There are your values.

All right, what do we want to do with them? Well, those will supply the data for the chart to set the bar sizes. Now we also need, for the chart, the names of these index values.

They're not columns; they're the index. We want to get a list of these seven. We're going to say edu_levels_list.

We're going to listify, not a column this time, but the index, right? Because these values are not a column; they're the index. The index could have been plain numbers, zero through six, but here it holds the category names.

Let's just see what we get. There you go, it works. Now we've got our two sets of data—the labels of the bars and the numbers to set the length of the bars.
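As a rough sketch of those two steps, here is one way to pull the counts and the index labels into lists. The small edu_group_df below is a made-up stand-in (four invented rows, not the lesson's real grouped data), and .tolist() is one way to "listify"; list(...) would also work.

```python
import pandas as pd

# Hypothetical stand-in for the grouped DataFrame; labels and counts are invented
edu_group_df = pd.DataFrame(
    {"count": [210, 180, 150, 90]},
    index=["Some college", "Associate's degree", "High school", "Bachelor's degree"],
)

# The 'count' column is a Series; .tolist() turns it into a plain Python list
counts_list = edu_group_df["count"].tolist()

# The category names live in the index, not in a column
edu_levels_list = edu_group_df.index.tolist()

print(edu_levels_list)
print(counts_list)
```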

Okay, so we need to learn a little move. We know a bit about looping, right? So I'm going to show you something on how to do a loop. We have these categories, right? Let's say edu_levels_list.

Let's say for edu_level, or we'll just say edu, in the list, and let's just call the list edu_list. What's up? A NameError: edu_list is not defined.

Why, who says? Oh, sure: the list from edu_group_df was created under the longer name, so rename it to edu_list. Okay, so we're going to loop this edu_list. For edu in edu_list, let's just print the edu every time.

All right, there you go. So why do we care? Later, we're going to need to loop our bars to label them with numbers. But what if you wanted to print the index of each item with it? So you want like zero, “Some college,” one, “Associate’s degree,” and so on.

Well, this loop doesn't have access to the index of the items, but it could. To get the index as well, we wrap the list itself in the built-in enumerate function, which unlocks the index along with the item. A regular for loop just gives you access to the item, and we want the index as well.

Why? You'll see why later, but for now, just theoretically, what if we want the index, okay? Now we could print the index and the item. That's enumerate for you; it unlocks the index, and you have to list them out like so, with the index first.
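To make the difference concrete, here is a small sketch of the two loops side by side. The three labels in edu_list are sample values chosen for illustration, not the full list from the lesson.

```python
edu_list = ["Some college", "Associate's degree", "High school"]  # sample labels

# Regular for loop: only the item is available
for edu in edu_list:
    print(edu)

# enumerate yields (index, item) pairs, with the index first
for index, edu in enumerate(edu_list):
    print(index, edu)

# Collecting the pairs shows what enumerate produces
numbered = list(enumerate(edu_list))
```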

The syntax is for index, item in enumerate(some_list). So let's try that with fruits.

We'll say fruits, just one more example to make sure we get this concept. We're going to print each fruit with its number. We'll say for index, fruit in enumerate(fruits): print(index, fruit).

So what's up? "Too many values to unpack." Oh, right, too many values, because I didn't use enumerate. You’ve got to have enumerate if you want two values.
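Here is a sketch of that mistake and the fix. The fruits list is an assumed sample (the lesson only names a few of the fruits), and the try/except is there just so the error can be shown without stopping the script.

```python
fruits = ["apple", "banana", "orange", "peach"]  # assumed sample list

# Without enumerate, Python tries to unpack each string into two names and fails
try:
    for index, fruit in fruits:
        print(index, fruit)
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)

# With enumerate, each step of the loop supplies an (index, fruit) pair
numbered_fruits = []
for index, fruit in enumerate(fruits):
    numbered_fruits.append((index, fruit))
    print(index, fruit)
```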

There we go. And let's say you wanted to start numbering from one—you can do index + 1. Okay, so challenge: this is hard.

Make smoothies of pairs of consecutive fruits. This requires a current index plus the next item. So use this enumerate where you're grabbing the index, and you want to make pairs of consecutive fruits such as “apple-banana smoothie.”

Just another bit of practice to see if we can work with the index while we're looping as well as the item. And we need that for what we want to do with our bar chart that we're making. Pause; come back when you're ready.

Okay, here we go. We're going to print—let's actually throw the results into their own list called smoothies.

We're going to loop. Let's print smoothies when we're all done. We'll pprint it.

I don't think we've brought in pprint. Did we bring in pprint? No, let's do it. We'll say import pprint as pp.

All right. We're making smoothies, in other words, hyphenated consecutive fruits.

So it would be like this: apple-banana. We're not going to print.

We're going to use smoothies.append, and we can build the string with an f-string, which we haven't used in a little while. We'll say we want the current fruit, a hyphen, and then fruits[i], which is another way of saying fruit.

Run, and there you go. But it made self-pairs. We want to pair each fruit with the next item.

We're going to say fruits[i + 1], and then it throws an IndexError, list index out of range, because when you get to the last fruit, you don't have another item afterward.

You can't do an i + 1 when you get to the end. So what we're going to do is say if i < len(fruits) - 1: that way we don't try to make a smoothie with a non-existent fruit off the edge.

Orange-peach, right? Peach being the last one. So there's a condition—you don't append if you're already at peach.
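Put together, the smoothie loop might look something like this. The four fruits are an assumed sample list; the condition is the one described above.

```python
import pprint as pp

fruits = ["apple", "banana", "orange", "peach"]  # assumed sample list
smoothies = []

for i, fruit in enumerate(fruits):
    # Skip the last fruit: there is no fruits[i + 1] to pair it with
    if i < len(fruits) - 1:
        smoothies.append(f"{fruit}-{fruits[i + 1]}")

pp.pprint(smoothies)
```

Another common way to get consecutive pairs is zip(fruits, fruits[1:]), but the index-based version is the one that carries over to labeling bars.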

You make your last smoothie when you're at orange. Okay, so now that we have a sense of using enumerate to unlock the index, let's get on to making a bar chart. We're going to say color.

Actually, you know what? We just want one color. We're going to use DodgerBlue as our color. You don't make different colors when it's the same kind of data.

So I was playing around with that. plt.barh: we want horizontal bars, as we saw, right? We're going to feed in two lists. The first is the categories, edu_list, which barh places along the y-axis.

The second is the counts, counts_list, which set how far each bar extends along the X-axis. The counts are your numeric values, be it sales or stock prices or whatever.

Run. And there's your bar chart—look at that. You want the big bar at the top; we can flip that.

You can reverse these lists. We can say edu_list.reverse() and counts_list.reverse(). Run that.
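A minimal sketch of the chart at this point, assuming the two lists start out sorted from largest to smallest count. The labels and counts are invented stand-ins, and the Agg backend line is only there so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

# Invented sample data, largest count first
edu_list = ["Some college", "Associate's degree", "High school", "Bachelor's degree"]
counts_list = [210, 180, 150, 90]

# barh draws the first item at the bottom, so reverse both lists
# to put the biggest bar at the top
edu_list.reverse()
counts_list.reverse()

# First argument: categories on the y-axis; second: bar lengths along the X-axis
bars = plt.barh(edu_list, counts_list)
```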

And now we're going the way we want, okay. Now, we did that little sidebar move. It's a little side exercise with enumerate because we didn't get to it in lesson four when we first looked at loops, and we need it now.

I tried to teach as much stuff as we could so that when we get into the data science we’d be well-equipped, but there's inevitably extra stuff that comes up. Here we saw enumerate. Now, why do we need enumerate? Why would we need the index to loop? We're going to use the index to loop the counts, the numbers that represent the bar sizes.

And as we go, we're going to output the number next to the bar. We're going to say for i, count in enumerate(counts_list): let's print i, count. There you go; there are your counts by index.

That's not really what we want to do. What we want to do is say, okay, plt.text. We're going to label the bars with text. And plt.text takes an X- and a y-position and then a value of text.

So you go into your X- and y-coordinate system—the X being the bottom and the y being vertical—and you drop in X, y like a point on the chart, then set text at that spot.

So what would our X-position be for every count? It would be the count. We're going to say plt.text. Where do we want to lay text down? At the current count so that it's next to the bar.

And where do we want to lay it down? On the y-axis, we want to lay it down on the index from zero to six. And what do we want to say? We want to output the count. Run. There you go.
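The labeling loop could be sketched like this. The data is invented and already in plotting order (smallest first, so the biggest bar ends up on top); str(count) is used so the text argument is explicitly a string.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

# Invented sample data, already in plotting order
edu_list = ["Bachelor's degree", "High school", "Associate's degree", "Some college"]
counts_list = [90, 150, 180, 210]

plt.barh(edu_list, counts_list)

labels = []
for i, count in enumerate(counts_list):
    # X-position: the count (the end of the bar); y-position: the bar's index
    labels.append(plt.text(count, i, str(count)))
```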

Now let's add a little breathing room here. We're going to say count + 5 to push the labels off the ends of the bars. That's fine.

The + 5 moved the labels off the bars, but now we need to widen the chart. We're going to use something called the X-limit. We're going to set the X-lim, the horizontal limit, to be wider than the default.

Because by default, these charts aren't any bigger than they need to be to fit their data. So instead of stopping at 225 or something, the X-lim will go from zero to 250. We still start at zero; we just go out to 250.
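Here is a sketch of the offset and the wider X-limit together, again with invented data. The 5 and the 250 are the values from the lesson; with other data you would pick numbers that suit your own counts.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

# Invented sample data, already in plotting order
edu_list = ["Bachelor's degree", "High school", "Associate's degree", "Some college"]
counts_list = [90, 150, 180, 210]

plt.barh(edu_list, counts_list)

for i, count in enumerate(counts_list):
    # count + 5 nudges each label just past the end of its bar
    plt.text(count + 5, i, str(count))

# Widen the horizontal axis so the labels have room
plt.xlim(0, 250)

x_limits = plt.xlim()  # calling xlim with no arguments returns the current limits
```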

We're widening the X. And why is it flipped? Oh, because I'm reversing every time I run this, right? Remember, we're reversing. We should move these reverse calls out of here. Yeah, it was flipping every time because we keep reversing.
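The flipping happens because list.reverse() changes the list in place, so running the same code twice undoes the first reversal. A small sketch with invented counts; the slice-based copy at the end is an alternative and not what the lesson does, which is simply to run the reverse calls once.

```python
counts_list = [210, 180, 150, 90]  # invented sample counts

# Running an in-place reverse twice puts the list back where it started
counts_list.reverse()  # first run: [90, 150, 180, 210]
counts_list.reverse()  # second run: back to [210, 180, 150, 90]
after_two_runs = list(counts_list)

# Alternative: a reversed copy leaves the original alone,
# so it gives the same result no matter how often it runs
flipped_counts = counts_list[::-1]
flipped_again = counts_list[::-1]
```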

Let's just reverse the one time. There we go. Now let's set the color.

We'll add color='DodgerBlue' as another argument. It's kind of like the current blue, a little different. There you go.

And let's do plt.title. Charts should have a title. And the X-axis should say the number of students, so that's the X-label.

There you go: “Number of Students.” We don't need to put anything on the left side—that's pretty obvious, especially when it says it in the title.

Like, we don't need to label the y-axis. It's already all labels. We can change the color of the title and the X-label.

Why don't we do that also? There you go. “Number of Students,” title. And there are also ticks.

These labels here on the left—the edu labels—those are actually called the ticks, and they're the y-ticks. So let's say plt.yticks(color='coral').

There you go. I don't know if that's a great color or anything, but—or we could say DodgerBlue for that one. Even that's colors though, right? Or we could say gray.

Yeah, let's do a #555 or something. There you go. Maybe a #237; let's see what that looks like. Okay, and when you're all done—okay, let's not print out the counts here.
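The styling calls so far might be sketched as follows. The data is invented, the title text is a placeholder (the lesson does not spell out the exact title), and using '#555' for the title and X-label is an assumption; only the X-label wording and the tick color come from the lesson.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

# Invented sample data, already in plotting order
edu_list = ["Bachelor's degree", "High school", "Associate's degree", "Some college"]
counts_list = [90, 150, 180, 210]

# One color for every bar, since all bars show the same kind of data
plt.barh(edu_list, counts_list, color="DodgerBlue")

plt.title("Students by Parental Education", color="#555")  # placeholder title text
plt.xlabel("Number of Students", color="#555")

# The category labels on the left are the y-ticks; color is passed to their text
plt.yticks(color="#555")

ax = plt.gca()  # current axes, handy for inspecting what was set
```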

And plt.show() is kind of the last thing you want to do. It means that you're done with this particular plot. And if you label the y, it'll say parental education category, but we don't really need to see that, right? So let's just turn that off.

That's just to show you could label the y. And that's pretty much the end of that. This is what you want to camp out on and work on and do. This is our very first visualization.

It takes a while—you've got to learn your core programming, then you’ve got to learn NumPy and Pandas. Notice I just keep hitting you with more and more and more and more stuff, and now we finally can get into visualization.

We can go on and just go on forever with this stuff. Yeah, there it is. Notice the text sits a little bit high relative to the bars.

We could add another argument. We could say va='center' (vertical alignment, center). And let's watch the text move down.
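With that argument added, the labeling loop could look like this, with plt.show() as the last call. The data is invented; the Agg backend means show() will not open a window here, so drop that line in a notebook to see the chart.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

# Invented sample data, already in plotting order
edu_list = ["Bachelor's degree", "High school", "Associate's degree", "Some college"]
counts_list = [90, 150, 180, 210]

plt.barh(edu_list, counts_list, color="DodgerBlue")
plt.xlim(0, 250)

labels = []
for i, count in enumerate(counts_list):
    # va='center' lines the text up with the middle of its bar
    labels.append(plt.text(count + 5, i, str(count), va="center"))

plt.show()  # last step: signals that this plot is finished
```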

There it is; it's centered vertically now. All right, last move, then we're done: sorting by secondary and tertiary categories.

So let's sort the math_score and then go on to sort the reading_score. So let's say we want to sort all the students. We'll say students_df.sort_values(by='average', ascending=False).

We want to see the heavy hitters at the top. We'll go in descending order. And we just want to see the top 40—slice that off.
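A sketch of that sort-and-slice step on a tiny made-up DataFrame. The rows are invented and the slice takes the top 3 so the result is easy to read; the lesson slices off the top 40.

```python
import pandas as pd

# Invented sample rows standing in for students_df
students_df = pd.DataFrame({
    "math_score": [90, 100, 70, 98, 85],
    "average": [93.0, 100.0, 75.0, 97.0, 87.0],
})

# Highest averages first, then slice off the top rows (the lesson uses 40)
top_students = students_df.sort_values(by="average", ascending=False)[:3]

print(top_students)
```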

We don't even need a new DataFrame; let's just run it. Okay, so there's your average. You've got a few people with a 100 average.

Now, after that, the question is, do you want a secondary score—a secondary sort? We’ve got two people tied with 99. Would you like to rank them? Because right now we don't really know how they're ranked. They're ranked—we just don't even know, right? They have a tie score.

Here are a couple more with the tie score. Here are three in a row with a tie score. Why is one ranked above the other? Is it alphabetized? What is it? Now we could say we prioritize the math score, and we could say, okay, rank by average.

And then if there's a tie, rank by math. So you could feed in a second sort-field column. So you'll start sorting by average, but in the event of a tie, you'll use math as a tiebreaker.
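The tiebreaker can be sketched by passing a list of columns to by. The rows below are invented, the student column exists only to make the order easy to follow, and math_score is assumed to be the name of the math column.

```python
import pandas as pd

# Invented sample rows with a three-way tie on average
students_df = pd.DataFrame({
    "student": ["A", "B", "C", "D"],
    "math_score": [95, 99, 100, 97],
    "average": [97.67, 97.67, 99.0, 97.67],
})

# Sort by average first; where averages tie, the higher math_score ranks first
ranked = students_df.sort_values(by=["average", "math_score"], ascending=False)

print(ranked)
```

Passing a single ascending=False applies to both columns; sort_values also accepts a list such as ascending=[False, True] if the two columns should sort in different directions.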

So watch the order change a little bit. There you go—the highest math. So in the event of a tie—here's your tie with your 97.67, the three-way tie.

If you take out math and look at that three-way tie at 97.67, the highest math score is actually at the bottom. We're not prioritizing the math. So put the math back in as the tiebreaker.

And there you go. Beyond that, you probably wouldn't need a third level of sorting. So just secondary sorting, really.

All right, that's the end of this lesson. Glad you stuck with it. Hope you enjoyed it.

Thank you very much. We are done with lesson eight: Pandas, Matplotlib, and CSV.

Brian McClain

Brian is an experienced instructor, curriculum developer, and professional web developer, who in recent years has served as Director for a coding bootcamp in New York. Brian joined Noble Desktop in 2022 and is a lead instructor for HTML & CSS, JavaScript, and Python for Data Science. He also developed Noble's cutting-edge Python for AI course. Prior to that, he taught Python Data Science and Machine Learning as an Adjunct Professor of Computer Science at Westchester County College.
