Bar Charts - Visualizing Data with Python

Create a horizontal bar chart by setting a DataFrame column as the index and plotting it directly.

Explore the process of creating horizontal bar charts in Python to visualize the populations of the world's most populous countries. Learn the key steps, including manipulating data indices and formatting chart labels effectively.

Key Insights

  • Understand the necessity of having a numeric and string column when creating bar charts in Python, as demonstrated with the example of the top 10 most populous countries in 2020.
  • Learn two distinct methods for generating horizontal bar charts: using matplotlib (plt.barh method) and calling the plot method directly from a DataFrame, noting that the latter requires setting the country names as the DataFrame index.
  • Recognize the importance of adjusting axis limits, formatting large numerical labels with commas, and offsetting these labels appropriately for readability in charts depicting large numbers like populations exceeding 1 billion.

Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.


We could run the bar chart first and then look at the numbers, which might illustrate it better. Let's do that. We're going to come down here and extract the population and the countries so that we have these two columns, country and population, and those are going to be the bars.

Remember we had the parental EDU categories, like bachelor's degree and some college, each with its value count: how many of the thousand students were in that category. In this case, we have the same idea, a string and a number. That's what you need for bars.

We're going to have 10 bars, the lengths of which will be set by the population. So China is going to have the longest bar and so on. With the parental EDU chart, we took the EDU label categories, like some college, bachelor's degree, master's degree, and moved them all over to become the indices.

We did that because it's the indices that are used to label the bars. But we did that right away, so you never got to see what happens if you don't move your bar labels over to the index.

What happens is it uses the index numbers as the bar labels. So let's look at how that plays out. There are two different ways we can do this bar chart.


We're going to start with the way we did it with the parental EDU counts. Way one is just like students DF with the parental EDU counts. Just to remind you, let's look at that again. We're basically recreating the parental EDU bar chart, with the horizontal labeled bars and the numbers after them, except we're making it with the 10 countries: China, et cetera.

And in that example, we had moved the EDUs to the index. They weren't columns anymore. That's what groupby did for us anyway.

And then those indices came out as the bar labels. So this time we're not going to move anything over. We're going to just have number indices and see how that affects the bar labeling. We'd expect that to mean the bar labels will be numbers.

So just like with students DF and parental EDU, we need lists of X and Y. X will be countries and Y will be population. We'll say countries equals, and we're going to use our pop top 10 DF: pop top 10 DF dot country. There's your countries. It's a series though.

We'd like to listify that. You could actually leave it a series, but it looks nicer listified. Okay, here we go. And then we're going to get our populations.

And there's our 10 populations for our 10 countries. That's all we need to be able to do this, just like all we needed for our parental EDU chart was the counts and the parental EDU categories. With the countries, the counts are the populations and the categories are the countries: a string and a number, same deal.
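As a rough sketch of that extraction step: the DataFrame name `pop_top10_df`, its column names, and the four rounded figures below are assumptions made for illustration, not the course dataset.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's top-10 DataFrame.
# The figures are rounded placeholders, not the course data.
pop_top10_df = pd.DataFrame({
    "country": ["China", "India", "United States", "Indonesia"],
    "population": [1_440_000_000, 1_380_000_000, 331_000_000, 274_000_000],
})

# Attribute access returns a Series; tolist() turns it into a plain list.
countries = pop_top10_df.country.tolist()
populations = pop_top10_df.population.tolist()

print(countries)
print(populations)
```

Leaving the columns as Series would also plot; the lists are just easier to print and reverse.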

Now we're going to say plt.barh(countries, populations). There you go. Oh, it figured out to use the countries as the bar labels. Sure, of course it uses the countries. We said to use the countries when we passed them in. Fantastic.
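A minimal sketch of this first approach, assuming two plain lists shaped like the ones described above (the values are rounded placeholders):

```python
import matplotlib.pyplot as plt

# Rounded placeholder figures, not the course dataset.
countries = ["China", "India", "United States", "Indonesia"]
populations = [1_440_000_000, 1_380_000_000, 331_000_000, 274_000_000]

plt.figure()  # start a fresh figure
# barh(y, width): the strings label the bars, the numbers set the bar lengths.
bars = plt.barh(countries, populations)

# The first list item is drawn at the bottom of the chart (y position 0).
first_bar = bars[0]
print(first_bar.get_width())
```

Because the country strings are passed in directly, this approach never consults a DataFrame index for its labels.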

Okay, we're good. Now, if we wanted to reverse this, we could say countries.reverse() and populations.reverse().

The reason we might want to reverse is just to have the big numbers at the top. But as we saw in the last example, this needs to be in its own cell, because every time we rerun it, we reverse the lists again. It's like a toggle switch, and we don't want to keep flipping the direction.

So run that once in its own cell. And there we go: China and India up at the top.
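The toggle behavior is easy to see in a small sketch. The list here is a placeholder, and the slicing alternative at the end is an addition for comparison, not something the lesson uses.

```python
countries = ["China", "India", "United States", "Indonesia"]
original = countries.copy()

# list.reverse() reverses the list in place and returns None.
countries.reverse()
once = countries.copy()

# Running the same cell again flips the list back: the toggle effect.
countries.reverse()
twice = countries.copy()

print(once)
print(twice == original)

# Alternative (not from the lesson): slicing builds a reversed copy and
# leaves the original alone, so rerunning it is harmless.
reversed_copy = original[::-1]
```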

So what we want to do now is label it. We'll say plt.title, and it'll be your top 10 most populous countries for 2020.

And we're going to do plt.show(). Okay.

Clean up that scrap. We don't need to label the Y axis. Those are obviously countries, and it says so in the title, but we could certainly label the X axis.

So why don't we do that? We're going to say plt.xlabel, and that'll be pop in billions. And we should widen the chart. The reason we need to widen it is that if we want to display the population numbers next to the bars, those numbers are big.

They're going to need a lot of room. We'll say plt.xlim, going from zero to, let's say, 2 billion. Count the zeros: thousand, million, billion. There you go.

I know that looks really wide, but these numbers are big, right? It's over a billion. You're going to put a 10-digit number right here. We do need the space.

Remember how, with the EDUs, we offset the labels a little bit in that loop so that the numbers didn't touch the bars? There we offset by five. Here, offsetting by five is nothing. We have to offset by millions to move the label over, given the scale of these numbers.
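The title, axis label, and widened x-axis might look like the following sketch. The exact title and label wording are guesses at what was typed, and the data is a four-row placeholder rather than the full top 10.

```python
import matplotlib.pyplot as plt

# Placeholder data, already reversed so the largest bar ends up on top.
countries = ["Indonesia", "United States", "India", "China"]
populations = [274_000_000, 331_000_000, 1_380_000_000, 1_440_000_000]

plt.figure()  # start a fresh figure
plt.barh(countries, populations)
plt.title("Top 10 Most Populous Countries, 2020")
plt.xlabel("Population (billions)")

# Leave room to the right of the longest bar for 10-digit labels.
plt.xlim(0, 2_000_000_000)

ax = plt.gca()  # current axes, handy for inspecting what was set
print(ax.get_xlim())
# In a script, plt.show() would render the figure at this point.
```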

Alrighty. Which brings us to the label counts. We're going to loop over the bars, or actually the populations list.

for i, val in enumerate. Remember, enumerate is for looping when you want access to not just the list items, but the index of each. So there's your populations list we're iterating over.

We could call the loop variable pop. No, don't call it pop; pop is the name of a list method, right? So let's not do that. Okay.

So what we want to do every time through the loop is make text. We're going to say plt.text. It wants an X position, a Y position, and a text: where do you want the text in XY space, and what do you want it to say? The X position is going to be the value of the population, the number, plus like 25 million, because of the scale.

And then the Y value is going to be i. And we'd like the value to have commas, so we're going to format it. The text, which is the third argument, is going to be an f-string.

The variable will be val with a colon and a comma, so {val:,}. That will string format the number with commas. And vertical align center, like we did last time, and we're going to knock the font size down as well.

There you go. I mean, we can maybe make it 1.8, but, yeah. And make the font smaller and make the offset not as big.

There you go. Now we can do that. Maybe an eight, maybe an 18.

Okay, we're good. Nice. Like we did last time.
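Putting the labeling loop together, a sketch under a few assumptions: the 25 million offset and font size 8 are starting values to tune by eye, and the data is a rounded placeholder.

```python
import matplotlib.pyplot as plt

# Placeholder data, reversed so the largest bar ends up on top.
countries = ["Indonesia", "United States", "India", "China"]
populations = [274_000_000, 331_000_000, 1_380_000_000, 1_440_000_000]

plt.figure()  # start a fresh figure
plt.barh(countries, populations)
plt.xlim(0, 2_000_000_000)

labels = []
for i, val in enumerate(populations):
    # x: just past the end of the bar; y: the bar's position (0, 1, 2, ...).
    # {val:,} inserts thousands separators into the number.
    label = plt.text(val + 25_000_000, i, f"{val:,}",
                     va="center", fontsize=8)
    labels.append(label)

print(labels[-1].get_text())
```

The offset is in data units, which is why it has to be in the millions here rather than the single digits used for the student counts.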

We're going to keep rolling. The second way to plot the bars is to call plot on the DF itself, as opposed to plt. That's a little bit different method.

We call the plot method on the DF's column. The column that we're plotting is population, right? We're going to say pop top 10 DF population. The entire column is going to be plotted with a kind.

The kind is barh, horizontal bar. It's all different syntax. It works, but it's very different syntactically.

And now your indices do matter. Notice.

See, that's what I was talking about before. It only shows up in this version, though. You've got all these numbers as bar labels, and you don't know what they mean.

Now we should probably sort the population, though. We'll say pop top 10, right? We want the big numbers at the top.

We're going to have to reverse the sort with sort values. Even though the DF already has the big one at the top, the bars are drawn from the bottom up, so it still needs to be reversed by population.

And there you go. But zero is China and 18 is India. So where are the names of the countries? They're showing up in the plt version, but they're not showing up in this version where we're calling plot on the data frame.

That's because it doesn't know where to look. You're calling it on population to get the numbers, but it doesn't know anything about the countries. We'll go back and fix that in a bit.
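Here is a sketch of the second approach before the index is fixed. The DataFrame name, column names, and values are placeholders, and this stand-in uses a default 0 to 3 index, whereas the lesson's DataFrame carries row numbers from a larger dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the lesson's DataFrame (rounded placeholders).
pop_top10_df = pd.DataFrame({
    "country": ["China", "India", "United States", "Indonesia"],
    "population": [1_440_000_000, 1_380_000_000, 331_000_000, 274_000_000],
})

plt.figure()  # start a fresh figure
# sort_values() sorts ascending, so the largest value is drawn last, on top.
ax = pop_top10_df["population"].sort_values().plot(kind="barh")

# The bar labels come from the index, which is still just row numbers.
tick_labels = [t.get_text() for t in ax.get_yticklabels()]
print(tick_labels)
```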

The axis labels are the same. There you go: pop in billions.

We can also expand our X limit the same way. And we can populate the bars with numbers the same way as we did before. Okay.
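The same decoration calls carry over to the pandas version, since the plt functions act on whichever axes were drawn last. This is a sketch with placeholder data; the offset, font size, and label wording are assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the lesson's DataFrame (rounded placeholders).
pop_top10_df = pd.DataFrame({
    "country": ["China", "India", "United States", "Indonesia"],
    "population": [1_440_000_000, 1_380_000_000, 331_000_000, 274_000_000],
})

plt.figure()  # start a fresh figure
sorted_pops = pop_top10_df["population"].sort_values()
ax = sorted_pops.plot(kind="barh")

plt.xlabel("Population (billions)")
plt.xlim(0, 2_000_000_000)

# Bars sit at positions 0, 1, 2, ... in plotted order, so enumerate the
# sorted values to line each label up with its bar.
for i, val in enumerate(sorted_pops.tolist()):
    plt.text(val + 25_000_000, i, f"{val:,}", va="center", fontsize=8)
```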

This version also gives us a key, a legend, automatically, which we don't necessarily need. So what we're down to now is: how do we get the countries to show up? Well, in this version, the countries have to be the index of the dataset. That's what I was saying before.

We need to move the countries over. We can do that down here, right where we need it.

Okay. Move countries over from its own column to be the indices. This is required for the next way of making a horizontal bar chart.

Otherwise the bar labels just show as index nums, China equals zero, India equals 18, et cetera, which we don't want. So let's process that. Basically, what we're going to do now is take this country column and say, go over and be the index.

And then you'll be down to one column at that point. We'll make a new DF for that. We'll call it pop top 10 one col 2020 DF, and it equals pop top 10 DF dot something.

Not a straight-up cut. We just want to move the column over to the index.

And for that, we're going to say set index. The set index method takes the column that you want to move over.

You take the country column and say, okay, move over, you're now being set as the index. So you call set index on a data frame and pass it the column that you would like to become the new index.

df.set_index(col_name) moves col_name over to become the indices, replacing the numbers. We're making the new DF, which is going to be whatever this operation returns.

And that should work. Let's see. Print the shape.

Believe it or not, we're down to one column now. That should be 10, one. Yep.

See, there's only one actual column. This other so-called column is no column at all. The country looks kind of like a column, but you can see it's lower down.

It's not a column, and there are no numbers anymore. That zero to nine, they've been replaced. Now if you use this as your data source, you've got the labels back, because the plot is feeding off the indices, which are now the country names.
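A sketch of the index move and the resulting plot. The DataFrame names are a guess at how the spoken names were typed, and the data is a four-row placeholder, so the shape prints as (4, 1) rather than the lesson's (10, 1).

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the lesson's DataFrame (rounded placeholders).
pop_top10_df = pd.DataFrame({
    "country": ["China", "India", "United States", "Indonesia"],
    "population": [1_440_000_000, 1_380_000_000, 331_000_000, 274_000_000],
})

# set_index returns a new DataFrame whose index is the country column.
pop_top10_1col_2020_df = pop_top10_df.set_index("country")
print(pop_top10_1col_2020_df.shape)  # one real column is left

plt.figure()  # start a fresh figure
ax = pop_top10_1col_2020_df["population"].sort_values().plot(kind="barh")

# The bar labels now come from the index, which holds the country names.
tick_labels = [t.get_text() for t in ax.get_yticklabels()]
print(tick_labels)
```

Since set_index returns a new object by default, the original DataFrame keeps its country column and can still feed the plt.barh version.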

Tricky, nifty, a little bit of hustle and bustle, that's for sure. You're going to have to study.

First of all, you're not going to walk around with this in your head all the time. Maybe every six months or so you need to make a horizontal bar chart.

But you know it can be done. You know it exists. If you have the vaguest inkling that there are two different ways to do it, you're way ahead of the game.

And then you Google this or, better yet, look at your files. Your files are like your little personal go-to code stash, your little cheat sheet of 10 files where you can go find stuff and look it up.

Brian McClain

Brian is an experienced instructor, curriculum developer, and professional web developer, who in recent years has served as Director for a coding bootcamp in New York. Brian joined Noble Desktop in 2022 and is a lead instructor for HTML & CSS, JavaScript, and Python for Data Science. He also developed Noble's cutting-edge Python for AI course. Prior to that, he taught Python Data Science and Machine Learning as an Adjunct Professor of Computer Science at Westchester County College.
