Use the groupby function in pandas to analyze specific items from a large dataset, such as determining the total quantity and revenue of Chipotle's chicken bowls. Gain practical insight into how grouping and summing data by item name can reveal useful sales metrics for visualization and dashboarding.
Key Insights
- Use the groupby function in pandas to consolidate over 4,000 rows of Chipotle order data by item name, allowing for summary statistics like total quantity or revenue.
- To find the total number of chicken bowls sold, group the dataset by 'item_name' and sum the 'quantity' column, then use .loc['Chicken Bowl'] to isolate the result, which in this case is 761 units.
- Noble Desktop demonstrates a similar process to calculate total revenue by summing the 'item_price_as_number' column, showing that the chicken bowl generates the highest revenue among all items.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
If you're not familiar with group by, that's how we're going to find out some answers about the chicken bowl, what we all want answers on. We have, you know, 4,000 plus rows of data, but we can group them by their item name. We can say, hey, I want all the chicken salad bowls collapsed, grouped into one row, one row with information about the chicken salad bowl.
Or in our case, what we actually want is the chicken bowl, no salad, please. So here's how we're going to do that. We're going to say, items order.
That seems like a fine name for it. These are going to be grouped by all the items that were ordered. We're going to say Chipotle orders dot group by.
When we pass the group by is the column we want to group by. Item name, this column right here. Okay.
Then we also want to get a specific column in there. Item price as number. And we will sum it up.
Actually, we don't want item price as number. That's total revenue. We'll do that in a moment.
We want quantity. And let's check out items ordered. All right, it is a series, as you can see here.
And each of those items, actually, I think it's, yeah, it is a series. And what we want here is to look at it as a column, or rather a row, set of rows, where we can use dot loc to say, give me the row named chicken bowl. Let's try that.
Dot loc, chicken bowl. And there we go. We got the number that was next to chicken bowl, which is the number 761.
That is the total number of times in this data that quantity adding up, summing up all the quantity values. Okay. Let's get total revenue.
It's going to be almost the same, but we're going to be focused on item price as number. We could say, the revenues are the Chipotle orders grouped by item name. And we want to get the item price as number column, summed up.
And there we go. Now, instead of having quantity, we have price. You can see the chicken bowl has quite a lot of total revenue.
I believe it's the biggest one here. We'll find out as we explore this data more, but if we want a specific one, we can again call dot loc and pass it chicken bowl. Yup.
That was a number. Okay. In our next bit, we will start to get this into a proper into our data, into a proper format for graphing, and then we'll graph it.
And then we'll go local and start to get back into a dashboard and some impressive visualizations.