Mean, Median, and Mode in Pandas for Resale Value Analysis

Use pandas methods mean, median, and mode to summarize numerical data and determine typical values.

Learn how to effectively use pandas to calculate mean, median, and mode for analyzing your dataset's resale values. Gain clarity on when each statistical measure offers the most insightful perspective for your data analysis.

Key Insights

  • The pandas mean method calculates the mathematical average, useful for evaluating the entire dataset including outliers, providing an overall view rather than a typical value.
  • Median, calculated using pandas' median method, identifies a central, representative value that best reflects typical data points without being skewed by extreme outliers.
  • Mode, determined by pandas' mode method, shows the most frequently occurring values; however, it can be less meaningful for datasets such as resale values where repeated points are uncommon.

Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.

We're going to take a look now at how we can use pandas to get the mean, median, and mode for a column of data, a pandas Series, in this case resale value. So let's make some variables. I'm going to set resale_mean equal to the resale value column of cars, dot mean.

That's a pandas method on a Series, and it has to be a numerical Series, that tells you the mathematical mean, the average. Let's do the same for median, which works the same way. And then mode, which works a little differently. One difference is that mode does not have to be mathematical, because it's just looking at which values show up the most often.

Let's print those out. And let's also take a look at the type of the mode. There we go.
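A minimal sketch of those steps, assuming a tiny made-up DataFrame in place of the lesson's cars data. The column name resale_value and every number here are illustrative assumptions, not values from the course dataset:

```python
import pandas as pd

# Hypothetical stand-in for the lesson's cars DataFrame.
# The column name and the numbers are made up for illustration.
cars = pd.DataFrame({"resale_value": [12.0, 15.0, 15.0, 18.0, 20.0, 22.0, 60.0]})

resale_mean = cars["resale_value"].mean()      # arithmetic average
resale_median = cars["resale_value"].median()  # middle value once sorted
resale_mode = cars["resale_value"].mode()      # most frequent value(s)

print("Mean:", resale_mean)
print("Median:", resale_median)
print("Mode:", resale_mode)
print("Type of mode:", type(resale_mode))
```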

Let's execute that. So the mode is a Series itself, which means we're getting back a column of data. And that's reflected in the way it prints out.

We'll see it a little better as a Series if we just output the value instead of printing it. Here's the value. You can see it looks like a pandas Series.
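To see why mode comes back as a Series, here is a small sketch with made-up values in which two values tie for most frequent. The data and variable names are assumptions for illustration. The second Series uses text to show that mode does not need numbers:

```python
import pandas as pd

# Made-up resale values: 10.5 and 12.0 each appear twice, so they tie.
values = pd.Series([10.5, 12.0, 10.5, 14.0, 12.0, 9.0], name="resale_value")
modes = values.mode()
print(modes)  # one row per tied mode, with a fresh 0-based index

# Pull out a single number when you need one (here, the smallest tied mode,
# because pandas returns the modes in sorted order).
first_mode = modes.iloc[0]

# Mode also works on non-numeric data (hypothetical manufacturer names).
makes = pd.Series(["Ford", "Toyota", "Ford", "Honda"])
print(makes.mode())
```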


You can see the column name, the index values, and these four values, at index zero to three, which are the modes that tied for the most common value in this column. Now let's judge which of these is helpful: what questions do they answer? The way I'm going to do that is to look at a random set of resale values, using the built-in pandas method sample, which gives you one random value if you pass no argument, or a certain number of random values from the dataset.
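A short sketch of sample, using a made-up Series rather than the lesson's data. The values and the variable names are assumptions, and random_state is an optional extra that is not used in the lesson:

```python
import pandas as pd

# Made-up resale values, including missing entries, for illustration only.
resale = pd.Series(
    [16.4, 19.9, None, 29.7, 22.3, None, 39.0, 18.2, 12.5, 58.5],
    name="resale_value",
)

one_row = resale.sample()     # no argument: a Series holding one random row
five_rows = resale.sample(5)  # n=5: five rows drawn without replacement

# random_state is optional; it makes the draw repeatable between runs.
repeatable = resale.sample(5, random_state=0)

print(one_row)
print(five_rows)
```

Without random_state, each run returns a different selection, which is what makes repeated sampling a quick way to eyeball the data.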

If we look at these ten, we can see that the median is pretty close to answering the question here. Ignore all the NaNs for now; we'll return to what those are. But all five of these values are pretty close to the median.
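As a side note on those NaN entries, here is a short sketch with made-up numbers. By default, the pandas mean and median methods skip missing values (skipna=True), so the NaNs in a sample do not change either result:

```python
import numpy as np
import pandas as pd

# Made-up sample with two missing values.
sample = pd.Series([18.0, np.nan, 20.0, np.nan, 22.0])

print(sample.median())           # computed from 18.0, 20.0, 22.0 only
print(sample.mean())             # NaN entries are skipped here too
print(sample.dropna().median())  # same result after dropping NaN explicitly
```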

The median is answering the question, what does a typical value look like? What is the approximate typical value? What do they all center around? The mean takes outliers into account more. The median ignores those. And the mode answers, which value or values are appearing the most in our data?
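The difference between the two is easy to see with a pair of made-up Series, one of which adds a single extreme value. The numbers are assumptions chosen only to show the effect:

```python
import pandas as pd

# Made-up values: the second Series adds a single extreme value.
typical = pd.Series([10, 12, 13, 14, 15])
with_outlier = pd.Series([10, 12, 13, 14, 15, 90])

# Without the outlier, mean and median sit close together.
print(typical.mean(), typical.median())

# With the outlier, the mean jumps while the median barely moves.
print(with_outlier.mean(), with_outlier.median())
```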

And this data isn't well suited to that, because the mode is a poor measure for data where values don't repeat very often. These mode values only appear a couple of times each. It's more of a backwards-looking question: hey, what is showing up a lot in our data? Let's take a look at another sample of ten, and you'll get the idea.
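One way to check how often the mode values really repeat is value_counts, which is not used in the lesson but is a standard pandas Series method. A sketch with made-up values:

```python
import pandas as pd

# Made-up resale values where only two values repeat, and only twice each.
resale = pd.Series([16.4, 19.9, 18.2, 29.7, 22.3, 18.2, 39.0, 12.5, 19.9, 58.5])

counts = resale.value_counts()  # how many times each distinct value appears
print(counts.head())
print(resale.mode())            # the values that share the highest count

top_count = counts.max()
share = top_count / len(resale)  # fraction of rows taken by any single mode value
print(top_count, share)
```

When the highest count is this small, the mode says very little about what a typical value looks like.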

Only one of these, again, is above the mean, because the mean is taking into account quite a few outlier values. And I don't believe we've seen any of the mode values yet. Any of those in there? No, we'll see them eventually.
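The observation that few values land above the mean can be checked by counting. This sketch uses made-up numbers with two large outliers; the data and names are assumptions for illustration:

```python
import pandas as pd

# Made-up values: eight ordinary ones and two large outliers.
resale = pd.Series([9, 11, 12, 13, 14, 15, 16, 18, 70, 82])

# Comparing a Series to a number gives True/False per row; sum() counts the Trues.
above_mean = (resale > resale.mean()).sum()
above_median = (resale > resale.median()).sum()

print(resale.mean(), resale.median())
print(above_mean, above_median)
```

The outliers pull the mean up so far that only the outliers themselves sit above it, while half of the values sit above the median.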

We can see a couple of values above the mean now, and here's where we're starting to see an outlier well above the mean. We'll see some more of those as we look at a few more random samples. I don't believe I've seen a single one of these mode values, because again, it's answering a question we wouldn't actually ask of this data.

There are plenty of times when it is the right question to ask: what particular value or values are showing up a lot in our data? And here's one of those outliers I talked about that is skewing the mean quite a lot. But again, if you're looking to take outliers into account and figure out the mathematical middle, as opposed to the more typical middle without outliers, then the mean is what you want. And here's another one that's an outlier, just not quite as extreme.

And here at last is one of the mode values. There might have been one that I missed in these random samples. So those are the kinds of questions that mean, median, and mode answer.

We're looking at some actual values, and we have to ask which is the right question to ask of this data. Mode doesn't seem very helpful for this particular set of data. Median is helpful for seeing what a typical value looks like in general.

And mean is valuable when you're trying to look at the entire dataset as a whole, including the outliers.

Colin Jaffe

Colin Jaffe is a programmer, writer, and teacher with a passion for creative code, customizable computing environments, and simple puns. He loves teaching code, from the fundamentals of algorithmic thinking to the business logic and user flow of application building—he particularly enjoys teaching JavaScript, Python, API design, and front-end frameworks.

Colin has taught code to a diverse group of students since learning to code himself, including young men of color at All-Star Code, elementary school kids at The Coding Space, and marginalized groups at Pursuit. He also works as an instructor for Noble Desktop, where he teaches classes in the Full-Stack Web Development Certificate and the Data Science & AI Certificate.

Colin lives in Brooklyn with his wife, two kids, and many intricate board games.
