Skewness and Kurtosis in Descriptive Statistics

Explain skewness, kurtosis, and descriptive statistics using Excel's Analysis Toolpak.

Learn concepts in descriptive statistics, including skewness, kurtosis, and standard error, and learn how to leverage Excel's analysis tool pack for insightful data interpretation. Enhance your analytical skills by understanding how these measures impact data analysis and decision-making.

Key Insights

  • Understand how skewness and kurtosis assess the shape and abnormalities of data distributions, with skewness indicating the direction of a distribution's tail and kurtosis measuring the peakedness or flatness of the curve.
  • Activate Excel's analysis tool pack add-in to effortlessly generate essential statistical measures including mean, median, mode, standard deviation, sample variance, range, and standard error.
  • Recognize that standard error reflects statistical accuracy—smaller standard errors indicate larger, more reliable sample sizes and contribute to greater confidence in data-driven conclusions.

Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.

Descriptive statistics. Our first topic for descriptive statistics is related to something called skewness and also something called kurtosis. This relates to the bell curve that we have been talking about in dispersion in a normal curve, but in this case we're talking about an abnormality in that curve.

Skewness relates to whether or not the curve is positively skewed or negatively skewed. It doesn't necessarily look the way that you would think. A positive result indicates a long tail to the right and that's where you would usually find your standard deviations that occur to the right of the mean, but if it's positively skewed, that information is pushing the curve over to the left.

Negatively skewed indicates a long tail on the left and a higher curve on the right. On skewed distribution, it is better to use the median rather than the mean. The other type of abnormality when it comes to the curve is something called kurtosis.

Kurtosis is when the curve is either really high, which is considered positive, or really low, which is considered negative. So that's the highs and lows of the information. So what does this mean? It doesn't necessarily mean something bad.

It's just something you should notice about the data, that the data is either skewed one way or another, or larger on top, or flatter. And here we let you know that a negative skew refers to a longer or fatter tail on the left side of the distribution, while a positive skew refers to a longer or fatter tail on the right. These two skews refer to the direction or the weight of the distribution.

Data Analytics Certificate: Live & Hands-on, In NYC or Online, 0% Financing, 1-on-1 Mentoring, Free Retake, Job Prep. Named a Top Bootcamp by Forbes, Fortune, & Time Out. Noble Desktop. Learn More.

In addition, distribution can have a zero skew, which will be a normal distribution. So that's enough about skewness and kurtosis. Let's now take a look at the topic that we're really interested in, which is descriptive statistics.

Descriptive statistics allow you to see more information about the data that you're analyzing. Descriptive statistics allow you to see more information about the data that you're analyzing. In this case, you're going to add a plugin to your Excel ribbon in order for you to be able to use those new calculations.

They're going to be generated for you. You're not going to have to type them using functions in Excel. We have instructions in the sheet that show you how to turn on the data analysis tool pack.

So let me demonstrate here. Usually you're going to see this on the data tab and I already have it. You can also turn on something called solver.

Let me show you where you would go to get this. Since I'm using a PC, I'll follow the PC instructions. I'll go to file.

I'll choose options at the bottom. When I choose options in the bottom, I want to go over to add-ins on the left, which is this section. Then I want to go to where it says manage Excel add-ins and I'll click go.

Now if I did not have the analysis tool pack, I would have to click the checkbox here to activate it or enable it and then I would click okay. Since I already have it, it's there for me ready to be used. So what are the most useful calculations in descriptive statistics? Well the mean, the standard error, which we talk a little bit about here, it's a measure of the statistical accuracy of an estimate.

You're going to have a lower standard error if you have a larger population, whereas if you have a sample, it's not going to be as accurate. You can also see the median, the mode, the standard deviation, sample variance, and the range. The range is the highest is the difference between the highest and lowest values.

We have an example here of some descriptive statistics that have been generated. Here we're looking at statistics related to these four products, desktop, laptops, tablets, and licenses across four months. Here's the information that we were referring to in terms of the mean, the standard error, the median, the mode, sample variance, standard deviation, and so on.

Kurtosis and skewness is also another measurement that's included, as well as range. We have another example here. I'm going to delete this and recreate it so you can see how it was generated.

Now again, because this could be a very complex topic, we actually show you what you're going to need to select to generate the table that I just removed. So let's create descriptive statistics for investment 1 and investment 2 over the following months. So what I'm going to do is select data analysis tool pack up at the top on the data tab.

I want to look alphabetically for descriptive statistics. I'm going to select it right here. Then I'm going to click OK, and I want to select the input range.

The input range that I'm going to focus on is investment 1 and investment 2. So for this sheet, it's C58 over to D82. I'm using the information. Its orientation is in columns.

I'm going to use labels in the first row because I want to represent investment 1 and investment 2. The output range is specifically going to be this cell right here, F60. So I'll click output range. I'll click on this little box here and select this cell.

What do I want to see? I want to see summary statistics. So I want to make sure to check that. That's what this whole topic is about, descriptive statistics.

We want to see that. Now I'll just click OK, and the information has been generated. I see the information for investment 1, and I see the information for investment 2. Now there's some information here that's duplicated.

There's a better way to arrange this information. For instance, I can move this title over one cell and put it right above investment 1. Now, I don't need this to be repeated because I have that same information here. So I'm going to delete these cells.

I'm going to go over here. I'm going to right click, and I'm going to choose delete. And I want to shift the cells to the left because I want these numbers to move right next to the numbers for investment 1. I'll click OK, and there we go.

I'll bring this over. And so now I want to compare investment 1 and investment 2. This is very easy to read. I'm going to use the color scales, and I'll either use color scale 1 or color scale 2. And here we have a little cheat sheet that reminds you which type of color scale you should choose.

So for the mean, I definitely want the color scale that is going to display the highest value in green. So I'll go to the home tab. I'll choose conditional formatting.

I'll go to color scale and choose the very first color scale. Then for standard error, I want a low standard error. So I want green to represent the lowest value.

So that is going to be the second color scale. The median, again, I'm looking for high values. So that's going to be color scale 1. Mode is not applicable here.

We don't have any numbers that repeat. Standard deviation, I'm looking for a low standard deviation. So I'll go to color scale and choose color scale number 2. Sample variance, also, I want a low sample variance because that relates to low volatility.

Kurtosis, I'm going to choose 1 for that. And also the same for skewness. I'll choose color scale number 1. Range, that is considered bad.

So I'm going to choose number 2. I want a low range, which is kind of like volatility and variance. The minimum, well, I want the highest minimum. That's going to be color scale 1. I want the highest sum.

That's going to be also color scale 1. The sum, of course, it's going to be 1. This is going to be the last color scale because the last value is exactly the same. So as I take a look at the investments, I'm looking for which column has the most green values. And that is investment 1. It has the mean, the lowest standard error, the highest median, the lowest standard deviation and sample variance.

The kurtosis is high, although that's not as important as the maximum and sum. And so overall, when I'm looking at these two investments, using the information I get from descriptive statistics, I'm determining that investment 1 is going to be more lucrative. There's a little more information about standard error that we have over here.

I'll just briefly go over this with you. Standard error, the bigger the sample size, the smaller the standard error. Standard error of a statistic is the standard deviation of its sampling distribution or an estimate of the standard deviation.

If the statistic is the sample mean, it is called the standard error of the mean because it's not of the entire population. Standard error is a key ingredient in producing confidence intervals. The main point we want to take away from here is that the standard error divides the standard deviation by the square root of the sample size.

Standard error of zero is equal to the mean, which would only happen if your standard deviation is zero, so rare to happen. In a nutshell, the standard error describes the variation between the calculated mean of the population and one which is considered known or accepted as accurate. Again, greater population, lower standard error.

So in this section, we took a look at descriptive statistics. We learned how to install the analysis tool pack and return some additional calculations using that tool without having to write it ourselves by hand. Thank you for watching.

We'll go to our next topic after this. Again, thank you for watching.

Garfield Stinvil

Garfield is an experienced software trainer with over 16 years of real-world professional experience. He started as a data analyst with a Wall Street real estate investment company & continued working in the professional development department at New York Road Runners Organization before working at Noble Desktop. He enjoys bringing humor to whatever he teaches and loves conveying ideas in novel ways that help others learn more efficiently.

Since starting his professional training career in 2016, he has worked with several corporate clients including Adobe, HBO, Amazon, Yelp, Mitsubishi, WeWork, Michael Kors, Christian Dior, and Hermès. 

Outside of work, his hobbies include rescuing & archiving at-risk artistic online media using his database management skills.

More articles by Garfield Stinvil

How to Learn Data Analytics

Master data analytics with hands-on training. Data analytics involves the process of drawing insights from data analysis and presenting them to leaders and stakeholders.

Yelp Facebook LinkedIn YouTube Twitter Instagram