Regression Analysis with Excel's Analysis ToolPak

Learn how to perform simple linear regression efficiently using Excel's Analysis ToolPak to predict relationships between variables. Gain practical insights into evaluating regression results and interpreting key statistical measures.

Key Insights

The Analysis ToolPak in Excel simplifies performing simple linear regression by providing tools such as regression equations, residual plots, and normal probability plots, enabling users to visually and statistically assess relationships between variables.
Interpreting an R-squared value of 0.827 indicates that 82.7% of the variation in revenue can be explained by the number of employees, signifying a strong predictive relationship.
The regression analysis demonstrates significant predictive power, with a very low P-value (close to zero) and high T-stat value (10), confirming that the number of employees significantly impacts predicted revenue.

Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.

Simple linear regression. I want to estimate the relationship between variables. We were able to do this using y equals mx plus b. Now we can also do this using the Analysis ToolPak that's available in Excel.

So here are our goals. We want to determine the least squares regression equation to predict the dependent variable based on the independent variable. We want to determine how well the regression equation predicts the dependent variable.

And we want to calculate the contribution of an independent variable to the prediction. We did this using y equals mx plus b using the slope and intercept functions in Excel. Now we're going to use the Analysis ToolPak.

So I'm going to head over to the Data tab. And from the Data tab, I'm going to go over to Data Analysis. I'll click that.

Alphabetically, I'll look for regression. There it is. I'm going to click OK.

Data Analytics Certificate: Live & Hands-on, In NYC or Online, 0% Financing, 1-on-1 Mentoring, Free Retake, Job Prep. Named a Top Bootcamp by Forbes, Fortune, & Time Out. Noble Desktop. Learn More.

We show you exactly what to select here. So for our input range, we're going to select the y variable first. It literally says Input Y Range.

And that is the revenue. I'll press control shift down. I only want to go to row 49.

That is what's selected. I'm now going to head over to the X variable, which starts with employees. Control shift down.

Make sure I only go down to row 49, so the columns are equal to each other. I do have labels.

So I want to make sure "Labels" is selected. Now, where do I want to output the results? I'll attempt to output the results right here, so you can see them.

I'll click right over here. I think I should have enough space over to the right to display the information I want to show. And then I can choose the kinds of results I want to see from the regression analysis.

I'm going to choose "Residuals, " "Residual Plots, " and "Line Fit Plots." Those are going to be different types of charts. I'm also going to choose the "Normal Probability Plot."

I think I have everything I need. I'm going to click OK. And here is the summary of my statistics.

Some of these statistics I have talked about earlier in our prior regression exercise. We have R-squared, which is very important. What is the meaning of R-squared? The lowest R-squared value is zero, which means that the points are not explained by the regression, whereas the highest R-squared is one.

So you'll remember this from the R coefficient. One means highly reliable. There's a high correlation.

Zero, there's none. Negative indicates an inverse relationship. And so one means that all points are explained by the regression line.

For example, an R-squared of 0.85 means that the regression explains 85 percent of the variation in our Y variable. R-squared between 0.50 and 0.99 is acceptable in social science research, especially when most of the explanatory variables are statistically significant. So you may allow for a little bit more flexibility for what is considered significant in the social sciences.

So is there a strong correlation between revenue and employees? We have our R-squared of 0.82. So that seems to indicate that there is a strong correlation. I'm just going to highlight this here. We also have some other values that we could take a look at.

Earlier, we talked about F, significance of F, the standard error, and also the coefficient. When we looked at the paired t-stat, when we're comparing the coefficients, we refer to the t-stat value and the p-value. And when it comes to employees' impact on revenue, that is a very low number.

So the p-value is very low. So there is a significant relationship between the independent variable and the dependent variable, even though it looks like it says 7.26. The negative 10 actually indicates many zeros to the left of the decimal. The t-stat value has to be 1.96 or greater, and this is 10.

So there's definitely a huge significance between employees and revenue. I'll highlight these two values. You also have some additional information.

We have the plots. Something else that you could do here, if you're trying to make comparisons between the predicted revenue and the actual revenue, these numbers actually represent the values that we have for the revenue in Y. So 1 is a standard for 238. So what I could do is copy this and replace "Observation" with the actual revenue values.

I'll go here. I'll just paste as values. And this is where you can see the relationship between the values.

And the residuals actually reflect the difference between our revenue and our predicted revenue. So it would be okay and appropriate to add those values over to the observation column. So down below, we have an assessment of the work that we just did.

So the results are: R-squared indicates that 82.7% of revenue variation can be explained by the number of employees. F is 105.7. The independent variable has explanatory power beyond what would be expected by chance. The significance F-value indicates the regression equation fits the data well.

The equation is what we know it to be. P-values for employees: When number of employees is used to predict revenue, there is near 100% chance that it yields predictive power. Conclusion: The number of employees accurately predicts revenue.

Now, I can continue my calculation because I do have the M and B values for the slope and intercept, and I can make a prediction. When it comes to 50 employees, how much revenue will I earn with 50 employees? So that's going to be equal to the Y value, which is equal to M multiplied by X plus B. When I press ENTER, 293. And so the prediction is 50 employees will lead to an increase in revenue from 274 to 293.

So in this section, we saw how we could use the Analysis ToolPak to get additional information about our regression, more than just using the simple Y equals MX plus B calculation. Thank you for watching.

Key Insights

Garfield Stinvil

How to Learn Data Analytics