Learn how to perform simple linear regression efficiently using Excel's Analysis ToolPak to predict relationships between variables. Gain practical insights into evaluating regression results and interpreting key statistical measures.
Key Insights
- The Analysis ToolPak in Excel simplifies performing simple linear regression by providing tools such as regression equations, residual plots, and normal probability plots, enabling users to visually and statistically assess relationships between variables.
- Interpreting an R-squared value of 0.827 indicates that 82.7% of the variation in revenue can be explained by the number of employees, signifying a strong predictive relationship.
- The regression analysis demonstrates significant predictive power, with a very low P-value (close to zero) and high T-stat value (10), confirming that the number of employees significantly impacts predicted revenue.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
Simple linear regression. I want to estimate the relationship between variables. We were able to do this using y equals mx plus b. Now we can also do this using the analysis tool pack that's available in Excel.
So here are our goals. We want to determine the least sum of squares regression equation to predict the dependent variable based on the independent variable. We want to determine how well the regression equation predicts the dependent variable.
And we want to calculate the contribution of an independent variable to the prediction. We did this using y equals mx plus b using the slope and intercept functions in Excel. Now we're going to use the analysis tool pack.
So I'm going to head over to the data tab. And from the data tab, I'm going to go over to data analysis. I'll click that.
Alphabetically, I'll look for regression. There it is. I'm going to click OK.
We show you exactly what to select here. So for our input range, we're going to select the y variable first. It literally says input y range.
And that is the revenue. I'll press control shift down. I only want to go to row 49.
That is what's selected. I'm now going to head over to the X variable, which starts with employees. Control shift down.
Make sure I only go down to row 49. So the columns are equal to each other. I do have labels.
So I want to make sure labels is selected. Now, where do I want to output the results? I'll attempt to output the results right here. So you can see them.
I'll click right over here. I think I should have enough space over to the right to display the information I want to show. And then I can choose the kinds of results I want to see from the regression analysis.
I'm going to choose residuals, residual plots, line fit plots. Those are going to be different types of charts. I'm also going to choose the normal probability plot.
I think I have everything I need. I'm going to click OK. And here is the summary of my statistics.
Some of these statistics I have talked about earlier in our prior regression exercise. We have R-squared, which is very important. What is the meaning of R-squared? R-squared, the lowest R-squared value is zero, which means that the points are not explained by the regression, whereas the highest R-squared is one.
So you'll remember this from the R coefficient. One means highly reliable. There's a high correlation.
Zero, there's none. Negative, there's an inverse relationship. And so one means that all points are explained by the regression line.
For example, an R-squared of 0.85 means that the regression explains 85 percent of the variation in our Y variable. R-squared between 0.50 and 0.99 is acceptable in social science research, especially when most of the explanatory variables are statistically significant. So you may allow for a little bit more flexibility for what is considered significant in the social sciences.
So is there a strong correlation between revenue and employees? We have our R-squared of 0.82. So that seems to indicate that there is a strong correlation. I'm just going to highlight this here. We also have some other values that we could take a look at.
Earlier, we talked about F, significance of F, the standard error, also the coefficient. When we looked at the pair T-stat, when we're comparing the coaches, we refer to the T-stat value and the P-value. And when it comes to employees' impact on revenue, that is a very low number.
So the P-value is very low. So there is a significant relationship between the independent variable and the dependent variable, even though it looks like it says 7.26. The negative 10 is actually indicating a lot of zeros to the left of that decimal. The T-stat value has to have a value of 0.196 or greater, and this is 10.
So there's definitely a huge significance between employees and revenue. I'll highlight those two values there. You also have some additional information.
We have the plots. Something else that you could do here, if you're trying to make comparisons between the predicted revenue and the actual revenue, these numbers actually represent the values that we have for the revenue in Y. So 1 is a standard for 238. So what I could do is copy this and replace it and replace observation with the revenue values.
I'll go here. I'll just paste as values. And this is where you can see the relationship between the values.
And the residuals actually reflect the difference between our revenue and our predicted revenue. So it would be okay and appropriate to add those values over to the observation column. So down below, we have an assessment of the work that we just did.
So the results are the R-square is 82.7% of revenue variation, and it can be explained by the number of employees. F is 105.7. The independent variable has explanatory power beyond what would be expected by chance. Significance F, regression equation fits the data well.
The equation is what we know it to be, P values, employees. When number of employees is used to predict revenue, there is near 100% chance that it yields predictive power. Conclusion, the number of employees accurately predicts revenue.
Now, I can continue my calculation because I do have the M and B values for the slope and intercept, and I can make a prediction when it comes to 50 employees, how much revenue am I going to earn with 50 employees. So that's going to be equal to, that's the Y value, which is equal to M multiplied by X plus B. When I press ENTER, 293. And so the prediction is 50 employees will lead an increase of revenue from 274 to 293.
So in this section, we saw how we could use the analysis tool pack to get additional information about our regression more than just using the simple Y equals M X plus B calculation. Thank you for watching.