Evaluate machine learning model accuracy by testing predictions against unseen data. Learn how to effectively compare model outputs to actual results using Python.
Key Insights
- Evaluate model accuracy by comparing predictions generated with model.predict() against unseen test data, allowing for assessment of how well the model generalizes (see the sketch after this list).
- Convert the test labels from a Pandas series to a list to facilitate a clear side-by-side comparison between predicted values and actual test outcomes.
- Examine prediction accuracy visually for small datasets (around 31 rows), noting that predictions often approximate actual values closely, though occasional discrepancies occur.
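Putting the first two insights together, a minimal sketch might look like the following. The names model, X_test, and y_test are assumptions based on a typical scikit-learn-style train/test split from earlier in the course; your notebook may use different names.

```python
# Assumes `model` has already been fit on the training split, and that
# X_test / y_test came from an earlier train/test split.
model_predictions = model.predict(X_test)  # predictions for rows the model never saw

# y_test is a Pandas Series; turning it into a list makes the side-by-side
# comparison with the predictions easier to read when printed.
for predicted, actual in zip(model_predictions, list(y_test)):
    print(f"predicted: {predicted:.2f}   actual: {actual}")
```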
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
Okay, let's properly test our model now. Remember, we withheld some of our data as test data, and now we can see what the model predicts for data it has never seen.
We're giving it a test—like giving it a quiz and saying, "Okay, you've learned your math; now what is 10 minus six?" All right, trying to teach subtraction. Or you've learned cats and dogs; now what's this—is it a cat? You haven't seen this one before, but based on what you learned, is this a cat or a dog? And we'll see how accurate it was. All right, so our test data is small enough, it's only about 31 rows, so I think we can just take a look at it.
We'll say, "Okay, let's make a variable called model predictions," and assign to it whatever calling model.predict returns. Predict is a method our trained model now has.
This time we don't pass it X and Y; we don't want it to have the answer. Instead, we just say, "Hey, look at the X-test data and give me your predictions." We'll run that block, and then let's print it out.
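That block might look roughly like this, assuming the model and X_test names from the earlier setup:

```python
# Ask the trained model for predictions using only the withheld test features;
# we deliberately don't show it the answers (y_test).
model_predictions = model.predict(X_test)
print(model_predictions)
```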
And those are certainly some predictions. Are they good? Well, we actually have the answers. We can test it against Y-test.
We can say, okay, print out Y-test. Actually, we want the list version of Y-test, because the model predictions print as a plain sequence of numbers while Y-test is a Pandas series. Converting it will make the two outputs look pretty similar.
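That comparison step could look something like this, again assuming the y_test name from the train/test split:

```python
# list() turns the Pandas Series of actual values into a plain Python list,
# which prints in a shape that's easy to scan next to the predictions above.
print(list(y_test))
```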
So we convert it to a list. All right, some of these are accurate and some of them are going to be a little off. 26.6 compared to 31.39: that's reasonably close.
This one's also reasonably close. It guessed 16.6; it was actually 19. This one's a little more off.
14.69 compared to 22. That's like 50% off. This one, the fourth one, is also super off.
Some of them are going to be close to correct, and some of them are going to be off. Most are still reasonably close, and some of them are going to be really, really close.
I'm looking for an example here; this prediction of about 39 compared to 46, that's pretty close. And this 19.39 is very close to the 19.58, if I'm counting correctly. I'm not certain that I am.
So far we're just eyeballing it and seeing that it's pretty close. The great news is that we have a way to directly measure how close these answers are.
Let's take a look at that next.