Evaluate machine learning predictions effectively by interpreting accuracy scores and detailed classification reports. Understand precisely how precision, recall, and F1 scores reveal your model's strengths and weaknesses.
Key Insights
- Using the KNN model, accuracy came out to 97%, corresponding to a single incorrect prediction out of 30 test cases.
- A detailed classification report from sklearn.metrics showed perfect precision and recall for the Setosa category, but revealed some confusion between Versicolor and Virginica.
- The classification report provided critical evaluation metrics, including precision, recall, and F1 score, helping to better understand the model's predictive performance.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
Let's check our score a couple of different ways. First, accuracy: out of all the predictions we made, how many were correct? We can get that by calling knn_model.score. And it looks like we need to give it some data to score.
We're missing two required positional arguments, X and y. To score the model, we need to hand it the testing data, so here's the X_test data.
Make your predictions based on that, and then here are the answers; tell me how many we got right. And that's pretty good: 97%.
That means we only missed about 3% of the test set, and with 30 test cases, getting exactly one wrong works out to roughly 97%. So we got one wrong out of 30.
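Here is a minimal sketch of that scoring step, assuming the same kind of KNN classifier and train/test split described in the lesson (the variable names and split settings below are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the iris data and hold out 30 samples as a test set (split settings are illustrative).
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=30, random_state=42
)

# Fit a KNN classifier on the training data.
knn_model = KNeighborsClassifier()
knn_model.fit(X_train, y_train)

# score() predicts on X_test internally, compares against y_test,
# and returns the fraction of correct predictions (accuracy).
accuracy = knn_model.score(X_test, y_test)
print(accuracy)  # one miss out of 30 would print roughly 0.9667, i.e. about 97%
```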
We could sit here and eyeball the predictions to try to figure out which one we missed. It's tempting, but we definitely got one of them wrong, and we can see what happened much more clearly with a classification report, which will tell us what we missed.
If you remember, we talked about precision and recall. Precision asks: of all the times we predicted a given category, how often were we right? Recall asks: of all the times a sample actually belonged to that category, how often did we identify it correctly? We can get both of those, plus the F1 score, which is the harmonic mean of precision and recall, from the classification report.
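To make those definitions concrete, here is a small illustrative sketch using sklearn's per-metric functions on made-up labels (these labels are hypothetical, not the lesson's actual test data):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical true labels and predictions for one binary category.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]

# Precision: of the samples we predicted as class 1, how many really were class 1?
precision = precision_score(y_true, y_pred)  # 3 of 4 predicted -> 0.75

# Recall: of the samples that really were class 1, how many did we catch?
recall = recall_score(y_true, y_pred)        # 3 of 4 actual -> 0.75

# F1: the harmonic mean of precision and recall.
f1 = f1_score(y_true, y_pred)                # 2 * (0.75 * 0.75) / (0.75 + 0.75) = 0.75

print(precision, recall, f1)
```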
The classification report is a function given to us by sklearn.metrics, so let's make a report with it.
We pass it the actual answers and our model's predictions, and, just to make the output easier to read, we also give it the iris data's target names. Then we print the report.
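As a sketch of that call, continuing with the illustrative variables from the earlier snippet (the lesson's actual variable names may differ):

```python
from sklearn.metrics import classification_report

# Predict on the test set, then build the report with human-readable class names.
predictions = knn_model.predict(X_test)
report = classification_report(
    y_test,                          # the actual answers
    predictions,                     # our model's predictions
    target_names=iris.target_names,  # "setosa", "versicolor", "virginica"
)
print(report)
```

The report prints per-class precision, recall, F1 score, and support, which is what lets us see where the mistake landed.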
And here it is. We can see that we had perfect precision and recall on setosas, but we got a little bit wrong in the versicolor and virginica. We'll dive into that more in the next video.