Neural Network Predictions: Accuracy and Fine-Tuning

Evaluate model predictions on test images and compare them to actual labels to assess accuracy.

Evaluate neural network predictions with precision and clarity, translating complex outputs into readable confidence percentages. Learn practical methods to quickly assess and verify model accuracy using Python techniques.

Key Insights

  • The article demonstrates converting model predictions into clear percentages, showing a digit classification model's confidence of 99.96% that the tested image was a "7" and a slight 0.04% chance it was a "3."
  • Using Python's NumPy argmax function, the author accurately identifies predicted digits, simplifying the analysis of extensive arrays of numerical prediction outputs.
  • The model was tested on 120 examples from the dataset, correctly predicting every digit, showcasing the high accuracy of the neural network before discussing more rigorous evaluation methods.

Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.

Let's take a look at the predictions and their shape. In the next section, when we do neural networks one more time, we'll analyze whether these predictions are right on a bigger scale. But for now, let's look at the predictions themselves.

I'm going to make a predictions variable and set it to what our model.predict method returns when we run it on the normalized testing images. That'll take just a second; it has to actually run the model.
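As a minimal sketch, that step might look like the following. The names model and testing_images_normalized are assumptions based on the lesson's description; your notebook may use different ones:

    import numpy as np

    # Run the trained model on the normalized test images.
    # Assumed names: `model` is the trained network,
    # `testing_images_normalized` is the test set scaled to the 0-1 range.
    predictions = model.predict(testing_images_normalized)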

There we go. Literally one second. But you know, in Python terms, that's forever.

And now let's analyze that. What is our value? Let's print the first testing value by printing predictions at index zero. Whoa, that's kind of hard to read, right? What we're looking at is a bunch of very mathy floats in scientific notation.
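Printing that first row might look something like this (a sketch; the exact values depend on your trained model):

    # The model's raw confidence values for the first test image:
    # ten floats in scientific notation, e.g. 1.13e-07 ... 9.99e-01.
    print("First testing value:", predictions[0])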

This one is 1.13 times 10 to the negative seventh. This one is actually about 99%: 9.99 times 10 to the negative first, or roughly 0.999. We're going to do a little bit of work to put these in a better shape in just a moment.

But yeah, this is a list of the model's confidence that the image is each of the 10 digits. And you can see that some of these are incredibly small, on the order of 0.00001%, with a string of zeros after the decimal point.

What that means is that the model has essentially no belief that the image is that number. The first value is its confidence, or lack thereof, that the image is a zero; the next is for one, then two, and so on. What we're looking at here is the index.

In our case, because we're looking at the digits zero through nine, the index (0, 1, 2, and so on) is also the digit being predicted. So for this one, if we count up to the big value: zero, one, two, three, four, five. Did I just lose count? I think so.

Zero, one, two, three, four, five, six, seven. It thinks that first image was a seven, and it's 99.96% confident of it.
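Because the index doubles as the digit, a quick way to see the mapping without counting by hand is to enumerate the row. A small sketch, continuing from the code above:

    # Pair each digit (the index) with the model's confidence in it.
    for digit, confidence in enumerate(predictions[0]):
        print(digit, confidence)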

Let's take a look at these predictions in a slightly more readable format. I'm going to do a little fancy formatting with a bit of fancy Python. Not that fancy, really; it's a fairly standard list comprehension. If your Python chops are up to it, this kind of thing isn't so hard.

But if they're not, that's totally fine; different people focus on different things. We're going to make a new list where we convert each prediction to a float, for every prediction in the predictions list.

But we're not done. But wait, there's more. We're also going to multiply each one by 100 to put it on a percent scale.

And we're also going to round it to two decimal places, so it reads like 99.99%. So I'll say: round the float of the prediction times 100 to two decimal places. All right.

And it looks like I did my fanciness slightly wrong. It probably had to do with the parentheses. Let me take a look.

The error says: only length-1 arrays can be converted to Python scalars. Okay, what did I do? Float of prediction times 100.

Oh, we actually want to make a float out of the prediction times 100, not take the float of the prediction and then multiply. Okay, so that's one mistake, but don't worry.

There are more mistakes. Ah, predictions is all 10,000 answers; we want predictions at index zero.
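With both fixes applied, the working comprehension might look like this (a sketch, using the predictions variable from the predict call above):

    # Convert each confidence in the first prediction to a percentage,
    # rounded to two decimal places.
    percentages = [round(float(prediction * 100), 2) for prediction in predictions[0]]
    print(percentages)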

There we go. All right, so it narrows down to 0%, 0%, 0%, 0.04%, 0%, 0%, 0%, 99.96%. So, counting across: zero, one, two, three, four, five, six, seven. It was 99.96% sure it was a seven, with a 0.04% chance that it was actually a three.

But that's really pretty sure it was a seven. Now we can check: hey, was this actually a seven? Actually, first, since my counting could be off, let's see where the biggest number is.

np.argmax will give us the index of the highest value in an array. So we can print out "The prediction was" followed by np.argmax. (I can never spell the word "prediction" right. My prediction is that I will always misspell "prediction.")

np.argmax of predictions at index zero. Yep, seven. Counting from zero is hard, and we all make mistakes.

All right, so what was our actual digit? Let's see. The correct answer would be in our testing labels at index zero. The correct answer was seven.
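A sketch of both checks, assuming testing_labels holds the true digits for the test set:

    # The index of the model's highest-confidence digit for the first image.
    print("The prediction was:", np.argmax(predictions[0]))   # 7

    # The true label for the same image.
    print("The correct answer was:", testing_labels[0])       # 7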

Great. Now, what if we want to look at more digits and get a better eyeball sense of how accurate this thing is? All right, so I'm going to make some predicted digits, and it's going to be another one of these list comprehensions, which hopefully I won't make a mistake in this time.

We'll convert to an integer the index of the highest value in each prediction, for every prediction in predictions. So for every prediction, every array of 10 items, give me the integer version of the index of the highest number.
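That comprehension might look like this (a sketch):

    # For each 10-item prediction array, take the index of the highest
    # confidence (argmax) as the predicted digit.
    predicted_digits = [int(np.argmax(prediction)) for prediction in predictions]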

Nailed it. And that's our new list of predicted digits. Now let's look at how we can get the correct answers into a comparable form.

Not much is needed in terms of a list comprehension; it's a little simpler. But to put the answers into a format that will match up, we want each one to be an integer too. So int of answer, maybe.

Though it would be more technically correct to say label: int of label, for label in testing labels. Let's check the predicted digits, maybe the first 30.

Print those out, and also print out the correct answers, the first 30. Okay, I did all that manipulation so these would line up really well.
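A sketch of that side-by-side check, using the names above:

    # Convert each true label to a plain int so the two lists print alike.
    correct_answers = [int(label) for label in testing_labels]

    # Eyeball the first 30 predictions against the first 30 answers.
    print(predicted_digits[:30])
    print(correct_answers[:30])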

And you can see that for the first 30, checking across, it got every single one right. Let's check the next 30.

I'll slice from index 30 to index 60. Scanning across, it looks like we got all of those right too. But how many do you think we'll have to check before we find a wrong answer? It could be quite a lot.

I'll stop at a certain point, but let's check numbers 60 to 90. All correct. And let's do one more check.

Let's check 90 to 120. Correct, correct, correct, all the way across. So it got the first 120 absolutely right.

This is a really, really good system. So in the next lesson, we'll look at how accurate a model like this actually is, using harder metrics than eyeballing.

And we'll also solve a new problem with it. We'll finally set things up so we have a proper way to measure the model, and we'll talk about how to fine-tune a system like this, and how to over-tune it by doing too much tuning. All this is coming up.

Colin Jaffe

Colin Jaffe is a programmer, writer, and teacher with a passion for creative code, customizable computing environments, and simple puns. He loves teaching code, from the fundamentals of algorithmic thinking to the business logic and user flow of application building—he particularly enjoys teaching JavaScript, Python, API design, and front-end frameworks.

Colin has taught code to a diverse group of students since learning to code himself, including young men of color at All-Star Code, elementary school kids at The Coding Space, and marginalized groups at Pursuit. He also works as an instructor for Noble Desktop, where he teaches classes in the Full-Stack Web Development Certificate and the Data Science & AI Certificate.

Colin lives in Brooklyn with his wife, two kids, and many intricate board games.
