Unpack how image pixel data fuels machine learning models that recognize handwritten digits. Gain a deeper understanding of how arrays of grayscale values translate into recognizable numeric forms.
Key Insights
- The dataset contains 60,000 images, each represented as a 28 by 28 pixel array, totaling 784 grayscale values per image, with intensities ranging from 0 (black) to 255 (white).
- Each image corresponds to a handwritten digit, labeled clearly to train the neural network for accurate digit recognition.
- Data is stored in NumPy arrays, allowing visualization in platforms like Jupyter Notebook, where numerical arrays can be directly interpreted as visual pixel images for analysis.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
Okay, let's talk about these training images. They have a shape of (60000, 28, 28), meaning there are 60,000 rows of data and each one is a 28 × 28 array. Each of those represents a 28 × 28 pixel image (28 pixels across and 28 pixels down) of a handwritten digit.
Each of those values, 784 in total (28 times 28), is an integer from 0 to 255. And that's the grayscale range: 0 is all the way black, 255 is all the way white, and everything in between is a shade of gray.
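For reference, here is a minimal sketch of loading the data and confirming those numbers. It assumes the Keras MNIST loader (an assumption; the course may load the data differently), with variable names matching the ones used below.

```python
from tensorflow.keras.datasets import mnist  # assumption: the course may use a different loader

# Load the 60,000 training images/labels (and 10,000 test images/labels).
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()

print(training_images.shape)                         # (60000, 28, 28)
print(training_images.min(), training_images.max())  # 0 255, the grayscale range
```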
We're going to use those 60,000 images to train our neural network machine learning model. Let's take a look at one image. Let's print out the type of `training_images[0]`.
Let's print out its number of dimensions. And then, finally, let's take a look at it by outputting the whole thing.
Let’s just say `training_images[0]`. All right, let’s run all that. It’s a NumPy array, it’s got two dimensions because it’s 28 × 28, and it printed out like this.
Now, if I print it the regular way, `print(training_images[0])`, I get the full array. An opening square bracket at the front isn't closed until the closing bracket at the very end, as you can see, and everything in between represents the rows of the image.
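In notebook terms, the inspection steps just described might look like this (a sketch; the commented outputs assume the arrays loaded above):

```python
# Inspect the first training image.
print(type(training_images[0]))  # <class 'numpy.ndarray'>
print(training_images[0].ndim)   # 2, because it's a 28 x 28 array

# print() shows the full bracketed rows described above; in a notebook,
# evaluating training_images[0] on its own shows the array's repr instead.
print(training_images[0])
```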
The first row is all black, the second row of pixels is all black, the third is all black; in fact, the first several rows are entirely black. But then we finally start to see some lighter pixels, lighter pixels going back and forth, and they start to make a little bit of a shape.
If we just output the image again without using `print`, our Jupyter Notebook will interpret it as an image and display a 28 × 28 pixel visualization. And where we saw a little white in the array, starting around the fifth row, we can see a little white in the corresponding part of the image as well.
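Whether a bare array renders as a picture depends on your notebook setup; one common, explicit way to display it is with matplotlib (a sketch, assuming matplotlib is installed):

```python
import matplotlib.pyplot as plt

# Render the 28 x 28 array as grayscale: 0 -> black, 255 -> white.
plt.imshow(training_images[0], cmap="gray", vmin=0, vmax=255)
plt.axis("off")
plt.show()
```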
We'll visualize this in different ways as we go, but this is what it is. The whole image is a 28 × 28 array, and each little dot is a value from 0 to 255. That's what makes this image visible.
Let's take a look: the image is a five. Great, we can see it's a five. Now let's check the digit by looking at the training label.
So the image has an answer, right? What's `training_labels[0]`? Not surprisingly, we print it and get 5. And we can, in fact, look at the first 10 values: index 0 up to but not including 10.
And there they are, the labels for our first 10 images. We can see the handwritten version of the next one, which is a zero, by just outputting it as an image here.
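The label lookups just described, sketched with the same names (the commented output assumes the standard MNIST ordering):

```python
print(training_labels[0])     # 5, the answer for the first image

# The first 10 labels: index 0 up to but not including 10.
print(training_labels[0:10])  # [5 0 4 1 9 2 1 3 1 4]

# And the handwritten version of the next one, whose label is 0.
plt.imshow(training_images[1], cmap="gray", vmin=0, vmax=255)  # plt imported above
plt.axis("off")
plt.show()
```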
And there it is. So our machine learning model is going to get all those lists of lists, and it has to look at all these numbers, which we know represent pixels and digits.
It doesn’t really know that. It has to be able to say, “Okay, that looks like a zero to me.” Let’s explore further how we’re going to do that.