Understanding Dataset Structure in Machine Learning

Unpack digit data into training and testing datasets and verify array shapes and types.

Dive into the process of unpacking and verifying training and testing datasets, ensuring accuracy in data preparation. Understand the structure and dimensions of image data crucial for effective machine learning model training.

Key Insights

  • The dataset is divided clearly into training and testing sets, with training images comprising 60,000 examples of 28 × 28 pixel arrays and testing images containing 10,000 examples of 28 × 28 pixel arrays.
  • Each set of labels matches its images one for one: the training labels are 60,000 digit values and the testing labels are 10,000 digit values, each ranging from zero to nine.
  • The article highlights the importance of correctly unpacking data tuples and verifying the shapes and types of datasets to ensure proper preparation for machine learning tasks.

So having understood the shape of our data, we can now unpack it, see if our understanding was right, and get some variables to work with. All right, so we're going to unpack our digits data into training data and testing data. Those are two tuples.

Remember, it's a tuple of tuples. The training data comes first, then the testing data.

I had that backwards when I first said it out loud, so let's make sure we get it right. Our training data is going to be X_train and Y_train, and our testing data should be X_test and Y_test.
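As a rough sketch of the tuple-of-tuples unpacking described above: the placeholder strings below stand in for the real arrays, and `digits_data` is an assumed name for the loaded data, not necessarily the one used in the lesson.

```python
# Stand-in for the digits data: one outer tuple holding two inner tuples.
# The strings are placeholders (an assumption) for the real image and label arrays.
digits_data = (
    ("training images", "training labels"),  # training tuple comes first
    ("testing images", "testing labels"),    # testing tuple comes second
)

# Step 1: split the outer tuple into its two inner tuples.
training_data, testing_data = digits_data

# Step 2: split each inner tuple into its inputs (X) and labels (Y).
X_train, Y_train = training_data
X_test, Y_test = testing_data

# The same unpacking can also be written as a single statement.
(X_train, Y_train), (X_test, Y_test) = digits_data

print(X_train, "|", Y_train)
print(X_test, "|", Y_test)
```

Getting the order wrong here does not raise an error; the names simply end up pointing at the wrong data, which is why checking the shapes afterward is worthwhile.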

But let's see. Let's say training images and training labels equals our unpacked training data. And the same thing for testing.

Testing images and testing labels equals our testing data. I gave them the same names I used before, so the printing is already set up. It prints the shape and the type of each of these: training images and labels, testing images and labels.
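Here is a minimal sketch of that unpack-and-print step. It assumes the data are NumPy arrays and uses tiny zero-filled stand-ins instead of the real digits data; the underscore variable names are a guess at how the spoken names are spelled in code.

```python
import numpy as np

# Tiny stand-in arrays (assumption): same layout as the digits data,
# but only a handful of examples so the sketch runs without a download.
training_data = (np.zeros((6, 28, 28), dtype=np.uint8), np.zeros(6, dtype=np.uint8))
testing_data = (np.zeros((2, 28, 28), dtype=np.uint8), np.zeros(2, dtype=np.uint8))

# Unpack each (images, labels) tuple into descriptively named variables.
training_images, training_labels = training_data
testing_images, testing_labels = testing_data

# Print the shape and the type of each of the four arrays.
for name, array in [
    ("training_images", training_images),
    ("training_labels", training_labels),
    ("testing_images", testing_images),
    ("testing_labels", testing_labels),
]:
    print(f"{name}: shape={array.shape}, type={type(array).__name__}")
```

With the real data, the first number in each shape would be the number of examples rather than the small counts used here.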

Let's run this. All right, so yep, this matches what we thought. Training images are 60,000 28 × 28 arrays.

Training labels are an array of 60,000 digits. Testing images are 10,000 28 × 28 arrays, and testing labels are 10,000 single values, in this case digits 0–9.
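Reading the printed shapes by eye works, but the same checks can be written as code. The sketch below is one possible way to do that, not part of the lesson: `check_split` is a hypothetical helper, and the arrays are small NumPy stand-ins for the real data.

```python
import numpy as np


def check_split(images, labels, height=28, width=28):
    """Hypothetical helper: confirm images and labels line up, then return the example count."""
    # Images should be a stack of 2-D arrays: (examples, height, width).
    if images.ndim != 3 or images.shape[1:] != (height, width):
        raise ValueError(f"unexpected image shape: {images.shape}")
    # Labels should be a flat array with exactly one entry per image.
    if labels.ndim != 1 or labels.shape[0] != images.shape[0]:
        raise ValueError(f"labels {labels.shape} do not match images {images.shape}")
    # Every label should be a digit from 0 through 9.
    if labels.min() < 0 or labels.max() > 9:
        raise ValueError("labels fall outside the range 0-9")
    return images.shape[0]


# Small stand-in data (assumption): 5 blank images with random digit labels.
rng = np.random.default_rng(0)
images = np.zeros((5, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=5)

print("examples checked:", check_split(images, labels))
```

Raising an error on a mismatch means a swapped or truncated split is caught right away, before any model sees the data.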

All right, next we'll take a look at why these are 28 × 28 pixels. Oh, I spoiled it.

Why are these 28 × 28 arrays? They're pixels. Spoiler alert. Let's take a look at that next.

Colin Jaffe

Colin Jaffe is a programmer, writer, and teacher with a passion for creative code, customizable computing environments, and simple puns. He loves teaching code, from the fundamentals of algorithmic thinking to the business logic and user flow of application building—he particularly enjoys teaching JavaScript, Python, API design, and front-end frameworks.

Colin has taught code to a diverse group of students since learning to code himself, including young men of color at All-Star Code, elementary school kids at The Coding Space, and marginalized groups at Pursuit. He also works as an instructor for Noble Desktop, where he teaches classes in the Full-Stack Web Development Certificate and the Data Science & AI Certificate.

Colin lives in Brooklyn with his wife, two kids, and many intricate board games.
