Gain clarity on how k-nearest neighbors (KNN) classification works through visualizing simple data points. Learn how Python's zip function and scatter plots help demonstrate the fundamentals of KNN.
Key Insights
- K-nearest neighbors (KNN) works by classifying data points based on proximity; the article demonstrates this using simple x and y coordinates labeled into two categories (classes 0 and 1).
- The Python zip function combines separate x and y datasets into tuples, which provide structured data points for the KNN model to process and classify.
- Visualization tools like pyplot's scatter plot clearly illustrate classification data points by color-coding different classes, helping learners better understand how KNN classifies points mathematically and visually.
Let's look at some data points. And these are just made-up data points. We'll look at some actual data in the next section.
So, in the first section of part three, we're just looking at some data that we can see the numbers for, plot, and use to understand how k-nearest neighbors works. So here we have some X and Y values, and these are just coordinates. Again, they could be something like weight and height.
So then we have these classes, these categories. If X is four and Y is 21, the class is zero. When X is five and Y is 19, the class is zero.
When X is 10 and Y is 24, the class is one. So this is the kind of data we'll feed in. We'll essentially give our model the X and Y points as X_train and the classes as y_train, right? The classes are the answers for those X and Y points.
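As a rough sketch of what that data might look like in code, here are only the three points read out above. The full lists from the lesson are not shown in this text, and the variable names `x`, `y`, and `classes` are assumptions.

```python
# Only the three points mentioned in the lesson; the full lists are not shown here.
# The names x, y, and classes are assumed for illustration.
x = [4, 5, 10]
y = [21, 19, 24]
classes = [0, 0, 1]  # the "answer" for each (x, y) point, in the same order

# Each position lines up across the three lists.
for i in range(len(x)):
    print(f"X is {x[i]}, Y is {y[i]}, class is {classes[i]}")
```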
What we'll actually give it, and we should visualize this, is tuples of this data, not X and Y separately, but a list where we've zipped up X and Y. Let's take a look at what that looks like. zip is a Python function that takes the first item from each of the two lists and puts them in a tuple. Then it takes the second item from each list and puts those into a tuple, and so on.
You can imagine a zipper zipping up two halves, with the two sides pairing up item by item. That's what's happening here. So these are the kind of data points, X is four and Y is 21, X is five and Y is 19, that we'll give to our k-nearest neighbors model.
Let's plot those points.
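Before plotting, here is a minimal sketch of that zip step, again using only the three points mentioned above. The name `data` is an assumption. Note that `zip` returns an iterator, so it is wrapped in `list()` to see the tuples.

```python
# Three points from the lesson; the full lists are not shown in this text.
x = [4, 5, 10]
y = [21, 19, 24]

# zip pairs the first items, then the second items, and so on.
data = list(zip(x, y))  # "data" is an assumed name
print(data)  # [(4, 21), (5, 19), (10, 24)]
```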
We're going to do a little bit of graphing. Not too much, I promise. We can have a scatter plot.
plt is, of course, pyplot. We'll scatter X and Y, and we'll set the color from the classes, these zeros and ones, the answers.
Any one of these X and Y points whose class is zero will get one color. And if its class is one, it will get a different color.
And that's what the c= argument does. Then we can call plt.show() to display the plot. And here we are.
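The plotting steps just described might look something like this. It is a sketch that assumes matplotlib is installed and uses only the three points mentioned earlier, so the real plot in the lesson will show more points.

```python
import matplotlib.pyplot as plt

# Three points from the lesson; the full lists are not shown in this text.
x = [4, 5, 10]
y = [21, 19, 24]
classes = [0, 0, 1]

# c=classes colors each point by its class via matplotlib's default colormap.
points = plt.scatter(x, y, c=classes)
plt.show()
```

Passing the class list to `c=` is what makes the zeros one color and the ones another. The exact colors depend on the colormap in use.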
So this is our training data. And again, it's very sparse. It's very made up.
But this is the kind of graph that k-nearest neighbors is looking at. And that's what we'll be looking at, both visually and mathematically.
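To give a feel for the mathematical side, here is a hedged sketch of the proximity idea in plain Python: measure the distance from a new point to each training point and take the class of the closest ones. The new point `(9, 23)`, the choice `k = 1`, and all the names are made up for illustration. This is not the model the course builds, just the idea behind it.

```python
import math
from collections import Counter

# Three training points from the lesson; the full lists are not shown in this text.
x = [4, 5, 10]
y = [21, 19, 24]
classes = [0, 0, 1]
data = list(zip(x, y))

new_point = (9, 23)  # hypothetical point to classify, not from the lesson
k = 1                # hypothetical number of neighbors

# Pair each Euclidean distance with that training point's class, nearest first.
distances = sorted(
    (math.dist(new_point, point), label) for point, label in zip(data, classes)
)

# Majority vote among the k nearest labels.
nearest_labels = [label for _, label in distances[:k]]
predicted = Counter(nearest_labels).most_common(1)[0][0]
print(predicted)  # 1, because (10, 24) is the closest point
```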