Gain clarity on how k-nearest neighbors (KNN) classification works through visualizing simple data points. Learn how Python's zip function and scatter plots help demonstrate the fundamentals of KNN.
Key Insights
- K-nearest neighbors (KNN) works by classifying data points based on proximity; the article demonstrates this using simple x and y coordinates labeled into two categories (classes 0 and 1).
- The Python zip function combines separate x and y datasets into tuples, which provide structured data points for the KNN model to process and classify.
- Visualization tools like pyplot's scatter plot clearly illustrate classification data points by color-coding different classes, helping learners better understand how KNN classifies points mathematically and visually.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
Let's look at some data points. These are just made-up data points; we'll look at some actual data in the next section.
So in the first section of part three, we're just looking at some data that we can visualize, see the numbers for, and plot, so we can understand how k-nearest neighbors works. Here we have some x and y values, and these are just coordinates. Again, these could be something like weight and height, or something else.
So then we have these classes, these categories. When x is four and y is 21, the class is zero. When x is five and y is 19, the class is zero.
When x is 10 and y is 24, the class is one. So this is the kind of data we'll put in. We'll essentially give our model the points for x train and the classes for y train, right? The classes are the answers for those x and y points.
What we'll actually give it, and we should visualize this, is tuples of this data: not x and y separately, but a list where we've zipped up x and y. Let's take a look at what that looks like. zip is a Python function that takes the first item from each of the two arrays and puts them in a tuple. Then it takes the second item from both arrays and puts those into a tuple, and so on.
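As a minimal sketch of that zipping step, the snippet below uses only the three points read out in this lesson (the full lists from the class notebook are not reproduced here). The variable names `x`, `y`, `classes`, and `points` are assumptions chosen for illustration.

```python
# Only the three example points mentioned in the lesson; the full lists are assumed to be longer.
x = [4, 5, 10]
y = [21, 19, 24]
classes = [0, 0, 1]  # the "answer" for each (x, y) pair

# zip pairs items by position; list() collects the pairs into a list of tuples.
points = list(zip(x, y))
print(points)  # [(4, 21), (5, 19), (10, 24)]
```

Note that zip on its own returns a lazy iterator, which is why the sketch wraps it in list() to get something we can print and reuse.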
You can imagine a zipper zipping up two halves so that they interleave. That's what's happening here. So these are the kind of data points, x is four, y is 21, then x is five, y is 19, that we'll give to our k-nearest neighbors model. Let's plot those points.
We're gonna do a little bit of graphing. Not too much, I promise. We can have a scatter plot.
plt is, of course, pyplot. We'll scatter x and y, and we'll set the color from the classes, these zeros and ones, the answers.
It will then assign a color to each class. Any one of these x and y points that is a zero gets one color. And if it has a one, it gets a different color.
And that's what the c= argument is for. Then we can say, pyplot, show us the plot. And here we are.
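The plotting calls described above might look roughly like this. It is a sketch under assumptions: it uses only the three points read out in the lesson, and the variable names are illustrative rather than taken from the class notebook.

```python
import matplotlib.pyplot as plt

# Only the three example points mentioned in the lesson.
x = [4, 5, 10]
y = [21, 19, 24]
classes = [0, 0, 1]

# c=classes maps each point's class value to a color,
# so the class-0 points share one color and the class-1 point gets another.
scatter = plt.scatter(x, y, c=classes)
plt.show()
```

The exact colors come from matplotlib's default colormap; what matters here is only that points with the same class share a color.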
So this is our training data. And again, it's very sparse. It's very made up.
But this is the kind of graph that k-nearest neighbors is looking at, and that we'll be looking at too, first visually and then mathematically.
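To make the "mathematically" part a little more concrete, here is a rough pure-Python sketch of the proximity idea: measure the distance from a new point to every training point, keep the k closest, and take a majority vote. This is not the library model used in the course; the function `knn_predict` and the new point `(9, 23)` are hypothetical, and the data is limited to the three points read out in this lesson.

```python
import math
from collections import Counter

# Only the three example points mentioned in the lesson.
points = [(4, 21), (5, 19), (10, 24)]
classes = [0, 0, 1]

def knn_predict(new_point, points, classes, k=1):
    """Hypothetical helper: majority class among the k closest training points."""
    # Pair each training point's Euclidean distance to new_point with its class.
    distances = [(math.dist(new_point, p), label) for p, label in zip(points, classes)]
    # Sort by distance and keep the k nearest neighbors.
    nearest = sorted(distances)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

new_point = (9, 23)  # made-up point, not from the lesson
prediction = knn_predict(new_point, points, classes, k=1)
print(prediction)  # 1, because (10, 24) is the closest training point
```

With k=1 the single closest neighbor decides the class; larger values of k let several nearby points vote.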