Gain clarity on how K-Nearest Neighbors (KNN) effectively categorizes iris species by analyzing multiple dimensions simultaneously. Understand how multidimensional data, challenging for humans, becomes easily manageable for computers.
Key Insights
- K-Nearest Neighbors (KNN) classification visually demonstrates clear clustering of iris species—setosa, versicolor, and virginica—based on sepal width and length.
- While humans easily interpret two-dimensional data visualizations, assessing multiple dimensions such as sepal width, sepal length, petal width, and petal length proves challenging.
- KNN simplifies working with higher-dimensional datasets, as computers efficiently calculate and compare multidimensional distances to classify data points accurately.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
Let's take a look at some images to visualize what we're doing with these irises. If you run this code block, you get an image of a particular iris species called Versicolor, with the sepal length and width marked, and with the sepals and petals pointed out.
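The lesson's code block isn't reproduced here, but a minimal sketch of displaying such an image might look like this, assuming the photo exists as a local file (the filename is hypothetical):

```python
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# 'iris_versicolor.jpg' is a hypothetical filename standing in
# for the lesson's photo of an iris versicolor
img = mpimg.imread('iris_versicolor.jpg')
plt.imshow(img)
plt.axis('off')  # hide axis ticks; we're just showing a photo
plt.show()
```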
You don't need a lot of domain knowledge about flowers to do this one. We can graph the sepal width against length for every flower in the dataset and get a graph that may help us understand how K-Nearest Neighbors is going to work with this. So that's the next code block right here.
Run that, and the three species we're going to work with are Setosa, Versicolor, and Virginica. Looking at sepal width and length, you can see many setosas clustered in one region, many virginicas in another, and many versicolors in a third. When we get a new item that lands near the virginica cluster, it's pretty obvious that it's going to be a Virginica.
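Here's a minimal sketch of what that plotting block might look like, assuming scikit-learn and matplotlib are available; the lesson's actual code may differ:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load the classic iris dataset: 150 flowers, 4 measurements each
iris = load_iris()
X, y = iris.data, iris.target

# Plot sepal length (column 0) against sepal width (column 1),
# with one color per species
for species_idx, species_name in enumerate(iris.target_names):
    mask = y == species_idx
    plt.scatter(X[mask, 0], X[mask, 1], label=species_name)

plt.xlabel(iris.feature_names[0])  # sepal length (cm)
plt.ylabel(iris.feature_names[1])  # sepal width (cm)
plt.legend()
plt.show()
```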
The nearest neighbors are definitely the Virginica ones. However, the issue we'll face is that, while it's pretty easy for us humans to look at this data and say which species any dot belongs to when we're only looking at two dimensions, sepal width and length, it's much harder to eyeball when we're actually looking at sepal width, sepal length, petal length, and petal width.
Now, that is four dimensions, four variables, and it's hard for us to visualize things in four dimensions. But for the computer, it's actually quite easy: it can calculate the distance between a point and its neighbors in four-dimensional space and find the neighbors with the smallest distances just as readily as it can in two.
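To make that concrete, here's a small sketch of the four-dimensional distance calculation using numpy; the new flower's measurements are invented for illustration:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target  # X has shape (150, 4)

# A hypothetical new flower: sepal length, sepal width,
# petal length, petal width (cm)
new_flower = np.array([6.1, 2.9, 4.7, 1.4])

# Euclidean distance from the new flower to every flower in the
# dataset, computed across all four dimensions at once
distances = np.sqrt(((X - new_flower) ** 2).sum(axis=1))

# Indices of the k = 5 nearest neighbors
k = 5
nearest = np.argsort(distances)[:k]
print(iris.target_names[y[nearest]])  # species of the 5 closest flowers
```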
This is very hard for us to work with. Even three-dimensional space becomes much more challenging for us, let alone four, five, or six. So that's where K-Nearest Neighbors will really help us: working with higher-dimensional datasets, as we'll see throughout this lesson.
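As a preview, here's a hedged sketch of how scikit-learn's KNeighborsClassifier can handle all four dimensions at once; again, the sample measurements are made up:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Fit a KNN classifier on all four features
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(iris.data, iris.target)

# Classify a hypothetical new flower using all four measurements
prediction = knn.predict([[6.1, 2.9, 4.7, 1.4]])
print(iris.target_names[prediction])  # prints the predicted species name
```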