Exploring the Iris Dataset with Scikit-Learn

Load Iris data and examine its structure and contents using sklearn.

Gain a clear understanding of the Iris dataset structure, which includes petal and sepal measurements across three distinct flower species. Learn how to organize this dataset into a more readable format for easier analysis.

Key Insights

  • The Iris dataset provided by sklearn includes 150 samples, each with four measurements: petal length & width and sepal length & width.
  • Species classification is indicated by numerical targets (0, 1, 2) corresponding respectively to the flower types setosa, versicolor, and virginica.
  • The next step involves transforming this numerical array structure into a human-readable data frame for streamlined data manipulation and analysis.


Let's load the Iris data and see what we're working with. load_iris is a function provided by sklearn that returns a large dictionary-like object with many properties describing the Iris data. Let's run it and check it out.
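As a minimal sketch of that first step, assuming scikit-learn is installed, the call looks something like the following. The variable name iris_data is only a convenient label, and the exact set of keys printed can vary between scikit-learn versions.

```python
from sklearn.datasets import load_iris

# load_iris() returns a dictionary-like object bundling the data and its metadata
iris_data = load_iris()

# List the available properties (the exact keys depend on the scikit-learn version)
print(sorted(iris_data.keys()))
```

Properties can be read either with attribute access (iris_data.data) or like dictionary keys (iris_data["data"]).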

The data property is an array of arrays. Each one is a row containing four measurements: sepal length, sepal width, petal length, and petal width.

Each row represents one of our flowers. There are 150 flowers, and each belongs to one of three species.
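One way to confirm that structure, sketched here with the illustrative variable name iris_data, is to check the array's shape and the column labels that sklearn stores alongside it in feature_names:

```python
from sklearn.datasets import load_iris

iris_data = load_iris()

# 150 rows (flowers) by 4 columns (measurements)
print(iris_data.data.shape)

# Column labels, in the same order as the values in each row
print(iris_data.feature_names)

# The first three flowers
print(iris_data.data[:3])
```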

Further down is the target. It is an array of zeros, ones, and twos, one entry per flower, with 50 flowers of each species.

Each number corresponds to a species. The numbers line up with the target_names property, which lists setosa, versicolor, and virginica in that order. Setosa is zero, versicolor is one, and virginica is two.
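A short sketch of that mapping follows. Counting labels with NumPy's bincount and indexing target_names with the target array are choices made for this illustration, not necessarily how the lesson does it, and the names counts and species are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_data = load_iris()

# Count how many flowers carry each numeric label (0, 1, 2)
counts = np.bincount(iris_data.target)
print(counts)

# target_names is ordered so that position 0, 1, 2 matches the numeric labels
print(iris_data.target_names)

# Using the numeric labels as indices yields one species name per flower
species = iris_data.target_names[iris_data.target]
print(species[0], species[50], species[100])
```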


We can take advantage of the fact that these are in order when we want to give each species a human-readable name. We can save this data as 'iris_data' for convenience. If we look at 'iris_data.data', it contains that array of arrays with 150 rows.

If we want to look at a single value, we can. For example, the second value in the first row is 3.5, which is that flower's sepal width. You don't need to memorize the column order, because the feature_names property lists it. Again, you don't need to know much about the actual flowers to work with this data.
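Rather than relying on memory for the column order, a sketch like the one below looks the position up by its label in feature_names. The names first_flower and sepal_width_index are hypothetical, chosen only for this example.

```python
from sklearn.datasets import load_iris

iris_data = load_iris()

# The first flower's four measurements
first_flower = iris_data.data[0]

# Find which column holds sepal width instead of hard-coding the position
sepal_width_index = iris_data.feature_names.index("sepal width (cm)")

print(first_flower[sepal_width_index])  # 3.5
```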

Okay, that covers our data. Now, in the next step, we'll put this all together into a DataFrame to make it more human-readable and easier to work with.

Colin Jaffe

Colin Jaffe is a programmer, writer, and teacher with a passion for creative code, customizable computing environments, and simple puns. He loves teaching code, from the fundamentals of algorithmic thinking to the business logic and user flow of application building—he particularly enjoys teaching JavaScript, Python, API design, and front-end frameworks.

Colin has taught code to a diverse group of students since learning to code himself, including young men of color at All-Star Code, elementary school kids at The Coding Space, and marginalized groups at Pursuit. He also works as an instructor for Noble Desktop, where he teaches classes in the Full-Stack Web Development Certificate and the Data Science & AI Certificate.

Colin lives in Brooklyn with his wife, two kids, and many intricate board games.
