Visualizing and Interpreting Data

Gain insights into employee retention by evaluating a robust dataset of nearly 15,000 entries. Learn how data quality and size influence the accuracy of predictive models in workforce analytics.

Key Insights

The dataset includes approximately 15,000 rows without any missing values, so no rows need to be dropped for missing data before modeling.
Of the entries analyzed, 11,428 employees stayed with the company, while 3,571 employees left, a retention rate of roughly 76%.
A larger dataset is an advantage when training predictive models, because more examples generally help a model train better.

This lesson is a preview from our Data Science & AI Certificate Online and Python Certification Online.

Let's take a couple of quick looks at our data. So first, we can check for any null values. And the way we can do that is we can take HRData, check for NA (not available) values, and sum them up.

If we run that, there are none, which we already know because we prepared this material. So, there are a lot of rows in this dataset. If you didn't notice before, there are almost 15,000 rows.
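As a rough sketch of the check described above, assuming the data lives in a pandas DataFrame: the tiny `hr_data` frame here is a hypothetical stand-in for the lesson's dataset, and every column name other than "left" is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's HR dataset (the real one has almost 15,000 rows).
# Only the "left" column name comes from the lesson; the others are illustrative.
hr_data = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11, 0.72, 0.37],
    "number_project": [2, 5, 7, 5, 2],
    "left": [1, 0, 1, 0, 0],
})

# isna() marks each missing cell as True; sum() then counts them per column.
null_counts = hr_data.isna().sum()
print(null_counts)

# Number of rows in the dataset.
row_count = len(hr_data)
print(row_count)
```

If every count comes back as zero, there are no rows to drop or fill in before moving on.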

That's a lot of data, without a single NA value. So it's fantastic data. We don't have to go through the steps that we did in the last one for removing values, removing rows that wouldn't have the data we actually want.

And we have a huge number of rows, which is a huge advantage when we're talking about the accuracy of our model. Providing more data will help the model train better. Let's take a look at visualizing our data.

All right, we can look at, you know, how many people left and stayed. One way we can do that is look at some random values. Here are 10 random values, and we can see this time most of them left, and one stayed.


Oh, I'm sorry, other way around. Most of them stayed; the one means they left. If I run that cell again, now we're looking at another random sample.

There are two out of 10 who left. Now one out of 10, now four out of 10 left. But, you know, these are just quick visual checks, right? And we get a very different perspective compared to just looking at the first five and last five rows.
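A quick random look like the one described above might be sketched as follows, assuming pandas. The 20-row `hr_data` frame is a hypothetical stand-in, not the lesson's data; only the "left" column name and its meaning (1 means the employee left) come from the lesson.

```python
import pandas as pd

# Hypothetical stand-in data: 15 employees who stayed (0) and 5 who left (1).
hr_data = pd.DataFrame({"left": [0] * 15 + [1] * 5})

# Draw 10 random rows. Without a fixed random_state, each run gives a different sample,
# which is why re-running the cell shows a different mix every time.
random_rows = hr_data.sample(n=10)
print(random_rows)

# Count how many people in this particular sample left.
left_in_sample = int(random_rows["left"].sum())
print(left_in_sample)
```

Passing something like `random_state=0` to `sample` would make the draw repeatable, at the cost of always seeing the same 10 rows.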

With only the first five and last five rows, it's like, oh, they all left. So here's how we're going to get the actual answer. How many left, how many stayed? We're going to call value_counts() on the "left" column of HRData.

What we get is 11,428 stayed (their "left" value was zero), and 3,571 left. Clearly, the majority of people stayed. All right, we'll dive into our data and perform a bit of data analysis next.
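The count can be sketched like this, assuming pandas. Since the lesson's file isn't shown here, the `hr_data` frame below is a synthetic reconstruction built only from the two totals quoted above; it is not the real dataset.

```python
import pandas as pd

# Synthetic reconstruction from the quoted totals: 11,428 stayed (0) and 3,571 left (1).
hr_data = pd.DataFrame({"left": [0] * 11428 + [1] * 3571})

# value_counts() tallies how often each distinct value appears in the column.
counts = hr_data["left"].value_counts()
print(counts)

# normalize=True returns shares instead of raw counts.
shares = hr_data["left"].value_counts(normalize=True)
print(shares.round(2))
```

The shares make the "majority stayed" point concrete: roughly 76% stayed and about 24% left.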

Colin Jaffe

Colin Jaffe is a programmer, writer, and teacher with a passion for creative code, customizable computing environments, and simple puns. He loves teaching code, from the fundamentals of algorithmic thinking to the business logic and user flow of application building—he particularly enjoys teaching JavaScript, Python, API design, and front-end frameworks.

Colin has taught code to a diverse group of students since learning to code himself, including young men of color at All-Star Code, elementary school kids at The Coding Space, and marginalized groups at Pursuit. He also works as an instructor for Noble Desktop, where he teaches classes in the Full-Stack Web Development Certificate and the Data Science & AI Certificate.

Colin lives in Brooklyn with his wife, two kids, and many intricate board games.
