Exploring Logistic Regression for Employee Retention Prediction

Introduce logistic regression to predict employee attrition using HR data.

Predict employee retention accurately using logistic regression, a powerful model suited for classification tasks. Examine crucial metrics and data-driven insights to determine whether employees will stay or leave a company.

Key Insights

  • Apply logistic regression instead of linear regression for binary outcomes. It is suited to predicting discrete variables such as employee retention ("stayed" or "left").
  • Analyze comprehensive employee data including satisfaction level, average monthly hours, number of projects, and promotions received to build predictive models.
  • Prepare data and evaluate predictions with standard Python tools: StandardScaler for feature scaling, train-test splits for holding out test data, and the performance metrics that Python libraries provide.


Let's talk about what we're doing next. We've done a linear regression, predicting continuous values like price. Now, what about discrete values? Dog versus cat: classification problems.

For that, we use logistic regression. In this case, we're going to predict whether employees stayed at a job or left it. Given their salary, their average working hours, their department, whatever features we decide to feed into the model, will the employee stay or will they leave? For that we need a different type of model, a logistic one.

It's not about drawing a line. It's about yes or no, stayed or left. So let's take a look at what code we have.
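To make the "yes or no" idea concrete, here is a minimal sketch of what a logistic model does with its inputs: it combines the features into a single score, squashes that score into a probability between 0 and 1, and then applies a cutoff. The weights, bias, feature values, and the `predict_left` helper are all made up for illustration; they are not taken from the course notebook.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_left(features, weights, bias, threshold=0.5):
    """Return (probability, label), where label 1 means 'left' and 0 means 'stayed'."""
    # Weighted sum of the features, just like in linear regression...
    z = bias + sum(w * x for w, x in zip(weights, features))
    # ...but the result is turned into a probability instead of a raw value.
    probability = sigmoid(z)
    return probability, int(probability >= threshold)


# Hypothetical weights for two made-up, already-scaled features.
weights = [-2.0, 0.8]
bias = 0.1

prob, label = predict_left([1.5, -0.5], weights, bias)
print(f"probability of leaving: {prob:.3f}, predicted label: {label}")
```

The 0.5 threshold is a common default rather than a requirement; a different cutoff trades one kind of mistake for another.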

We're bringing in almost exactly the same things we brought in for the last one: StandardScaler and train_test_split. We're also bringing in some new metrics.

We're going to dive a little more into how best to measure our success or failure: how accurate was the model, according to different measurement tools? And instead of bringing in LinearRegression to create our model, we're bringing in LogisticRegression. All right, make sure you run that and run this, which, again, may take a minute if you haven't run it yet; I already did. Our base URL should be the same.
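As a rough sketch of how those pieces fit together, the example below imports the tools named above and runs them end to end on synthetic data. The specific metrics (`accuracy_score`, `confusion_matrix`), the random data, and the split settings are assumptions chosen for illustration; the lesson's notebook may import different metrics and uses the HR data instead.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: two numeric features and a 0/1 target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Hold out a quarter of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the scaler on the training rows only, then apply it to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# LogisticRegression takes the place LinearRegression had before.
model = LogisticRegression()
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

# Two ways of measuring how the predictions did.
accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(f"accuracy: {accuracy:.2f}")
print(matrix)
```

Accuracy gives a single summary number, while the confusion matrix breaks the results down into correct and incorrect predictions for each class.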


And now we're grabbing some human resources analytics data from our CSV. We're going to make that CSV into a data frame, and I'm going to call it HR data.

It's what you get when you call pd.read_csv with the base URL we've got up there. I'm waiting for this autocomplete to speed up a little bit. There it is.

And the HR CSV URL. And then we can take a look at our HR data, assuming that worked. Here it is.
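The loading step might look roughly like the sketch below. The real base URL and HR CSV URL are not shown in this excerpt, so the example reads a tiny in-memory CSV instead; the rows and the column names (`satisfaction_level`, `average_monthly_hours`, `left`, `salary`) are invented stand-ins. `pd.read_csv` accepts a URL string in the same position as the file-like object used here.

```python
import io

import pandas as pd

# Stand-in for the course's CSV file; these rows and column names are invented.
sample_csv = io.StringIO(
    "satisfaction_level,average_monthly_hours,left,salary\n"
    "0.38,157,1,low\n"
    "0.80,262,1,medium\n"
    "0.74,201,0,high\n"
)

# In the lesson, the argument would be the combined URL rather than sample_csv.
hr_data = pd.read_csv(sample_csv)

# Show the first rows to confirm the load worked.
print(hr_data.head())
```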

We can see quite a lot of columns that can help out. This is their satisfaction level. How did they do on their last evaluation? How many projects do they have? How many average monthly hours? How many years did they spend at the company? How many work accidents have they had? A lot of zeros, that's good.

Did they leave or did they stay? We have a lot of ones here. One is for left, zero is for stayed. Our first five people all left, our last five people all left.

How many promotions did they get in the last five years? Well, none; maybe that's why they left. We'll see. And what department are they in? Our first five folks are in sales, our last five are in support.

And what is their salary? It's just low, medium, or high. So that's the data we have to work with. We're going to dive into what we'll do with that data in a moment.
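Looking only at the first and last five rows can give a skewed picture, since all ten of those employees left. One way to check the whole data set is to count the values in the target and category columns, as in the sketch below. The small data frame and its column names (`left`, `salary`) are hypothetical stand-ins for the real HR data.

```python
import pandas as pd

# Hypothetical miniature of the HR data; the real file has many more rows and columns.
hr_data = pd.DataFrame(
    {
        "left": [1, 1, 0, 0, 0],
        "salary": ["low", "low", "medium", "high", "medium"],
    }
)

# Translate the 1/0 codes into readable labels, then count each outcome.
left_counts = hr_data["left"].map({1: "left", 0: "stayed"}).value_counts()

# List the distinct salary categories that appear in the column.
salary_levels = sorted(hr_data["salary"].unique())

print(left_counts)
print(salary_levels)
```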

Colin Jaffe

Colin Jaffe is a programmer, writer, and teacher with a passion for creative code, customizable computing environments, and simple puns. He loves teaching code, from the fundamentals of algorithmic thinking to the business logic and user flow of application building—he particularly enjoys teaching JavaScript, Python, API design, and front-end frameworks.

Colin has taught code to a diverse group of students since learning to code himself, including young men of color at All-Star Code, elementary school kids at The Coding Space, and marginalized groups at Pursuit. He also works as an instructor for Noble Desktop, where he teaches classes in the Full-Stack Web Development Certificate and the Data Science & AI Certificate.

Colin lives in Brooklyn with his wife, two kids, and many intricate board games.
