Data Processing with LabelEncoder for Categorical Variables

Apply LabelEncoder to convert categorical variables 'sex' and 'embarked' into numeric form.

Transform categorical data efficiently using LabelEncoder, a simpler yet effective alternative to one-hot encoding. Learn how to encode features such as gender and port of embarkation into numeric values for optimal machine learning model performance.

Key Insights

  • LabelEncoder converts categorical variables into numerical values, such as encoding "male" and "female" into 1 and 0, respectively, facilitating easier processing by machine learning algorithms.
  • The feature "Embarked," initially represented by letters (S, Q, C), was transformed into numeric categories (0, 1, 2) using LabelEncoder to improve algorithmic understanding of the data.
  • After encoding categorical data with LabelEncoder, the next step involves preparing the dataset further by splitting it into feature variables (X) and the target variable (Y) for modeling.

Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.

We're going to use LabelEncoder now for the first time. LabelEncoder is another way to similar to one-hot encoding. We're going to encode using numbers over words.

So that's because it's much easier for the computer to understand a category 0,1, or 2 rather than passenger class 1,2, or 3 or using, it's not a good example because those are our numbers, or using sex, male and female, versus 0 or 1. It understands the 0 or 1 a lot better. The same for embarked. Embarked is the other one we're going to have to label and code because embarked is S, Q, or C, and we want that to be 0,1, or 2. So it's a bit of a simpler version of the one-hot encoding.

Let's take a look at it. To do so, to use LabelEncoding, we're going to instantiate a LabelEncoder. We're going to say le, that's a common name for this, is LabelEncoder, and we call it to get back an actual LabelEncoder.

And just as I was saying, there are two categories, sex and embarked, that we're going to want to label and code. And we're going to do this one at a time here for now. We'll see which ways we can go faster later.

But let's use LabelEncoder to do so. Let's say titanic data sex equals the LabelEncoder fit transform version of sex. And now, if I just check out that series… Looks like I've got an error here.

Data Analytics Certificate: Live & Hands-on, In NYC or Online, 0% Financing, 1-on-1 Mentoring, Free Retake, Job Prep. Named a Top Bootcamp by Forbes, Fortune, & Time Out. Noble Desktop. Learn More.

Interesting. Oh, I failed to actually run this code block. There we go.

Let's try that again. There we go. And now it's… We can see that the male and females have been split up into ones and zeros.

Okay, let's do the same thing for embarked. TitanicData embarked is also equal to the le fit transform version of titanic data. Embarked.

And then we'll take a look at the whole TitanicData and see both of these at once. There we go. Sex has been turned into one for male, zero for female.

And embarked is now zero, one, or two, for whichever port they embarked from. Great. Our next step, we're actually going to start massaging this data a little bit more.

We're going to split it into X and Y and see what we can do with it.

Colin Jaffe

Colin Jaffe is a programmer, writer, and teacher with a passion for creative code, customizable computing environments, and simple puns. He loves teaching code, from the fundamentals of algorithmic thinking to the business logic and user flow of application building—he particularly enjoys teaching JavaScript, Python, API design, and front-end frameworks.

Colin has taught code to a diverse group of students since learning to code himself, including young men of color at All-Star Code, elementary school kids at The Coding Space, and marginalized groups at Pursuit. He also works as an instructor for Noble Desktop, where he teaches classes in the Full-Stack Web Development Certificate and the Data Science & AI Certificate.

Colin lives in Brooklyn with his wife, two kids, and many intricate board games.

More articles by Colin Jaffe

How to Learn Machine Learning

Master machine learning with hands-on training. Use Python to make, modify, and test your own machine learning models.

Yelp Facebook LinkedIn YouTube Twitter Instagram