Analyzing Titanic Data: Combining Class and Gender for Insights

Combine passenger class and gender data using pandas to reveal compelling patterns in Titanic survival rates. Utilize clear visualizations to better understand how class and gender intersected to affect passengers' chances of survival.

Key Insights

Created a new categorical column (p_class_sex) by combining passenger class (p-class) and gender, which makes the data easier to analyze and the visualizations easier to read.
Identified significant survival rate disparities: first-class females had high survival rates (91 survivors to three fatalities), while third-class males fared exceptionally poorly.
Observed that the gender advantage diminished significantly among third-class passengers: third-class females had equal survival and fatality numbers (72 surviving and 72 perishing), highlighting the critical impact of socioeconomic status on survival.


We're going to do a little bit of fancy pandas DataFrame work to create a new column, p_class_sex. It will be a combination column that joins each passenger's class (first class, second class, or third class) with their gender. So first we'll define a list of possible values.

Those are first class female, first class male, second class female, second class male, third class female, and third class male. Then we're going to make each row's value a combination of the p-class value and the sex value. And the way we're going to do that is to assign to titanic_data['p_class_sex'].

It's a new column and it will be p-class plus an underscore plus sex. There's only one more thing we need to handle, which is that p-class is a number (1, 2, or 3), while titanic_data['sex'] is a string. To convert the p-class column to a string so it can be concatenated with the underscore and with the value of titanic_data['sex'], we're going to use astype(str).
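As a rough sketch of the concatenation step just described, here is a version that runs on a tiny made-up DataFrame. The column names p_class, sex, and p_class_sex are assumptions based on how the names are spoken in the lesson; the real dataset may spell them differently.

```python
import pandas as pd

# Toy stand-in for the lesson's DataFrame. The rows are invented for illustration,
# and the column names "p_class" and "sex" are assumptions.
titanic_data = pd.DataFrame({
    "p_class": [3, 1, 3, 2],
    "sex": ["male", "female", "female", "male"],
})

# p_class holds integers, so convert it to strings before concatenating
# it with the underscore and the sex column.
titanic_data["p_class_sex"] = (
    titanic_data["p_class"].astype(str) + "_" + titanic_data["sex"]
)

print(titanic_data["p_class_sex"].tolist())
# ['3_male', '1_female', '3_female', '2_male']
```

Without astype(str), pandas would raise an error when trying to add an integer column to a string.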

And then our last step to make this work is to make it a categorical value. That means it has only specific possible values. We're going to say that titanic_data['p_class_sex'] is now a pandas Categorical built from titanic_data['p_class_sex'].

And the categories are the list we defined up above, in that order. Then we can take a look at the series titanic_data['p_class_sex']. There we go.
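The categorical conversion might look something like the sketch below. The category strings (1_female, 1_male, and so on) and the column name are assumptions that follow from the underscore concatenation; the toy rows are invented.

```python
import pandas as pd

# Assumed spelling of the six possible values, in the order read out in the lesson.
categories = ["1_female", "1_male", "2_female", "2_male", "3_female", "3_male"]

# Toy stand-in for the combined column built in the previous step.
titanic_data = pd.DataFrame(
    {"p_class_sex": ["3_male", "1_female", "3_female", "2_male"]}
)

# Replace the plain string column with a categorical one that only
# allows the listed values and remembers their order.
titanic_data["p_class_sex"] = pd.Categorical(
    titanic_data["p_class_sex"], categories=categories
)

print(titanic_data["p_class_sex"].dtype)
print(titanic_data["p_class_sex"].cat.categories.tolist())
```

One thing to watch for: any value that is not in the categories list becomes NaN, so a typo in the list silently drops data.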


We've got the head and the tail of the series (third class male, first class female, third class female, etc.), all within 891 rows. Great.

It's going to be really helpful, because now we can take a look at that as a graph. We can graph it and see how these three columns (survived, passenger class, and sex) interact. So here our axis is a Seaborn count plot where x is Survived, the hue is p_class_sex, our new column, and the data is titanic_data.

And now we can see how each of them did. Third class male did very poorly.
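A minimal sketch of that count plot, again on invented rows, could look like this. The lowercase column name survived is an assumption, and the Agg backend line is only there so the example runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import pandas as pd
import seaborn as sns

# Toy stand-in for the lesson's DataFrame; rows and column names are assumptions.
titanic_data = pd.DataFrame({
    "survived": [0, 1, 1, 0, 0, 1],
    "p_class_sex": ["3_male", "1_female", "2_female", "3_male", "2_male", "3_female"],
})

# One group of bars per survived value, with one colored bar per class/gender combination.
ax = sns.countplot(x="survived", hue="p_class_sex", data=titanic_data)
```

Each bar's height is the number of passengers in that class and gender group who did or did not survive, which is where the counts read out in the lesson come from.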

Barely any third class males survived. Second class males also did very poorly. And if you look at the females: among first class females, only 3 perished and 91 survived. Among second class females, 6 perished and 70 survived.

It's only when you get to third class that the gender advantage evens out: 72 survived and 72 perished. Class mattered so much that, by the time you get down to third-class passengers, the advantage of being a woman didn't fully counteract it. So yeah, we're seeing quite a lot of good data analysis here.
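To put those counts on a common scale, here is a small sketch that turns the numbers quoted above into survival rates. The counts come from the lesson; the index labels are assumed spellings of the category values.

```python
import pandas as pd

# Survived / perished counts for female passengers, as quoted in the lesson.
female_counts = pd.DataFrame(
    {"survived": [91, 70, 72], "perished": [3, 6, 72]},
    index=["1_female", "2_female", "3_female"],  # assumed label spelling
)

# Survival rate = survivors divided by everyone in the group.
female_counts["survival_rate"] = female_counts["survived"] / (
    female_counts["survived"] + female_counts["perished"]
)

print(female_counts.round(3))
```

That works out to roughly 97% for first class, 92% for second class, and exactly 50% for third class.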

Our next step is to start putting this into data that the computer can read for modeling. Then we'll dive into a random forest classifier and see how it can help us analyze all this data.

Colin Jaffe

Colin Jaffe is a programmer, writer, and teacher with a passion for creative code, customizable computing environments, and simple puns. He loves teaching code, from the fundamentals of algorithmic thinking to the business logic and user flow of application building—he particularly enjoys teaching JavaScript, Python, API design, and front-end frameworks.

Colin has taught code to a diverse group of students since learning to code himself, including young men of color at All-Star Code, elementary school kids at The Coding Space, and marginalized groups at Pursuit. He also works as an instructor for Noble Desktop, where he teaches classes in the Full-Stack Web Development Certificate and the Data Science & AI Certificate.

Colin lives in Brooklyn with his wife, two kids, and many intricate board games.
