Creating Predictions for Kaggle's Titanic Challenge

Create predictions using a random forest model, format the Kaggle submission CSV, and upload it to Kaggle for scoring.

Create accurate predictions using Python and Random Forest classifiers, then evaluate your model's effectiveness by submitting results to Kaggle. Learn the complete workflow from preparing prediction arrays to formatting CSV submissions for Kaggle's Titanic competition.

Key Insights

  • Created a prediction array by calling model.predict on the test features, producing an array of zeros and ones that indicate whether each passenger is predicted to survive (1) or not (0).
  • Prepared a submission CSV in Kaggle's required format by retrieving the passenger IDs from the original Titanic test file, pairing them with the predictions, and setting index=False so pandas doesn't write an extra index column.
  • Submitted the CSV file to Kaggle's Titanic Machine Learning competition, where it scored an accuracy of about 77–79%, a solid baseline for exploring ways to improve the Random Forest classifier.

Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.

Okay, let's pick up right where we left off. We're going to create a predictions array. Let's call it predictions.

It will hold the result of running model.predict on our X_test: a little over 400 zeros and ones, one per passenger in the test set. On its own that's not very helpful, since we can't tell whether these predictions are correct, but we'll use it in our next step, which is to create a data frame that has Passenger ID and predictions.
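For readers following along outside the notebook, the prediction step can be sketched as follows. The real X_train, y_train, and X_test come from earlier lessons; the tiny data frames below are made-up stand-ins with hypothetical feature columns and an assumed 0/1 encoding for Sex, so only the shape of the code carries over.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in for the cleaned Titanic training features and labels.
X_train = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1, 3, 2, 1],
    "Sex":    [0, 1, 0, 1, 0, 1, 1, 0],  # assumed encoding: 0 = female, 1 = male
    "Age":    [29, 22, 35, 40, 58, 19, 30, 24],
})
y_train = pd.Series([1, 0, 1, 0, 1, 0, 0, 1])

# Hypothetical stand-in for the cleaned test features (same columns as X_train).
X_test = pd.DataFrame({
    "Pclass": [3, 1, 2],
    "Sex":    [1, 0, 0],
    "Age":    [25, 38, 27],
})

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# One prediction per test row: 1 = predicted to survive, 0 = predicted not to survive.
predictions = model.predict(X_test)
print(predictions)
```

The array has exactly one entry per row of X_test, in the same order, which is what makes it possible to line it up with passenger IDs later.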

Now, we're going to submit this to Kaggle, and it has to be in this exact format so that Kaggle can check it against the actual survival outcomes, which it keeps hidden, and give us an accuracy score. We need our Passenger ID. I foolishly dropped the Passenger ID column when I overwrote X_test, but we can get it back.

What we're going to do is simply read the test CSV again to get it back. So, I'm going to create a Titanic_test data frame by reading the CSV from our base URL plus CSV/test_Titanic.csv. Let's double-check that. Yep, it's got the Passenger ID column that I dropped from X_test.
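Here is a minimal sketch of recovering the IDs. In the lesson the call is pd.read_csv(base_url + "CSV/test_Titanic.csv"), where base_url points at the course folder on Google Drive; below, a small inline CSV with made-up rows stands in for that file so the snippet runs anywhere. The column name PassengerId matches Kaggle's Titanic files.

```python
import io

import pandas as pd

# Made-up stand-in for the original test file. In the lesson this would be:
# titanic_test = pd.read_csv(base_url + "CSV/test_Titanic.csv")
raw_test_csv = io.StringIO(
    "PassengerId,Pclass,Sex,Age\n"
    "101,3,male,30\n"
    "102,1,female,45\n"
    "103,2,male,8\n"
)
titanic_test = pd.read_csv(raw_test_csv)

# The untouched file still has the ID column that was dropped from X_test.
passenger_ids = titanic_test["PassengerId"]
print(passenger_ids.tolist())  # [101, 102, 103]
```

This approach assumes X_test was built from the same file without reordering or dropping any rows, so row i of Titanic_test corresponds to prediction i.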

Okay, great. Since we've got that, let's now make a data frame called Titanic_submission.


Titanic_submission is a new data frame with a Passenger ID column equal to the Titanic_test data frame’s Passenger ID column. Then we're going to include a Survived column equal to our predictions from above, these zeros and ones up here.
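Building the submission frame might look like this. Titanic_test and predictions are replaced by small hypothetical stand-ins; the two column headers, PassengerId and Survived, are the ones Kaggle's Titanic submission format expects.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the objects built earlier in the lesson.
titanic_test = pd.DataFrame({"PassengerId": [101, 102, 103], "Pclass": [3, 1, 2]})
predictions = np.array([0, 1, 1])

# Exactly two columns: the passenger ID and the predicted outcome.
titanic_submission = pd.DataFrame({
    "PassengerId": titanic_test["PassengerId"],
    "Survived": predictions,
})
print(titanic_submission)
```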

Then we can check our submission. It looks pretty good. Passenger ID and zeros and ones.
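Eyeballing the frame works, but a few assertions can catch mistakes before you spend a submission. The helper below, check_submission, is hypothetical (not part of pandas or Kaggle); expected_rows should be the number of rows in your test file, which is 418 for Kaggle's standard Titanic test.csv.

```python
import numpy as np
import pandas as pd


def check_submission(submission: pd.DataFrame, expected_rows: int) -> None:
    """Hypothetical helper: raise AssertionError if the layout looks wrong."""
    assert list(submission.columns) == ["PassengerId", "Survived"], "wrong columns"
    assert len(submission) == expected_rows, "row count doesn't match the test set"
    assert submission["PassengerId"].is_unique, "duplicate passenger IDs"
    assert submission["Survived"].isin([0, 1]).all(), "Survived must be 0 or 1"


# Small made-up example; use expected_rows=len(titanic_test) with real data.
titanic_submission = pd.DataFrame({
    "PassengerId": [101, 102, 103],
    "Survived": np.array([0, 1, 1]),
})
check_submission(titanic_submission, expected_rows=3)
print("Submission looks well-formed")
```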

Just those two columns are all that Kaggle wants. Now, saving it as a CSV is a little bit of work, but not too bad. In our case, we want to save it to Google Drive and then download it.

And we’re going to make sure to set index=False. If we don’t, we’ll get an extra column containing the row index. We don’t want that.

We want only Passenger ID and Survived as columns in the CSV we’re uploading. We’re going to use Titanic_submission.to_csv, and we’re going to save it to our base URL on Google Drive plus CSV/Kaggle_submission.csv.

Finally, we pass index=False so that only those two columns end up in the file. Perfect.
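To see what index=False changes, you can call to_csv without a path, which makes pandas return the CSV text as a string instead of writing a file. The two-row frame is a made-up example; the commented line at the end shows the Drive write from the lesson, where base_url is the course folder.

```python
import pandas as pd

titanic_submission = pd.DataFrame({"PassengerId": [101, 102], "Survived": [0, 1]})

# With no path argument, to_csv returns the CSV text.
with_index = titanic_submission.to_csv()
without_index = titanic_submission.to_csv(index=False)

print(with_index)
# ,PassengerId,Survived
# 0,101,0
# 1,102,1

print(without_index)
# PassengerId,Survived
# 101,0
# 102,1

# Writing the real file, as in the lesson:
# titanic_submission.to_csv(base_url + "CSV/Kaggle_submission.csv", index=False)
```

The unnamed first column in the default output is the extra index column the lesson warns about.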

Run that line of code. The file should now be saved to your Google Drive, and we're ready to submit it to Kaggle.

Let’s check it out. Here’s my Kaggle_submission.csv. If you need to find it in your Drive, it should be in the Python Machine Learning Bootcamp folder, inside the CSV folder, under the name we just gave it: Kaggle_submission.csv.

So, I’m going to download that now. Right-click on it and click Download. And yep, it’s downloaded. Now I’m going to go to Kaggle, and we’re going to submit it.

If you don’t have a Kaggle account, you’ll need one for this step, but it’s worth getting one anyway. Kaggle is fantastic: it’s a big part of the machine learning community, and it’s a great place to learn.

What you’re going to do is find the Titanic competition. Let me walk through that a little more.

Go to competitions and type Titanic in the search bar.

Under competitions, click Titanic: Machine Learning from Disaster. This is where we’ll submit our CSV.

Click ‘Submit Prediction’ up here.

Then, find the file you downloaded.

Now it’ll run and give you a score, which should be around 79%. Here’s the one I just did.
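The score Kaggle reports for the Titanic competition is accuracy: the fraction of passengers whose outcome you predicted correctly. Kaggle keeps the true outcomes hidden, so the labels below are invented purely to show the arithmetic.

```python
import numpy as np

# Invented example labels; Kaggle's real answers are never shared.
true_survived = np.array([0, 1, 1, 0, 0, 1, 0, 1, 0, 0])
predicted     = np.array([0, 1, 0, 0, 0, 1, 1, 1, 0, 0])

# Accuracy = correct predictions / total predictions.
accuracy = (predicted == true_survived).mean()
print(f"{accuracy:.0%}")  # 80%
```

By the same arithmetic, a 79% score means roughly 79 of every 100 test passengers were classified correctly.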

Ooh, down to 77%. I must have made a change.

It’s a fine score. It’s a great starting point for thinking, ‘How do I improve my score?’

How can I improve my results? What factors improve the score? What can I adjust? What can I fine-tune?
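One common way to try adjustments before spending a Kaggle submission (not something this lesson prescribes) is to compare settings with cross-validation on the training data. The sketch below uses scikit-learn's make_classification to generate synthetic stand-in data and compares a few max_depth values; on the real Titanic features the numbers, and the best setting, will differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the cleaned Titanic training features and labels.
X_train, y_train = make_classification(n_samples=300, n_features=6, random_state=0)

# Compare mean 5-fold cross-validated accuracy for a few tree depths.
results = {}
for max_depth in [3, 5, None]:
    model = RandomForestClassifier(n_estimators=100, max_depth=max_depth, random_state=42)
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    results[max_depth] = scores.mean()
    print(f"max_depth={max_depth}: {scores.mean():.3f}")

best_depth = max(results, key=results.get)
print("Best max_depth on this data:", best_depth)
```

Cross-validated accuracy is only an estimate of the Kaggle score, but it lets you rank several variations locally and submit only the most promising one.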

And we’ll continue with the next lesson.

Colin Jaffe

Colin Jaffe is a programmer, writer, and teacher with a passion for creative code, customizable computing environments, and simple puns. He loves teaching code, from the fundamentals of algorithmic thinking to the business logic and user flow of application building—he particularly enjoys teaching JavaScript, Python, API design, and front-end frameworks.

Colin has taught code to a diverse group of students since learning to code himself, including young men of color at All-Star Code, elementary school kids at The Coding Space, and marginalized groups at Pursuit. He also works as an instructor for Noble Desktop, where he teaches classes in the Full-Stack Web Development Certificate and the Data Science & AI Certificate.

Colin lives in Brooklyn with his wife, two kids, and many intricate board games.
