Add Items to DataFrames and Filter by Word Count

Explain how to add rows and filter DataFrame entries in pandas using `.loc` and `.str.contains`.

Discover how to effectively manipulate DataFrames by adding new items and filtering based on specific conditions. Learn practical techniques for handling multi-word data entries and calorie-based filtering using pandas.

Key Insights

  • Demonstrates adding a new item ("hot dog") into a DataFrame, highlighting potential indexing pitfalls and effective troubleshooting by adjusting the insertion index (length of DataFrame plus two).
  • Illustrates filtering a DataFrame to create a subset ("max 650 calorie df") containing only items with 650 calories or fewer, emphasizing conditional filtering using pandas.
  • Shows how to filter items with names containing multiple words ("multi-word df") by leveraging string methods in pandas, specifically using the str.contains() method to match a space character.


And here's the challenge: put "hot dog" back at the end of the food_df. So pause and do the same thing you just did with "BLT", but this time with "hot dog".

For "hot dog", you can just make up the values. Okay, "hot dog" goes into the DataFrame. We're going to say food_df.loc[], with square brackets.

Inside the brackets we want the length of the food_df, len(food_df). That's where the new item is going in. And that is going to be set equal to the row's values, starting with "hot dog" as the name.

The price will be 4.50. The calories will be 350. Vegan is False, and the bread will be "hot dog bun".

Oh, it went in and replaced an existing row. That's interesting. Why did it replace instead of going in at the end? Oh, right.


It replaced "falafel" because two items are missing from the DataFrame, so the length is two less than the next free index label. We want to come in at the end, after "BLT". But since two items are missing, instead of going in after "BLT", it goes in before "BLT", right where "falafel" is.

So what we'll do is run this again and put the "falafel" back, because "hot dog" overwrote "falafel".

The reason it overwrote it is that we said we wanted to go in at the length number, and the length number is not the max index number.
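The overwrite described above can be reproduced with a small sketch. The DataFrame here is a hypothetical stand-in for food_df: the column names (other than "item" and "cals", which the lesson mentions), the values, and the index labels are all assumptions chosen so that two labels are missing.

```python
import pandas as pd

# Hypothetical stand-in for food_df; values and most column names are assumed.
food_df = pd.DataFrame(
    {
        "item": ["pizza", "Reuben", "falafel", "BLT"],
        "price": [3.00, 11.00, 6.50, 9.00],
        "cals": [700, 800, 550, 600],
        "vegan": [False, False, True, False],
        "bread": ["crust", "rye", "pita", "white"],
    },
    index=[0, 3, 4, 5],  # labels 1 and 2 were removed earlier, leaving a gap
)

print(len(food_df))         # 4
print(food_df.index.max())  # 5

# Label 4 already exists (it holds "falafel"), so .loc assigns to that row
# instead of adding a new one at the end.
food_df.loc[len(food_df)] = ["hot dog", 4.50, 350, False, "hot dog bun"]

print(food_df)
```

With a gap in the index, len(food_df) lands on a label that is already taken, so the assignment replaces a row rather than appending one.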

We really wanted to go in at length plus two, right? How many items are there?

10, 11, 12, 13, 14. We wanted to go in at len(food_df) + 2. We'll say food_df.loc[len(food_df) + 2].

This is .loc[] with square brackets, and the value inside the brackets is the location we want the new row to go in at.

There you go. "Falafel" is back in. All righty.
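A minimal sketch of the fix, again using a hypothetical stand-in for food_df with assumed values and two missing index labels. It also computes the next label from the index itself, which is an alternative not used in the lesson but avoids hard-coding the size of the gap.

```python
import pandas as pd

# Hypothetical stand-in for food_df; values and most column names are assumed.
food_df = pd.DataFrame(
    {
        "item": ["pizza", "Reuben", "falafel", "BLT"],
        "price": [3.00, 11.00, 6.50, 9.00],
        "cals": [700, 800, 550, 600],
        "vegan": [False, False, True, False],
        "bread": ["crust", "rye", "pita", "white"],
    },
    index=[0, 3, 4, 5],  # two labels are missing
)

# Alternative (not from the lesson): one past the largest existing label.
next_label = food_df.index.max() + 1

# The lesson's approach: offset the length by the number of missing labels.
new_label = len(food_df) + 2
print(new_label == next_label)  # True

# The label does not exist yet, so .loc adds a new row at the end.
food_df.loc[new_label] = ["hot dog", 4.50, 350, False, "hot dog bun"]

print(food_df[["item", "cals"]])
```

The + 2 only works while exactly two labels are missing; deriving the label from food_df.index.max() keeps working if more rows are dropped later, assuming the index holds integers.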

Moving on. Make a new DataFrame of just the items with a maximum of 650 calories. And then make another one, so it's a double challenge here.

Make a new DataFrame called multi_word_df that contains only those items of two or more words. So no "falafel", no "pizza", no "BLT", no "Reuben", but "tuna salad sandwich", "turkey sandwich": anything that's more than one word is what you want in your multi_word_df. So pause and make two DataFrames, two different filter challenges.

Okay. Here we go with the solution. We're going to say max_650_cals_df equals the food_df, filtered.

We're going to filter inside the food_df on "cals". And we want less than or equal to 650, right? The max is 650. There you go.

Nothing more than 650. That is your condition. Only those rows where the calorie value is less than or equal to 650 will go into the result.
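The calorie filter can be sketched as a boolean mask. The data is made up for illustration, and the column name "cals" is taken from the spoken walkthrough, so treat both as assumptions about the real food_df.

```python
import pandas as pd

# Hypothetical data; the "cals" column name is assumed from the walkthrough.
food_df = pd.DataFrame(
    {
        "item": ["pizza", "falafel", "BLT", "Reuben", "turkey sandwich"],
        "cals": [700, 550, 650, 800, 620],
    }
)

# The comparison yields one True/False per row; indexing with that
# boolean Series keeps only the rows marked True.
max_650_cals_df = food_df[food_df["cals"] <= 650]

print(max_650_cals_df)
```

Because the comparison is <=, an item with exactly 650 calories stays in the result. The filtered DataFrame keeps the original index labels of the rows it retains.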

All right. Challenge: make a new DataFrame called multi_word_df.

Hint: think back to salads_df and burgers_df. That's a reminder that you want to use str.contains.

A multi-word name contains a space, right? So multi_word_df is the food_df, and we're going to filter on food_df["item"], the name of the food, with .str.contains(" "). There is no upper or lower case in a space character, right? So we don't need to do that case=False thing.

There you go. And all you have now in your results are the multi-word items because they are the ones with a string containing a space. And the one-word foods do not have a space in the name.
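A short sketch of the multi-word filter, using made-up item names that follow the examples in the challenge. Only the "item" column name and the .str.contains(" ") call come from the lesson; the rest is assumed.

```python
import pandas as pd

# Hypothetical item names modeled on the ones mentioned in the challenge.
food_df = pd.DataFrame(
    {
        "item": [
            "falafel",
            "pizza",
            "BLT",
            "Reuben",
            "tuna salad sandwich",
            "turkey sandwich",
            "hot dog",
        ]
    }
)

# .str.contains(" ") is True for any name with at least one space.
# A space has no upper or lower case, so case=False is unnecessary.
multi_word_df = food_df[food_df["item"].str.contains(" ")]

print(multi_word_df)
```

One caveat under these assumptions: a name with a stray leading or trailing space would also match, so calling .str.strip() on the column first may be worth considering for messy data.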

Brian McClain

Brian is an experienced instructor, curriculum developer, and professional web developer, who in recent years has served as Director for a coding bootcamp in New York. Brian joined Noble Desktop in 2022 and is a lead instructor for HTML & CSS, JavaScript, and Python for Data Science. He also developed Noble's cutting-edge Python for AI course. Prior to that, he taught Python Data Science and Machine Learning as an Adjunct Professor of Computer Science at Westchester County College.
