Learn how to transform real-world stock data using Python’s pandas library, focusing on converting string-based financial figures into usable numerical formats. Build practical skills with apply and lambda functions to clean and analyze data efficiently without altering the original dataset.
Key Insights
- Used pandas to read stock data from a CSV file containing mixed data types, including string representations of percentages and monetary values.
- Applied a lambda function with the apply method to convert the "Index Weight" column from a percentage string (e.g., "1.5%") to a float by stripping the percent sign and casting to a float.
- Created a new column by converting the "Volume" strings (e.g., "2.24m") into dollar amounts: stripped the trailing "m", cast the result to a float, and multiplied by 1,000,000. The original column was kept for readability.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
Let's apply some of what we've just done to data that's closer to the real thing. We have some stock data over here in our CSV file, nasdaq.csv, and it's pretty simple. The columns are company, symbol, index weight, last price, and volume.
And as you can see, some of them are strings, some of them are numbers. Volume is in terms of millions of dollars. Index weight is a percentage but expressed as a string.
We're working with those two columns mostly. But to make that happen, we first have to pull it into our Python file using pandas. So let's read that file in.
What we'll do is we'll say stocks, which seems like a good name for our variable, equals pandas read CSV, and pass in the name of our file. That's a path to our local file, which is in the same directory, so it's very easy. And then let's take a look at that data frame.
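Here is a minimal sketch of that read step. The lesson only names the "Index Weight" and "Volume" columns exactly, so the other column names and all of the sample rows below are assumptions made up for illustration. An in-memory string stands in for nasdaq.csv so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for nasdaq.csv: the rows and the exact spelling of
# "Company", "Symbol", and "Last Price" are assumptions, not the real file.
csv_text = """Company,Symbol,Index Weight,Last Price,Volume
Example Corp,EXA,1.5%,100.25,2.24m
Sample Inc,SMP,0.75%,48.10,1.5m
"""

# With the real file in the same directory, this would be:
#   stocks = pd.read_csv("nasdaq.csv")
stocks = pd.read_csv(io.StringIO(csv_text))

print(stocks)
print(stocks.dtypes)  # "Index Weight" and "Volume" come in as text, not numbers
```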
Yep, looks pretty much just like it did before, except now it's a usable data frame. All right, now I want to challenge you to use that apply and lambda function to convert our index weight to a decimal number. Index weight is a string here.
We want it to be a decimal number. Now the logic here isn't quite as easy, but I think you can do it. All right, I'm going to give you a moment.
Feel free to pause the video here. And when we come back, we'll take a look at a possible solution. All right, how'd you do on that? Let's see how we could solve it.
We want to change index weight. Let's check our spelling here. It's capital I, capital W, spaces and all.
We'll say stocks at index weight equals stocks at index weight dot apply. And we'll throw in a lambda that takes in, let's just keep it simple, as we often do for our parameter names.
We'll just say X. And X in this case is an index weight. We want it to return the result of calling float on it, because we want a decimal number. And X is a string.
We're converting it to a number. And as you can see, it's got a little percent symbol on it. So to handle that, we're going to say X dot strip and pass in the percent symbol.
We could do a slice from zero up to but not including negative one. That would slice the string without the percent symbol at the end. But it's probably a little clearer to use the strip method, which will strip a specific symbol from the ends of the string.
And that makes it very clear what it is you're doing here. All right, let's see if that worked. Yep, looks like index weight is a number now, not a string.
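One way that solution could look in code is sketched below. The two sample rows and the "Symbol" column are assumptions for illustration. The slice version is shown next to it for comparison.

```python
import pandas as pd

# Hypothetical sample rows; only the "Index Weight" column name comes from the lesson.
stocks = pd.DataFrame(
    {"Symbol": ["EXA", "SMP"], "Index Weight": ["1.5%", "0.75%"]}
)

# Strip the percent symbol from each string, then convert what is left to a float.
stocks["Index Weight"] = stocks["Index Weight"].apply(lambda x: float(x.strip("%")))

# The slice alternative: everything up to, but not including, the last character.
sliced = float("1.5%"[0:-1])

print(stocks)
print(sliced)
```

The slice drops the last character no matter what it is, while strip only removes the symbol you name, which is part of why strip reads more clearly here.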
Perfect. I hope you are getting the hang of it. I'm going to give you one more challenge on this before we move on to including one more tool in our apply and lambda tool chest.
We want to use the apply method and lambda to convert volume to numbers so you can calculate the total volume. So right now, again, this is a string, 2.24m, meaning million. What we want to do is convert it to a number.
And so 1.5 million should change to the actual number 1,500,000. And as a bonus, do this without changing the original column. All right, I'll give you a moment to give that a shot on your own.
Feel free to pause the video here. How'd you do? I hope you did well. All right, let's give it a shot.
We didn't say exactly what to do instead of changing the original column. We could just print out the new column. Let's try that first.
So first, we'll make a new column. We'll say stocks at volume dot apply. And what we'll apply is a lambda that takes X. Maybe I'll name it something better than X. It is volume.
It's a volume. What we want to return is a float version. This is reasonably similar to what we just did.
Float version of volume dot strip the letter m. But then also take that whole part there and multiply each of those floated and stripped numbers by 1 million. What we want here is these underscores, which just make this an easier number to read.
Without them, it's much harder to tell that that's a million. And the underscores don't affect the number at all.
Python ignores them, but they make it much more readable for humans. And that's what we are. Now that we have this, this is a new column.
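As a rough sketch of that step, with made-up sample values, the conversion and the underscore point might look like this:

```python
import pandas as pd

# Underscores in a numeric literal are only for readability; the value is identical.
million = 1_000_000
same_value = 1000000
print(million == same_value)  # True

# Hypothetical sample values in the same "2.24m" style as the lesson's data.
volumes = pd.Series(["2.24m", "1.5m"], name="Volume")

# Strip the "m", convert to float, then scale up to the full number.
converted = volumes.apply(lambda volume: float(volume.strip("m")) * 1_000_000)

print(converted)
print(volumes)  # the original strings are untouched
```

Because the result is not assigned back to anything, the original values stay exactly as they were.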
If we want to calculate the total volume for the entire column, now we can do that. We couldn't do that with strings, but now that it's a number, we can absolutely do that. The way we're going to do it is run dot sum on this column.
And that's what we get. Of course it's not printed out in a nice way, but let's see. Yeah, 238,400,000 is the answer.
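If you'd like the total printed in a nicer way, one option is an f-string with a thousands separator. This is a small sketch with three made-up values, so the total here is not the 238,400,000 from the real file.

```python
import pandas as pd

# Hypothetical sample values; the real nasdaq.csv has more rows.
volumes = pd.Series(["2.24m", "1.5m", "3m"], name="Volume")

as_numbers = volumes.apply(lambda volume: float(volume.strip("m")) * 1_000_000)
total = as_numbers.sum()

# The ",.0f" format adds comma separators and drops the decimal places.
formatted_total = f"{total:,.0f}"

print(total)
print(formatted_total)
```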
Now we could print that out. We could evaluate it. We could store this as a column.
We could say stocks at volume as number equals this, and then check stocks at volume as number dot sum. That's another way we could get this. And we can also then evaluate.
We could print out stocks at volume as number dot sum, and also check out our stocks. And sure enough, they now have a volume as number column. And yet they keep the original volume column, which makes it nice and human readable.
This one is for doing our math: volume as number. This one is for viewing as humans, so we can see how many there are. And now we have both without overwriting one with the other.
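Putting that together, here is a minimal sketch of storing the converted values next to the originals. The column name "Volume as Number" is an assumed spelling of what is said aloud in the video, and the rows are made up.

```python
import pandas as pd

# Hypothetical sample rows; only the "Volume" column name comes from the lesson.
stocks = pd.DataFrame({"Symbol": ["EXA", "SMP"], "Volume": ["2.24m", "1.5m"]})

# Assign to a new column so the readable "Volume" strings are not overwritten.
# "Volume as Number" is an assumed name for the new column.
stocks["Volume as Number"] = stocks["Volume"].apply(
    lambda volume: float(volume.strip("m")) * 1_000_000
)

total_volume = stocks["Volume as Number"].sum()

print(stocks)
print(total_volume)
```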
All right, when we come back, we're going to take a look at ternaries and then fuse them in with our lambdas to make a very powerful combination.