Organize and enhance your time series data using pandas data frames. Learn how to efficiently transpose and format data for clearer analysis and usability.
Key Insights
- Create a pandas data frame to structure and analyze Apple stock price data initially retrieved from an API, resulting in a large table of five rows and 6,000 columns that requires restructuring for clarity.
- Transpose the data frame to rearrange date columns into rows, providing a more intuitive layout for analysis, with dates listed vertically and corresponding price information laid out horizontally.
- Convert data entries from strings to numeric format using pandas'
map()
method combined withpd.to_numeric()
, and reformat the index into date-time objects for better compatibility and analysis.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
If we wanted to put this into a data frame so that we can examine it and do more things with the data—now that we've got it—we might want to do something with it. Putting it in a data frame is a great choice. So let's make a data frame.
I'm going to call it apple_prices
, and I'm going to make a new data frame from Pandas and base it on our data’s "Time Series (Daily)". And then let's just look at apple_prices
, which I regret not pluralizing—but I fixed it. All right, let's look at apple_prices
.
We made a data frame of apple_prices
. The problem is it’s 5 rows and 6,401 columns. It’s got a column for every single possible date.
And that's not what we want. That’s not the way a human would think through this. I mean, technically, we could work with this data, but it would just be really awkward.
What we want actually is to turn it 90 degrees—to have these column names here be our rows. Right, a row for this date, a row for this date, a row for this date, and so on. And for the columns to instead be the values previously stored in the inner dictionaries. Right—what's the opening price on this date, this date, this date, going down the list.
So fortunately, there's a method for that. Thanks, Pandas—there’s always a method for that.
So let's transpose it. We could say apple_prices = apple_prices. T
. Let's take a look at that now.
All right, that's what we want here. Now it goes from the most recent five dates all the way to the least recent five dates back in ’99. And this is a much more common—and intuitive—format. This is the way we would think about it.
The last thing we might want to do is convert everything to be numeric, and the date to be a date format Pandas can work with—just formatting this in a slightly better way. And so the way we could do that is: apple_prices = apple_prices.apply(pd.to_numeric)
.
Remember, .apply()
allows us to run a function on every row or column. If we look at apple_prices
before, these were all actually strings. But if we run that now, these values become numbers, and they’re much more readable.
Now we can convert the index to datetime by running apple_prices.index = pd.to_datetime(apple_prices.index)
, which updates the index—the left-hand column, or technically, the row labels. And now each one is a datetime index.
I believe I might have wanted our index to be that. Let's try that.
Yep, I accidentally replaced the entire DataFrame with the datetimes. But now we've got it.
Everything is in the right format for Pandas to be able to work with. So yeah, you know—we used our data frame knowledge to make a data frame out of all of our data from the API. Next, we'll start doing some things with that data.