Discover how to enhance your data visualizations with Python by mastering scatter and line charts in Matplotlib. Learn to pinpoint significant values visually by embedding highlighted markers into your charts.
Key Insights
- Create detailed line charts in Matplotlib for analyzing stock-price fluctuations over a 60-day period, marking minimum and maximum values clearly using highlight markers.
- Utilize NumPy's argmin and argmax functions effectively to determine precise indices of critical data points within a dataset, facilitating direct data annotation.
- Differentiate clearly between Matplotlib's plot method, which can take only y-values and auto-generate the x-axis values, and its scatter method, which requires explicit x and y coordinates.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
Hi, welcome back to this course for Python Programming and Data Science. My name is Brian McLean, and I've been your instructor now the whole time, and we're up to Lesson 9 of 10 now. Thanks for sticking with it.
So, in this lesson, we're going to continue with Matplotlib. In the last lesson, we made bar charts, and in this one we'll be looking at scatter and line charts. We'll also combine a scatter chart, which is dots in an xy coordinate system, with a line going through it, making what is called a regression line, which is an important concept in data science. It'll all make sense and become clear as we go, so we'd better dive right in. Go ahead and open up Lesson File 9. We're going to import the big three.
Let's go ahead and import NumPy as np, import pandas as pd, import matplotlib.pyplot as plt, and we want the images, so we're going to say from IPython.display import Image. We also need to connect to Google Drive, ultimately, not immediately, but we will.
That's to load some pictures. We don't have a CSV for this one, but we will in the next lesson. From google.colab import drive, then drive.mount on /content/drive.
Let's run this. Run the other one. Mm-hmm.
IPython.display. Okay, fine. Connect to continue.
Yep. We're looking for that "Mounted at /content/drive" confirmation in the little output window. There we go.
Okay. So, we're going to make a line chart, which is another kind of chart. We're going to do a few manipulations with some fake data.
So, we've got some hypothetical stock prices here, kind of zigzagging over 60 days. Let's print the length of this stock prices list.
I believe it is 60. Yep. And let's get the min price.
So, if we want to come in here and get the min, we could use min, if you remember. That might have been Lesson 1. The min function takes a list and just returns the minimum value.
In there somewhere, the stock reached a low of 51.5. And the max would be the max function, same list, 94.8. The reason we want the min and the max is that we're going to chart this as a jaggy line, but we also want to highlight the min and the max with stars or some kind of big dot. So, we have to know those prices and their locations.
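As a minimal sketch of the steps so far: the lesson's actual price list is not shown here, so the list below is a hypothetical stand-in, built only so that it has 60 entries with the same low (51.5) and high (94.8) mentioned in the lesson. The snake_case variable names are assumed renderings of the spoken names.

```python
# Hypothetical stand-in for the lesson's 60 daily stock prices.
stock_prices = [60 + 0.5 * day + (3 if day % 2 else -3) for day in range(60)]
stock_prices[26] = 51.5  # the low mentioned in the lesson
stock_prices[52] = 94.8  # the high mentioned in the lesson

print(len(stock_prices))  # 60

# Built-in min() and max() return the values, not their positions.
min_price = min(stock_prices)
max_price = max(stock_prices)
print(min_price, max_price)  # 51.5 94.8
```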
All right. So, next up, let's make a list of the min and max prices, and then get the index of each.
So, we get the min price, get the max price, and get the min-max prices as their own list. If we want to wrap these in a list, and you'll see why we might want to do that, we make a variable called min max prices, set it equal to a list, and just feed in the min price and the max price. Actually, we don't want to just output the min price; we want to set these to variables.
So, let's go ahead and do that. Max price, we don't want to just get it and print it, we want to store it in a variable.
That way, we can take the min price and the max price and put them in as the first and second items of a list, like so. Let's also get the index of the min and the max prices. We know that the min price is 51.5 and the max is 94.8, but what index are we talking about here? Where is the 51.5? We can go track it down, there it is, but what's the index of that? And what's the index of the max? Well, NumPy has a function you call on np called argmax, which gives you the index of the maximum value in a list, and also argmin, which gives you the index of the minimum value.
So, we use min and max to get the values, but if we want the location, the index of the values, we use argmax and argmin, which are not built into Python; you have to call them on NumPy. We'll call the results min price i and max price i, that being the index of the min price and the index of the max price. Min price i equals np.argmin, feed in the list, stock prices, and max price i is np.argmax of the same list, and there we go.
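Here is a rough sketch of storing the values in variables and looking up their positions. The price list is again a hypothetical stand-in (the lesson's real data is not shown), constructed so the low sits at index 26 and the high at index 52; the variable names are assumed spellings of the spoken ones.

```python
import numpy as np

# Hypothetical stand-in for the lesson's 60 daily stock prices.
stock_prices = [60 + 0.5 * day + (3 if day % 2 else -3) for day in range(60)]
stock_prices[26] = 51.5
stock_prices[52] = 94.8

# Store the values in variables, then bundle them into a list.
min_price = min(stock_prices)
max_price = max(stock_prices)
min_max_prices = [min_price, max_price]

# np.argmin / np.argmax return the index (position) of the min / max value.
min_price_i = np.argmin(stock_prices)
max_price_i = np.argmax(stock_prices)

print(min_max_prices)            # [51.5, 94.8]
print(min_price_i, max_price_i)  # 26 52
```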
We see that the min price is at index 26 and the max price is at index 52. Does that make sense? There's your min, right? Sure, and there's your max: 26 and 52. Now, why would we care? We'll get to that. What we want to do now is, just like we made the min-max prices as a list, let's make the min-max indices as a list.
Indices being the plural of index. So, I'm going to declare a new list with two values, min price i and max price i. And why is it giving us NumPy values? This is a new thing, honestly. I mean, I get it that the result comes from NumPy. Can we int this thing? That's more like it.
So, I'm getting this NumPy answer here, and I don't want that. Because the result comes from NumPy, it's numpifying the answers. Let me show you that again in slow motion.
If you don't intify, if you just do this raw NumPy move and then try to make a list out of the two answers, it gives them to you with this NumPy wrapper, which we don't really want. We know these are whole numbers, so we're going to int them, directly int the NumPy answer, and therefore the min-max indices list is just numbers, without the NumPy wrapper. Why would we care? Those little lists hold the x and y values of the two points we want to plot, the min and the max, where the prices go on the y-axis and the indices, which are just the days, go on the x-axis.
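The int step can be sketched like this, using the same hypothetical stand-in data as before (not the lesson's real list). np.argmin and np.argmax return NumPy integer scalars, and on recent NumPy versions a list of them displays with a NumPy wrapper around each number; wrapping each call in int() gives plain Python ints.

```python
import numpy as np

# Hypothetical stand-in for the lesson's 60 daily stock prices.
stock_prices = [60 + 0.5 * day + (3 if day % 2 else -3) for day in range(60)]
stock_prices[26] = 51.5
stock_prices[52] = 94.8

# Raw NumPy results: each item is a NumPy integer scalar, not a Python int.
raw_indices = [np.argmin(stock_prices), np.argmax(stock_prices)]
print(raw_indices)

# Convert directly with int() so the list holds plain numbers.
min_price_i = int(np.argmin(stock_prices))
max_price_i = int(np.argmax(stock_prices))
min_max_indices = [min_price_i, max_price_i]
print(min_max_indices)  # [26, 52]
```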
So, the min point would be plotted at (26, 51.5), and the max point would be plotted at (52, 94.8) in xy space. Let's make sure we understand that. We'll make a min point as xy coordinates, right? That's how it works in geometry.
We're plotting on xy axes, so the min point would be a tuple, really, of your min price index, which is day 26, assuming one day per entry in the daily stock prices, and then your min price. And if we printed the type, what would be the data type of the min point? That should be a tuple. Now, we've used the term tuple a little bit here and there, but now we're getting into it more with data science. In your pandas DataFrames, we always check the shape, right? With the students df, we had (1000, 7); that's a tuple.
That shape that you always get back, the chessboard, (8, 8), is another tuple. A tuple is an immutable structure, kind of like a list. It's got multiple items in it, and they are referenceable by index.
So, if you're bundling up xy points, you can definitely do so as a tuple, and the tuple with the parentheses looks like what we're used to from coordinate geometry. I mean, you could make a list too. Aha, it's a tuple.
So, that's the min point, found in xy space at (26, 51.5), and the max point, another tuple, is way down the line at 52. The 0 to 59, the 60 days, runs along the x-axis. So, way over on the right near the end, at 52, you've got your peak, and kind of in the middle you've got your valley, your minimum price.
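A small sketch of the two points as tuples, using the index and price values quoted in the lesson. The names min_point and max_point follow the spoken names; the immutability check at the end is an added illustration, not something from the lesson file.

```python
# Index (day) and price values quoted in the lesson.
min_price_i, min_price = 26, 51.5
max_price_i, max_price = 52, 94.8

# Bundle each (x, y) pair as a tuple: x is the day index, y is the price.
min_point = (min_price_i, min_price)
max_point = (max_price_i, max_price)

print(min_point)        # (26, 51.5)
print(type(min_point))  # <class 'tuple'>
print(max_point[0])     # 52 (items are referenceable by index)

# Tuples are immutable: assigning to an item raises a TypeError.
try:
    min_point[0] = 27
except TypeError:
    print("tuples cannot be changed in place")
```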
So, we're going to plot the jaggy line for all 60 days of prices, but then we're going to go in to the min and max points and highlight them with a big star or something, so we can see them. So, argmin and argmax are the NumPy functions that return the index of the min or max value in a list. Okay, get all that written, typed out.
You can pause, then you can scroll and type. I highly recommend you type every bit of it and study it, and then restudy everything. These files are your study materials until you really know it.
Now we're going to plot the list. A plot is a line chart, typically used for time series, right? The stock prices are a 60-day time series, and in a time series chart your x-axis is your time progression by day, by week, by year.
Think about Bitcoin prices, stock prices, gold prices, whatever. Time progresses along the x-axis. The values, the counts, the prices, the quantities, those are the y-values.
So, plt.plot can take only the y-values as a list, and because the time series is just consecutive integers, it automatically supplies the x-values. It takes a list of y-values such as prices, quantities, units sold, etc., and makes a line chart with those values on the y-axis. The x-axis is auto-incremented as consecutive ints, assumed to be a time unit, so you don't have to provide x. If you've got 60 prices, your x is just 0 to 59, because it's a march through time, one day at a time.
So, you don't have to supply the x to a line chart, just the y, and it knows how to zig and zag along the time axis, which is a given. The number of time steps equals the length of the list. Okay, so we're going to make a line chart of the stock prices for our hypothetical company here, Acme Widgets or whatever.
Now, scatter, let's not do the scatter thing yet. Here's what we want to do. We're going to say plt, that's your pyplot, dot plot, and we'll get to scatter.
Scatter is different. We're on this one right now, plt.plot. In fact, maybe move this up.
Okay, just plt.plot. Keep scatter down here; we'll talk about that when we get there, which will be soon. So, plt.plot, and we're going to feed in our stock prices, which is a list, and just run it.
There you go. We see we had some kind of horrific sell-off one day. By default, a chart's axis doesn't go any higher or lower than it needs to, right? All the values are between 50 and 100, so you're only seeing from 50 to 100.
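A minimal sketch of the line chart, assuming the same hypothetical stand-in prices used for illustration (the lesson's real list is not shown). Only the y-values are passed, and the returned line object lets us confirm that Matplotlib filled in x as 0 through 59.

```python
import matplotlib.pyplot as plt

# Hypothetical stand-in for the lesson's 60 daily stock prices.
stock_prices = [60 + 0.5 * day + (3 if day % 2 else -3) for day in range(60)]
stock_prices[26] = 51.5
stock_prices[52] = 94.8

# Pass only the y-values; plt.plot supplies x as consecutive integers from 0.
(line,) = plt.plot(stock_prices)

x_values = line.get_xdata()
print(len(x_values), int(x_values[0]), int(x_values[-1]))  # 60 0 59

plt.show()
```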
Now, this kind of representation might look like wild fluctuations, so what people who make charts do is, well, as they say, figures don't lie, but liars figure. That means people will ask, okay, how do I make that change look not so dramatic? We'll just change the scale, so check this out. We'll say plt.ylim.
Let's set the ylim from 0 to 100 now. Look at that. Now you're saying the minimum is 0 and the max is 100.
Let's set the max to 200. There. Now it just looks like steady growth with a little blip. So you can set the xlim and the ylim.
Let's come to the bottom, and we'll say plt.show. That's kind of like putting a bow on things when you're done with your chart. It also gets rid of little scraps of text showing up with the chart, right? There was this little scrap here that you don't really need.
We can also show a grid, plt.grid, and we can give it a linestyle of dashed if you want to have grid lines. That's sometimes helpful. And a title: we'll say plt.title.
Got all that written out; I don't want you typing this tedious text. And we'll say plt.xlabel and plt.ylabel, so we know what we're looking at here, what those days mean, what the time units mean, and what the y-values are. So there you go.
U.S. dollars, June, July, 60 days. Now here's what we want to do. We want to come in to the 60 days and put in a couple of extra big dots.
Maybe make it a little more dramatically jaggy first. Here we could go 0 to 100, or we don't even have to start at 0. We could do 20 to 100, or maybe 20 to 120.
A little breathing room. Okay, I like that: 20 to 120.
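The styling steps can be sketched as follows. The prices are the hypothetical stand-in list again, and the title and axis label strings are assumptions made up for illustration, since the lesson's exact text is not shown here.

```python
import matplotlib.pyplot as plt

# Hypothetical stand-in for the lesson's 60 daily stock prices.
stock_prices = [60 + 0.5 * day + (3 if day % 2 else -3) for day in range(60)]
stock_prices[26] = 51.5
stock_prices[52] = 94.8

(line,) = plt.plot(stock_prices)

# Widen the y-axis so the swings get a little breathing room.
plt.ylim(20, 120)

# Dashed grid lines, plus a title and axis labels (label text is assumed).
plt.grid(linestyle="dashed")
plt.title("Acme Widgets Stock Price")
plt.xlabel("Day (June-July, 60 days)")
plt.ylabel("Price (US dollars)")

# show() finishes the chart.
plt.show()
```

Changing the two numbers passed to plt.ylim is all it takes to make the same data look calmer or more dramatic, which is the point of the "liars figure" remark.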
So there's your minimum down in that little valley, and there's your maximum. We want to emphasize those by putting little dots there, and dots are done with what's called a scatter plot. Our line chart, think of it as 60 dots with line connectors to make it look smooth, but we could draw that as just 60 dots. So let's go in here, turn off the plot, and say plt.scatter.
This brings us to scatter, which we can move up now. In scatter, you have to provide x and y, so we'll provide x as a range from 1 to 61, which gives us 1 through 60, because I don't want to start from day zero, and then we'll have our stock prices as y.
So watch what we get. It's going to be many dots now, and we can set the size of the dots. Let's say five.
Is it s? There you go. Yeah, make them smaller. Okay, so these little dots are a scatter.
So really, when you think about it, when you plot 60 items, you don't have the data for the connections. The connections are just straight lines between the points, so it looks continuous. You don't really know, from one day to the next, what that angle should be.
Maybe it should be a slightly different angle. What you do know from the 60 days of stock prices in our list is that we can plot them in xy space, with the x-values being 1 to 60 and the y-values being whatever the stock price is at that position. So scatter takes x and y, whereas plot can take just y and assume x. We don't really want scatter for all 60 prices, because we like the continuity of the line.
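A short sketch of the all-dots version, again with hypothetical stand-in prices. Unlike plt.plot, plt.scatter needs explicit x-values, so days 1 through 60 are supplied, and s sets the marker size.

```python
import matplotlib.pyplot as plt

# Hypothetical stand-in for the lesson's 60 daily stock prices.
stock_prices = [60 + 0.5 * day + (3 if day % 2 else -3) for day in range(60)]
stock_prices[26] = 51.5
stock_prices[52] = 94.8

# scatter requires both x and y; range(1, 61) yields days 1 through 60.
days = list(range(1, 61))
points = plt.scatter(days, stock_prices, s=5)  # s is the marker size

plt.show()
```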
However, we'd still like to have scatter available, so that we can come in and just drop in a couple of dots. We don't want scatter for everything; we'd like to scatter just the two dots for the min and the max, which we already made.
So we're going to do scatter again, but the thing we're going to scatter is not the stock prices. It's going to be our min-max indices and our min-max prices, right, these.
Our y-values for the scatter are those two prices, and our x-values are those two indices, 26 and 52, not every single one of the 60. We'll leave the size at the default, which is pretty big, and we'll say marker; we can put in a caret, which gives a triangle, and we could say color, red maybe, there. Now it'd be kind of nice if the low one was red and the high one was green, so we can feed in a list of colors in the same order, red first, then green, or maybe lime, since it's hard to see that green, and the size can be bigger.
We have this marker, which in this case is a caret to give you a triangle. If you wanted a star, you could use an asterisk. We have size 40; the default size, I believe, is 36. And then we have color, so you can pass in one color or pass in two colors to go with our two points.
And just go crazy on the size. So, scatter, x and y. What are we providing for our x? Our min-max indices. What are we providing for our y? Our min-max prices.
So, plot the min and max points as dots using scatter: plt.scatter, x, y. We've got two plots superimposed on each other, really, a scatter and a line. Now let me show you what show does.
Show puts a bow on it and says you're done. If you go in between the plot and the scatter and do a show, you'll get them as two separate plots, which we do not want, right? So show is not just to clean up scrap text; it's to indicate we're done. As long as we don't call it until after we've done both the plot and the scatter, we can have multiple plots in one xy coordinate system, which is what we want, right? So read this note: plot can take just one input, y; scatter takes x and y. Read, type.
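Putting the pieces together, here is one way the combined chart might look. This is a sketch under assumptions: the prices are the hypothetical stand-in list (built so the low is at index 26 and the high at index 52), and the marker, size, and colors follow the values mentioned in the lesson.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for the lesson's 60 daily stock prices.
stock_prices = [60 + 0.5 * day + (3 if day % 2 else -3) for day in range(60)]
stock_prices[26] = 51.5
stock_prices[52] = 94.8

# y-values and x-values of the two points to highlight.
min_max_prices = [min(stock_prices), max(stock_prices)]
min_max_indices = [int(np.argmin(stock_prices)), int(np.argmax(stock_prices))]

# The line: y only, so x is auto-generated as 0..59, matching the indices.
(line,) = plt.plot(stock_prices)

# The highlights: explicit x and y, triangle markers, red for low, lime for high.
points = plt.scatter(min_max_indices, min_max_prices,
                     s=40, marker="^", color=["red", "lime"])

plt.ylim(20, 120)

# Call show() once, after both plot and scatter, so they share one chart.
plt.show()
```

Because the line's x-values start at 0, the argmin and argmax indices land exactly on the line's valley and peak. Swapping marker="^" for marker="*" would draw stars instead of triangles.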
You'll learn it if you do it all. If you don't do it all and just watch, I mean, probably by now, you know, we've been at it for a while.