Explore the transition from basic Python programming into powerful data science applications with NumPy. Learn how NumPy enhances Python lists, making data manipulation and analysis easier and more efficient.
Key Insights
- Introduces NumPy (Numerical Python) as a Python module that significantly enhances list functionality, enabling reshaping into multi-dimensional arrays such as two-dimensional spreadsheets.
- Covers techniques for auto-generating non-repeating random numbers using Python's built-in random module, simplifying the creation of large datasets.
- Demonstrates practical methods for slicing nested lists and extracting subsets of data, and highlights how these foundational skills set the stage for advanced data manipulation using NumPy.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
Hi, welcome back to this course on Python programming for data science. We're now getting into the data science part. My name is Brian McLean.
Thanks for rejoining me. All right, the first five lessons are done. That was the core programming.
We did variables and data types, if-else logic, modules, loops, and dictionaries. That gave us a foundation in core programming so that we're in a good position to move forward to what Python is more famous for, perhaps, and that is data science. Data science being the loading, manipulating, aggregating, cleaning, interpreting, gaining insight from, and visualizing data, including large amounts of data.
So I'm going to copy file 06, NumPy. And what, pray tell, is NumPy? NumPy is short for numerical Python.
The NumPy module adds functionality to lists. It's kind of like having lists all souped up with superpowers. It enables lists to be reshaped into two- and three-dimensional shapes; a spreadsheet of rows and columns, for example, is two-dimensional.
All the lists we've been working with are just one-dimensional vectors. This two-dimensional format, this matrix, is the underlying structure of a spreadsheet. In Python, the equivalent of a spreadsheet is a pandas DataFrame.
So let's begin by importing NumPy. And it's conventional to alias it as `np`. And we're going to import `random` again.
And import `pprint`. All right. So what we're going to do is begin by declaring a list of numbers.
We've got some numbers here. We'll say `nums`, then print the type, which we know to be a list, of course. There we go.
What else can we find out about this list? Remember, we can get the length of it. And we could also get the sum of the list. So there are 12 items, and they add up to 427.
We know this kind of stuff, right? We could also print the last three items, using negative indexing. And we could print every other item, if you recall.
We could print the items backwards. That’s a step of -1. We haven't looked at that. If your step is -1, it actually runs backwards.
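A minimal sketch of the list recap above. The transcript does not show the actual numbers, so the values below are placeholders, chosen only so that the length and sum agree with the 12 and 427 mentioned in the lesson.

```python
# Placeholder values (assumed, not from the lesson): 12 items that sum to 427.
nums = [23, 45, 12, 67, 34, 89, 21, 8, 56, 40, 17, 15]

print(type(nums))   # <class 'list'>
print(len(nums))    # 12
print(sum(nums))    # 427

print(nums[-3:])    # last three items: [40, 17, 15]
print(nums[::2])    # every other item: [23, 12, 34, 21, 56, 17]
print(nums[::-1])   # a step of -1 runs through the list backwards
```

A slice takes the form `start:stop:step`; leaving out `start` and `stop` means "the whole list", so `nums[::-1]` is the whole list walked in reverse.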
So this is just a quick recap of lists. Now, we could also auto-generate. What if we didn't have the numbers? We want to auto-generate some non-repeating numbers.
Well, we could use `random.sample`. Remember, we used `random.sample` to get five unique lottery tickets. We could say `nums =`.
Instead of hard-coding the numbers or just having them happen to be lying around, we'll say `random.sample`. And we'll do `range`. We'll pass it 1 and 100; since the upper bound is exclusive, that gives us the numbers from 1 to 99. We'd like 12 numbers, just like our original `nums`. And let's print all that, see if it works.
There you go. That adds up to 566. If you run it again, it's going to change every time.
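The call described above can be sketched like this, assuming the range bounds are 1 and 100 (so the values run from 1 to 99). The output differs on every run, so no expected values are shown.

```python
import random

# range(1, 100) covers 1 through 99, because the upper bound is exclusive.
# random.sample picks 12 of those values without repeating any.
nums = random.sample(range(1, 100), 12)

print(nums)
print(len(nums))   # always 12
print(sum(nums))   # changes from run to run
```

`random.sample` draws without replacement, which is why the numbers never repeat. Asking for more items than the range holds raises a `ValueError`.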
So `random.sample` is giving us a dozen unique numbers in our 1 to 99 range, as opposed to sitting there typing them. Now, let's make another 12-pack of numbers. But this time, let's bundle them into child lists.
And that's a very laborious process. We've got them. We'll call it `nested_nums`.
There are still 12 of them, but they're in little packs of three. So `nested_nums` actually has four items. `nested_nums`, right.
`nested_nums` is a list. The length should be four. I don't know if it's going to be able to do the sum.
No, it cannot do the sum. There we go. So it is a list, right? It can't drill in to do the sum.
So the nested list now has a length of four, because each little three-pack is considered just one item, of course. Now, we could also keep the same numbers in the same order and just bundle them into groups of four instead of groups of three. And it's really all the same numbers.
It's just the cutoff. Instead of four groups of three, it's three groups of four. Length would be three.
And you could do that as well. All right. So let's call that `nested_nums_2`.
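Here is a rough sketch of the two nested lists. Only 45, 51, 24, 12, and 39 come from the lesson; the other seven numbers are placeholders, since the transcript does not show them.

```python
# Four child lists of three. Values other than 45, 51, 24, 12, 39 are assumed.
nested_nums = [[7, 63, 18], [45, 51, 24], [12, 39, 80], [5, 92, 31]]

print(type(nested_nums))  # <class 'list'>
print(len(nested_nums))   # 4, because each child list counts as one item

# sum() tries to add an int to a list, which Python does not allow.
sum_error = None
try:
    print(sum(nested_nums))
except TypeError as err:
    sum_error = err
    print("sum failed:", err)

# Same numbers, same order, regrouped as three child lists of four.
nested_nums_2 = [[7, 63, 18, 45], [51, 24, 12, 39], [80, 5, 92, 31]]
print(len(nested_nums_2))  # 3
```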
All righty. Coming back to… Okay.
Let's stick with this one. `nested_nums`. Let's come back to… These examples are… Let's move that down.
Okay. So here's what we want to do. We're going to select items here.
If you want to print all, you just simply print the variable. Now, what about this little challenge? Pause. Try this.
Try to get what you see next to the print statement. So try to get the little inner list `[45, 51, 24]`, and so on. Okay.
Here we are. So that would be the second item in `nested_nums`. We would say `nested_nums` at index 1.
Let's print a little break here. Okay. So there's that.
We got that. Now the next one, 51 and 24: it's the same child list, except we only want the last two items, that is, the second and third items. We could still say give me index 1.
And then, now that we're in that list, we'll say go from -2 to the end. There we go. Right.
Because that's how you get the last two items: -2 to the end. And the next one, 12 and 39: that would be the third little child list, first two items. So that would be the item at index 2, right? Index 2 gives you all of them, but we just want the 12 and the 39.
We'll say index 0 to 2. And lastly… No, that's it. Oh, just the 12.
How do we find just the 12? Okay. That would be `nested_nums` at index 2, item 0. Go to index 2 of `nested_nums` and then get item 0.
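The four selections from this challenge can be sketched as follows. As before, only 45, 51, 24, 12, and 39 come from the lesson and the remaining values are placeholders. The 12 is taken to be item 0 of the child list at index 2 of `nested_nums`.

```python
# Values other than 45, 51, 24, 12, 39 are assumed for illustration.
nested_nums = [[7, 63, 18], [45, 51, 24], [12, 39, 80], [5, 92, 31]]

second_group = nested_nums[1]        # the whole second child list
last_two = nested_nums[1][-2:]       # from -2 to the end of that child list
first_two = nested_nums[2][0:2]      # index 0 up to, but not including, 2
just_twelve = nested_nums[2][0]      # a single item, not a list

print(second_group)  # [45, 51, 24]
print(last_two)      # [51, 24]
print(first_two)     # [12, 39]
print(just_twelve)   # 12
```

The first pair of square brackets picks the child list; the second pair works inside that child list. A slice returns a list, while a single index returns the item itself.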
Little recap. Now, that's just lists. Lists here are just a stepping stone for what we really want to do: segue into the topic of the lesson, which is the NumPy array.