Discover how Python sets can efficiently eliminate duplicates in your lists. Additionally, explore sorting, indexing, and performing vector operations to enhance your programming skills.
Key Insights
- Use Python sets to remove duplicates from lists by converting a list into a set, then back into a list, as sets store only unique values without maintaining order.
- Perform common numerical operations like sorting, summing, and finding minimum and maximum values directly on lists using built-in Python functions such as sorted(), sum(), min(), and max().
- Combine and repeat lists easily through vector operations, using the plus (+) operator to join lists and the multiplication (*) operator to replicate list elements.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
All right, now a set is a list-like structure, but with unique values. So items in a set are all unique. Items in a set are wrapped in curly braces, not square brackets.
Items in a set are not stored by any kind of index. Items in a set may appear in an arbitrary order, so if you turn a list into a set, it might scramble the values because it just doesn't care about the order. And if you take a list and pass it to the set() function, it will return a set.
Now, if your only purpose was to get rid of duplicates, you could take a list, pass it to set(), and then immediately pass that to list() to return a new list of unique items. So that is a way to remove duplicates. This is the little trick for removing duplicates from a list and keeping it as a list.
So let's look at how that works. We'll say pets again, and now we've got multiple instances of "bunny" as a pet. Those pesky bunnies.
What we're going to do is purge the duplicates. So my_set = set(my_list) returns a set from a list.
So let's just get the unique pets by making a set. We'll call it pet_set, like the jet set. All right: pet_set = set(pets), and then pet_set. Okay, so only one bunny, right? Because sets contain only unique values. But it also kind of scrambled the items.
It put the cat at the end. It put the bunny between the dog and the cat. That makes sense.
I don't think the dog and the cat want to be next to each other. But it also put it in curly braces, right? It's not a list anymore. And what you'll notice too is if you try to access an item at, say, index 0, it's going to give you an error because a set object is not subscriptable.
You can't look up items in a set by index like you can with a list or a string. There is no index, which is why it doesn't care about the order of the items. Now, let's start over and get the data type of this.
We'll say type(pet_set). We'll know that it's a set, right? Yep. But we don't want a set.
The set is a stepping stone. The set is just a way of getting rid of the duplicates. We really want it to be a list.
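The set behavior described above can be sketched as follows. The transcript never shows the actual contents of pets, so the list here is a made-up stand-in that only assumes "bunny" appears more than once.

```python
# Hypothetical pets list; the lesson's real values are not shown in the transcript.
pets = ["dog", "bunny", "cat", "bunny", "bunny"]

pet_set = set(pets)      # duplicates collapse to a single entry
print(pet_set)           # curly braces; element order is arbitrary
print(type(pet_set))     # <class 'set'>
print(len(pet_set))      # 3

# Sets have no positions, so indexing raises a TypeError.
try:
    pet_set[0]
except TypeError as err:
    index_error = str(err)
    print(index_error)   # message says the set object is not subscriptable
```

Because the printed order of a set is not guaranteed, it is safer not to rely on it in later steps.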
So what you do is create unique_pets_list. We're going to start with that. So pets still has duplicates, right? And then pet_set doesn't have any duplicates, but it's not a list. We want to say unique_pets_list = list(pet_set).
Now, we could do this. We could say list(pet_set) and then print it, and it will give you back a list, right? We passed the list to set() to get a set with no duplicates, and then passed that set to list() to get a list with no duplicates, which was the goal. The goal being to get rid of all duplicates and just have unique items. But you can do it all in one move, all in one line, using nested list(set(…)), like so.
We're going to say unique_pets_list = list(set(pets)). We're taking the pets list with duplicates, passing it to set() to remove duplicates, and then to list() to get back a list of unique items. And that works.
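Here is a minimal sketch of the one-line trick, again using a made-up pets list. The second half is not from the lesson: it shows one common way to keep the original order, since list(set(...)) may shuffle it.

```python
# Hypothetical pets list with repeated bunnies.
pets = ["dog", "bunny", "cat", "bunny", "bunny"]

# The lesson's one-liner: list -> set (drops duplicates) -> list.
unique_pets_list = list(set(pets))
print(sorted(unique_pets_list))   # ['bunny', 'cat', 'dog'] (sorted only to get a stable printout)

# Not from the lesson: dict keys are unique and keep insertion order
# (Python 3.7+), so this removes duplicates without scrambling the list.
ordered_unique = list(dict.fromkeys(pets))
print(ordered_unique)             # ['dog', 'bunny', 'cat']
```

Which version fits depends on whether the order of the original list matters for what comes next.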
Sorting numbers is similar to sorting strings. We're just about done with this file, believe it or not. All right. We're going to say nums.
Mostly we're dealing with numbers. When we get into the data science stuff, there will be a lot of numbers—although there are also strings, of course. Think of spreadsheets: lots of string values like last names and other data.
We're going to take the numbers and sort them. And we'll print the result. That gives us a list sorted in ascending order.
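A short sketch of the sorting step. The transcript does not show the nums list or say whether sorted() or the .sort() method was used, so both are shown; the values are invented, chosen only so that they agree with the figures mentioned later in the lesson.

```python
# Hypothetical values; the lesson's actual nums list is not shown.
nums = [18, -3, 50, 4, 78, -1, 22, 5, 33]

# sorted() returns a new list and leaves nums untouched.
sorted_nums = sorted(nums)
print(sorted_nums)          # [-3, -1, 4, 5, 18, 22, 33, 50, 78]
print(nums[0])              # 18, the original order is unchanged

# .sort() reorders the existing list in place and returns None.
nums.sort()
print(nums == sorted_nums)  # True
```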
Okay. Let's pop() by index. Let's get rid of the item right before 18.
We would have to find the index of 18 first. We're going to say index_of_18 = nums.index(18). We're going to find whatever that number is. The index of 18 is 4.
Now, if you want to get rid of the item right before that, you would remove the item at index 3. We'll say nums.pop(index_of_18 - 1). And then if you print nums again, the item before 18, the five, is gone.
Five was before the 18. Now it is gone. No more five.
It's gone. And there’s nothing we can do. It's gone.
We had a little problem at the house. It's gone. All right.
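The pop-by-index step might look like this, assuming a sorted nums list in which 18 sits at index 4 with 5 right before it, as the transcript describes. The other values are invented for illustration.

```python
# Hypothetical sorted list matching the positions described in the lesson.
nums = [-3, -1, 4, 5, 18, 22, 33, 50, 78]

index_of_18 = nums.index(18)          # position of the first 18
print(index_of_18)                    # 4

removed = nums.pop(index_of_18 - 1)   # remove and return the item just before 18
print(removed)                        # 5
print(nums)                           # [-3, -1, 4, 18, 22, 33, 50, 78]
```

Two caveats worth keeping in mind: index() raises a ValueError if the value is not in the list, and if 18 were the first item, index_of_18 - 1 would be -1, which pops the last item instead.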
Insert "ferret" as the first item in the pet list, a little wrap-up. Okay. First item.
Now, how would you do the first item? Actually, this is not a wrap-up; this is a new method. We haven't looked at insert() yet.
append() would put it at the end. insert() lets you choose the index. This is part of our wrap-up in terms of methods. Well, we did copy() earlier.
So this is—yeah—let’s put it back in the order that we did them. Okay.
The last one we're looking at here is insert(). insert() takes two arguments: an index and an item. So I want to go to index 2, say, and put in whatever, you know, put in a potbellied pig or something.
So, ferret. Fair enough. pets.insert(0, "ferret"), right? That adds it as the first item.
Kind of the opposite of append(). We're going at the opposite end. And you'll get a ferret.
Now, let's insert 0 at its correct position at index 2. We'll say nums.insert(2, 0), right? So it's going to go between -1 and 4. And there it is.
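Both insert() calls can be sketched together. The starting lists are assumptions; only the insert calls themselves and the fact that 0 lands between -1 and 4 come from the transcript.

```python
# Hypothetical starting lists.
pets = ["dog", "bunny", "cat"]
nums = [-3, -1, 4, 18, 22, 33, 50, 78]

# insert(index, item) places the item at that index and shifts the rest right.
pets.insert(0, "ferret")
print(pets)   # ['ferret', 'dog', 'bunny', 'cat']

nums.insert(2, 0)
print(nums)   # [-3, -1, 0, 4, 18, 22, 33, 50, 78]
```

Unlike pop(), insert() returns None, so there is nothing useful to assign from the call.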
There's the 0 at index 2. There are some other functions, sum(), min(), and max(), where you pass in lists and it'll return the sum of all the numbers, the minimum, and the maximum. If you want things like the mean or the median or the mode or all these other statistical values, we have to import libraries for that.
But just raw, out-of-the-box Python, we have sum(), min(), and max(). So let's say sum_of_nums = sum(nums). The sum of the nums list is 201.
Take their word for it. Let's get the min and the max. We'll say min_value = min(nums) and max_value = max(nums). -3 and 78. Looks right.
There they are. Now, if it's a sorted list, you could also just get them at index 0 and -1, right? You know what I mean? The min would be nums[0] and the max would be nums[-1]. But you'd have to sort it first for that to work.
So, min and max at index 0 and -1 respectively—only if the list is sorted in ascending order. Yep. Boom.
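A sketch of the built-in aggregates. The nums values are invented, picked so that the results match the 201, -3, and 78 quoted in the lesson. The statistics import at the end is one standard-library option for the median; the transcript does not say which library the course uses.

```python
import statistics  # standard library; an assumption, not named in the lesson

# Hypothetical list consistent with the totals mentioned in the lesson.
nums = [-3, -1, 0, 4, 18, 22, 33, 50, 78]

sum_of_nums = sum(nums)
min_value = min(nums)
max_value = max(nums)
print(sum_of_nums, min_value, max_value)   # 201 -3 78

# On a list sorted in ascending order, the ends hold the min and the max.
print(nums[0] == min_value, nums[-1] == max_value)   # True True

# On an unsorted list the ends mean nothing until the list is sorted.
shuffled = [18, 78, -3, 4, 0]
ordered = sorted(shuffled)
print(ordered[0], ordered[-1])             # -3 78

median_value = statistics.median(nums)     # middle value of the nine numbers
print(median_value)                        # 18
```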
All right. A vector operation. Last thing.
You can add two lists together with a plus sign. Many other programming languages don’t support this directly. It’s called a vector operation.
A vector being a one-dimensional structure like a list. We can actually just add lists together. You don't have to use extend(), which we did before to append another list to pets.
We could say, let's call it more_pets, and then print(more_pets). There you go.
Now what we're going to do is add them together. We'll say pets = pets + more_pets. Now you've got all those pets.
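Joining lists with + can be sketched like this. The more_pets values are the ones read out later in the transcript; the pets list is a made-up stand-in. Note that on plain Python lists + concatenates the two lists end to end; it does not add items pairwise.

```python
pets = ["dog", "bunny", "cat"]                          # hypothetical
more_pets = ["turtle", "canary", "goldfish", "ferret"]  # values from the lesson

# + builds a brand-new list; neither operand is modified.
combined = pets + more_pets
print(combined)
# ['dog', 'bunny', 'cat', 'turtle', 'canary', 'goldfish', 'ferret']
print(len(pets))            # 3, pets itself is unchanged

# extend() gives the same items but modifies the list in place.
extended = pets.copy()
extended.extend(more_pets)
print(extended == combined) # True
```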
Oh, we want unique pets though, right? We don't want three bunnies. Whoops. Oh, it's unique_pets_list.
There you go. All unique. It’s not unique though, because I did it twice.
Restart session and “Run All.” That’s the third time I’ve had to do that—it is a good move to know. I ran the thing twice, so I got the pets twice.
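What went wrong here can be reproduced in a few lines. This is a sketch with a made-up pets list; the rerun-safe variant at the end is a suggestion, not something the instructor does in the lesson.

```python
more_pets = ["turtle", "canary", "goldfish", "ferret"]

# Additive statement: each run appends to whatever pets currently holds.
pets = ["dog", "bunny", "cat"]   # hypothetical starting list
pets = pets + more_pets          # first run of the cell
pets = pets + more_pets          # accidental second run
print(pets.count("turtle"))      # 2, every extra pet now appears twice

# Rerun-safe variant (an assumption): leave the original list untouched
# and write the result to a new name.
base_pets = ["dog", "bunny", "cat"]
all_pets = base_pets + more_pets
all_pets = base_pets + more_pets # running it again changes nothing
print(all_pets.count("turtle"))  # 1
```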
All right. 10%. What’s up? There we go.
Okay, yeah. It’s going to make me do it again.
I don’t want to do the whole thing over because it’s going to make me type. That’s the problem—it’s going to make you type.
If you do have to rerun everything, it's fine unless you've got input() calls that'll pause and require you to retype values. You can't just get right back to where you were at the end. It's going to make you do the inputs every time.
So I want to avoid doing them, so I’m just going to run through here. This kind of stuff happens, by the way, to the best of them. I’ve been doing this a while.
It’s just an inevitability. You understand what happened, right? I ran an additive cell twice, so I got extra pets, and the only way to really get rid of them was to “Run All” again. But in so doing, I was forced to input the food and the beverage on those previous challenges.
And once that was done, it went through an error; it just made more sense to do this. And we're almost back. Vector operations can also be done with multiplication. So let's say nums = nums * 3.
There you go. You repeat the list three times. Get every item three times.
For pets, we'll say more_pets = more_pets * 2, let's say. We'll just double them up.
There you go: turtle, canary, goldfish, ferret, turtle, canary, goldfish, ferret.
That is a vector operation on a list—applying arithmetic operations to lists. You take a list, multiply it by two, and you get doubles. All right.
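Repetition with * can be sketched as below, using the more_pets values from the transcript and a small made-up numeric list. One thing to watch: on plain lists, * repeats the whole list rather than multiplying each number, so the last lines show a list comprehension for the element-by-element case (an addition, not part of the lesson).

```python
more_pets = ["turtle", "canary", "goldfish", "ferret"]  # values from the lesson
small_nums = [1, 2, 3]                                  # hypothetical

doubled_pets = more_pets * 2
print(doubled_pets)
# ['turtle', 'canary', 'goldfish', 'ferret', 'turtle', 'canary', 'goldfish', 'ferret']

tripled = small_nums * 3
print(tripled)              # [1, 2, 3, 1, 2, 3, 1, 2, 3]

# Multiplying each element needs a loop or comprehension instead.
scaled = [n * 2 for n in small_nums]
print(scaled)               # [2, 4, 6]
```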
That—well done. Congratulations. That is finally the end of Lesson One.
The other lessons—Lesson Seven is really long—but this is the longest one until then. So congrats for getting over this hump. And that’s just variables and some operations on them, right? We haven’t gotten into the next lesson yet.
We’ll get into conditional logic, then we’ll get into loops. So—all right—hang tough. You’ve got to know this stuff.
There’s no two ways about it. So thanks for hanging in there, and we will see you in the next lesson coming right up. Or whenever you’re ready—we’ll be here for you.