Explore the essentials of filtering and sorting data with SQL queries to efficiently narrow down database results. Learn practical techniques to enhance query accuracy and productivity in your workflow.
Key Insights
- Understand how to effectively filter SQL query results using the WHERE clause, such as retrieving all orders associated with a specific user ID or products with quantities greater than or equal to three.
- Discover the significance of using arithmetic operations (e.g., price multiplied by quantity) to sort data accurately, enabling identification of the true most expensive items in your result set.
- Gain proficiency with aliasing columns for clearer output and learn the importance of distinct queries to eliminate duplicate entries, resulting in streamlined and precise data retrieval.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
This is a lesson preview only. For the full lesson, purchase the course here.
So let's talk through these challenges here. So again, the whole kind of thought here is that a lot of times when you start off with a query, you might wanna just quickly browse the whole thing, but you're normally not gonna wanna stop there. You're gonna wanna filter it down and only show the parts that you want to show.
So even if I say, I wanna look at the orders table and I look at the orders table kind of quickly, I'm not gonna stop with this query. I'm gonna wanna filter it to say, I don't wanna see all of the orders. But somebody might say, well, hey, I wanna see all the orders for a particular person.
And in this case, I have unique user IDs for each person. So looking at the orders, I don't have the name of the person or anything in this table. Later, we'll have to see how we can join tables together to get the name of this person.
But for right now, if I just know that user ID number 30, I wanna pull all the orders for that person. That means that when I look through the user ID, notice how this same person placed both of these orders. Two different orders placed at different times, potentially shipped to different places.
I wanna find all of the orders for a certain person. In this case, user ID number 30. So I'd be looking to see in that column to see where this user ID column equals 30, which is a number, so I don't have to put any quotes around it.
And these will be all the rows. So every order, so in this case, five orders. And I can see their orders, where they were shipped, when they were placed for this one particular person.
By the way, when we're running our queries here, yes, you can click the little play button there. But on the Mac, you can hit Command + Return to run the query, or on Windows + Control + Return to run the query. So yes, you can click the button there, but Command + Return, or on Windows + Control + Return.
You're gonna run queries a lot. So getting used to that keyboard shortcut of doing Command + Return, or Control + Return on Windows, you can just click into a query and hit that keyboard shortcut. All right, so I wanna select all the columns.
So I wanna see all the columns in line items. So I could start with just showing myself all of the stuff that's in the line items table for right now, just so I can see the data that's there. But I don't wanna see everything.
I just wanna see where people purchased a quantity of three or more. So what do I add to this? Where? Quantity, greater than or equal to three. Yep.
Now, notice that my quantities here are five or 10. It's possible that on this website that there's a preset list of like one, two, five, and 10. Maybe they don't allow people to choose an order of three quantity.
Maybe, maybe nobody's just chosen that. Just because you look for something doesn't mean it's there necessarily. There could have been quantities of three, four, five, six, seven, eight, nine.
But just because you're looking for it doesn't mean you'll find it, right? Like if you're looking for customers from Florida, what if your result set was empty? That would tell you what? No one's from Florida, right? An empty result set could be an answer. It might not be the answer you thought you'd find, but an empty result set, as long as your code is correct, it could say nothing gets returned. You have no customers from Florida maybe, right? I can look in my bank account for a million dollars.
It doesn't mean I'm going to find it. You can look in your database for something and it doesn't mean you're going to find it. You might not have customers who meet those criteria.
So your end result set might be a big fat nothing. Like what if, did anybody order in a quantity of 30 or more? Nothing, nobody ordered 30 or more. I now know something, right? No error in my code, it just, nobody ordered more than that.
10, yes, people ordered in 10, but nobody ordered more than 10. Like if I look for more than 10, nothing. 10 was the highest that I had here.
I want to see, so I'll put this back to the correct answer there. I want to see all the rows from line items that have a price less than $30. If these are all of the line items, I only want to see the line items that were less than $30.
Now we are being specific here. We are seeing, say less than $30. I think when people ask you a question, I think it's good to maybe just think, does it matter if I include that number or not? Like for example, if I say, if I'm exactly what they say, less than 30, then that will be not including 30.
Oh, that's supposed to be where, sorry. So this is everything less than 30. And if I had a product that was $30, which I don't believe I do, but let's just see.
Nope, no products that are $30. So for me, it wouldn't really matter if I did less than or equal to 30 because I don't have any $30 products. But you never know in the future if maybe I would get a $30 product.
But if somebody is asking you for something and they're speaking plain English, not like a programmer, they're not gonna use less than or equal to $30. They're not gonna use that kind of language because they're not programmer type people. So you do always want to question, do you want to include $30 products in there or not? I think a lot of times when people say it, they probably mean to include those things, but this is where don't just be literal about how a non-programmer speaks.
If you're not sure, go back to them and ask them more questions so you understand the intent of their question. So in this case, it doesn't really matter, but I would want to talk to the person just to be sure. Now here, if I'm looking for $30 or more, that would be greater than or equal to 30.
These would be everything that's $30 or more. Now, if I just move on, did I remember everything? Oh no, it also says ordered by most expensive first. When you do something, especially when you're first getting used to SQL, I find, at least for myself, I would get caught up in doing something and I'd make sure it was right.
And I'm like, yay, yay, I did it. And then I'd forget to go back and reread the original question. And sometimes I forget things that they mentioned because I was just glad that I got one thing fixed or working and then I would forget to go back.
So always when you're done with a query, go back and reread the person's question and make sure that you answered every single part of it, that there weren't anything that you forgot. Ordered by most expensive first. So how do I do that? Order by price, which if I just did that, the cheapest would be at the top.
So as you said, I need to add descending to put the most expensive at the top. Now, actually, let's think about this for a second. I know we didn't tell you this, but sniff test.
We did say the most expensive, but when I said that, this is the most expensive price. If we're just factoring in price, is there anything that we're missing out of this? Quantity, yes. So this, for example, this line item is worth less, way less than this, because there were five of those purchased.
If you're only looking at the price of the product, you're missing out on the value of the line item. So truly, to calculate this, and this is a little bit sneaky, we didn't say this, but if you don't factor in quantity, are you truly putting the most expensive line item first? Wouldn't we have to say times quantity, so it can do the math for you? That is how it would actually order. And oh, look, some of these, they ordered 10 of that item.
And here, even though this product price is cheaper, because they ordered 10 of them, the value of that is way greater than the value of maybe something else, which has a greater price. But down here, some of these were only purchased in a quantity of one, even though the price might be higher. So you can actually do math here.
And what's interesting is, I don't even show people this math, but it's still doing that math and sorting by that math. Interesting. But this is where I say, what you're creating here in the result set, and what you're doing down here, this is not just limited to what's in the database.
For example, could I create a column that shows me price times quantity? Yes. I can add a column called price times quantity there. It's not a column that lives in my database, but yet I can use, because comma means create a new column.
Asterisk by itself means all the current columns that are in the database. But when you put it between two columns, that becomes multiplication, because you can add, you can subtract, you can divide, you can multiply. So price times quantity creates a column that did not technically exist, but you can actually see it if you want to.
Do I have to show people who? No, I can still order by something that I don't see. It'll still calculate it on the back-end. But yeah, SQL can do math and do calculations for you.
Comma says, do another column. If you were to say, putting price, let's say I put price. And let's say I put quantity.
And then I put price times quantity. I'm saying put the price column, then put another column, then put another column. So the commas separate your columns.
And this column is created by price times quantity. Notice there's a question mark here for the column. They're like, what's the name of that? Because does that column exist in my database? No, it's something I'm creating using information in my database.
Of course, I might want to put a nice name on that. So I could say, let this be known as, and I can come up with my own name. Maybe I call this value.
And that is just a name that I'm coming up with. This is called a column alias. A column alias is, like, if you have an alias, you're like, I am also known as.
Like, technically, my name is Daniel, but I am Daniel as Dan. Like, I go by Dan. Like, yes, I'm Daniel, but I go by Dan.
That's my alias. Now, some people go by a whole different alias, like they have fake IDs, right? That's their alias, right? So this is just, hey, I don't like the default name. Let me give it another name.
Like, you could technically rename any column. If you don't like price, you could call it as product underscore price. So you can technically rename any column to be whatever you want, even if you just don't like the name.
But when there is no column name, because there is no column name for when you do math, then you want to give it a nice name. And when we're doing strings, strings use single quotes. Double quotes are reserved for aliases.
If you called this line item value and you wanted to put quotes around there, you'd have to use quotes because when you have spaces, you can't have a name with spaces unless you quote them. So technically your names can have quotes if you want to make like nice looking names for your columns. And you could do capital letters here if you want.
Line item value. And you can capitalize that. If you want nice, pretty names for your columns, you can give it a different name if you want.
And the double quotes say it's a column alias as opposed to single quotes, which are for strings when you're looking for values. In a where statement. So I know we didn't say that, but I just wanted to kind of get us to be thinking bigger picture here and to be able to look back at our queries and question ourselves.
That sniff test. Wait, did I truly understand it? Did I properly answer that? Speed is great as long as you're correct. But I don't care how fast you are if you're wrong.
Always look back at your query and say, did I get this right? You're happy that you didn't get an error in the beginning. You're like, yeah, I didn't make a typo. But you have to look and see, did I give it the correct answer? It's really important that we're correct in our stuff.
So I know it was a little bit of a tricky thing and we didn't explain all of that, but that's part of why I wanted this as an example to say, hey, are we really doing this the correct way? We didn't technically require that. Limit the results to see just the top 20 most expensive. If I'm seeing all the columns from line items, these are all of the line items.
If I am ordering by just the price, I'd be sorting by that. If I want the most expensive at the top, I'd have to have descending to put the most expensive at the top. And if I just want to see the first 20 rows, I would need to limit my results to just 20.
20 rows so that I don't have all of those rows. Again, technically speaking, do you just want the top 20 most expensive by price? Or did you want it price times quantity? That will give you very different results if you include the quantity for the value of the line items. Because that factors in the quantity there.
Now, if we want to find orders shipped to one of these zip codes, looking over here that my zip code is a string, it's an A-to-Z, I'm going to use single quotes, not doubles, because we saw double quotes are something different. If I get everything from orders, these are all the orders. But I don't want to see all the rows for all the orders.
I just want to see ones that have a certain zip code. So I want to see where the zip code, and there's a couple ways we could do this. One way is I could say, let's just find the first zip code, which is this one.
And I could see as many things that were shipped to that zip code as I can. So this could be many rows, but we only have one order. Or I could say that the zip code is this other zip code.
Notice there were two orders shipped to that zip code. This is one way I could do it, but there is a more efficient way. Does anybody have a more efficient way of writing this? In, yes.
So is it in this list? And I can use parentheses to have a list. So is the zip code in this list? Is it equal to this? Or is it equal to this? Both of those queries will do the same thing. But obviously this is a lot less writing.
I will always say the most important thing is that you're correct. Both of these queries are correct. But obviously this one's less code.
So I would personally prefer this one. Do I find that zip code in this list? If so, include it in my result set. So this is the one that I would say is best.
But this one works perfectly fine. So why doesn't between work? So between works for ranges, like numerical ranges. So you can be between 1 and 10, right? So this is text, not numbers.
If zip code were a number, then it could work with between. But also I don't want all the zip codes between those. I just want this one or this one.
I don't want a range of all the zip codes that are from that to that, which would be a lot of zip codes. But it also doesn't work because it's text. It's not numbers.
You can have a number range, like between 1 and 20. But these are not numbers. These are texts because they're stored as text.
And that's not a mistake because if I browse this data here, the reason it needs to be stored as text is because of these dashes. Dashes are not numbers. Dashes are text.
Also, some of these zip codes start with a zero. And if you store a number as a number, preceding zeros are eliminated because you don't need them. But I need to have them here.
So there are very good reasons as to why they store this as a strain. But this is why you always have to look at what data type you're given, not which data type you assume it should have. Always look at your column and simply say, what do I have? What was I given? Whether there's a good reason for it or not does not matter.
Although there was a good reason for this. It's text. So I must use quotes, single quotes.
And that's why between won't work. Between could work. If I want to get orders from users 10 through 20, I could do between 10 and 20 because those are numbers.
You could just click anywhere in a query. And when you run it, it runs the whole query. So here you can highlight something to only run that portion of the query.
But if you just click any line of the query, it doesn't matter where you are, it'll run the whole query. It's smart enough. And this is where you do it to be careful because if you don't have semicolons, this will error out because it doesn't know that this query is done.
And this query is another query because I put them all next to each other. If I put an extra line space, then DBeaver is smart enough to understand that because it was separated by a line space, it figures out they're two different queries. The best thing is always though, of course, to have semicolons.
Because then even if you don't have space, it'll still work because the semicolons tell it that that's one query and that's another query. This is why sometimes you can get away that if you have the extra line space, it understands that that's a new query because of the space. But the best thing is always to have a semicolon so it knows that that's one query and that's another and you just click anywhere in it to run that query.
Now in here, we've got Mr. D Lubovitz twice and Governor once. I only want to see a unique list of their names. I only want to see names.
And so if I only see the ship name, notice that I have some duplicate rows. I don't want to see the duplicates. So I add what? Distinct.
Because distinct says get rid of duplicate rows. If you don't limit it to the name, if you kept it on asterisk, is this row the same as this row? Not exactly. I know it has duplicate names, but this is a different order than this.
So I don't get elimination of rows because this row is somewhat different from this row. I have to make it specific to the name that I'm only looking at names to eliminate duplicate names. Remember that distinct says eliminate duplicate rows in whatever data you see.
If your data is just one column, you can very easily eliminate duplicates in just that one column. But the more columns you add, you're going to get fewer duplicates because you're adding more columns. There's more opportunity for things to be different.
You cannot apply distinct to only one column when there are multiple ones shown, right? So like, for example, if you're showing all your columns, distinct still does its job. Eliminate duplicate rows. You can't say only eliminate duplicate rows in a certain column, but still show all the rest.
Because that would make that one column shorter than all the rest. And when you look at a row, it wouldn't make any sense. So distinct simply says eliminate duplicate rows, which we did.
Just, I want this to only be for names. So I just want to see a list of names with no duplicates. Now I've got my list of non-duplicated names.
So those are some bonus things. But basically, we've kind of done those types of things. So I want to continue on here.