Dive into SQL and learn how to efficiently filter, sort, and refine database queries to quickly find the exact data you need. Discover practical methods for transforming everyday questions into precise SQL statements.
Key Insights
- Use the "SELECT * FROM" query as a starting point to gain an overview of database tables, and then apply "LIMIT" or "ORDER BY" clauses to filter and sort the data more specifically.
- Employ the "DISTINCT" keyword to eliminate duplicate rows, enabling you to accurately identify unique entries—for example, identifying that orders have shipped to all 50 states, while users come from only 43 states.
- When constructing SQL queries, always refer to the exact column names defined in the database and confirm accuracy through visual inspection or "sniff tests," to ensure the results align with expected business logic.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
This is a lesson preview only. For the full lesson, purchase the course here.
Okay, so let's start talking through these challenges that we have down below here. We want to write a query to view the entire users table. So how do we do that? If we want to see everything that's in the users table.
SELECT asterisk, all columns FROM users. Normally, this is a nice thing to just kind of get started, just to kind of get to know your data a little bit, but it's never going to be your end query. Because we don't want to see all of it.
We're going to narrow things down. And everything we do from now on is going to be about filtering stuff down. Think of it like a funnel.
This is all the data, but do you really want all of the data? So how do we start to filter things down? How do we get fewer columns or fewer rows or get exactly just what we want among all of this stuff? Now, if we don't want all of the rows, if we just want the first five rows that are here, what do I add to this? And where do I add it? LIMIT five. Okay, yeah.
After FROM users. Yes, so we have to put LIMIT five to limit the number of rows to five rows, and we put it after the FROM. Now we have just whatever the default sort order is; if you do not specify a sort order, it's in whatever your result set is; it just gives you the first five rows.
If we add a sort, if we add an ORDER BY, then that would change which five rows we're going to see, right? If we want to view the users table ordered by when users were created. So let's say I'm viewing all of the users here. And if you want to see which columns you have, browse over here in your database navigator here, see which columns you have.
Notice how there's a created_at over here. That's got time. It's got a little clock.
We know that's when users were created. If I look over here, see how we have created_at, we got some 2019s, some 2020s, 2021, and so on. If I want to order my users by when they were created from the newest to the oldest, what do I add? ORDER BY.
Created_at. Because if I just run this, this is going from oldest to newest, but I want from newest to oldest, so I have to flip it and add DESCENDING. If you don't like the default, just add DESCENDING, and that will flip it.
Now my newest are at the top, 2022, going down to my oldest at the bottom, 2019. Now, if I want to view all of the orders table, it's called orders over here. I simply say, SELECT all of the columns FROM the orders table and not limiting the number of rows or anything.
So I'm seeing all the columns, all the rows, just so I start to get a sense of what's there. But maybe I don't want to see all of these. Maybe I just want to see the name and the state.
What do I add to just see the name and state? Ship name, comma, ship state. And that's what I, it wasn't really a trick, but sometimes people are like, oh, let me just type in name. But is that the name of the column? No, it's ship name.
Don't guess your names. These names are created by someone who created the database. Use the names that are there.
Now, if you know it says name somewhere, it does code hint name, and I can choose that by hitting tab. If I want the state, I can start typing state and it reminds me ship state. So look at your code hints.
It tells you. Also look over here; keep this open while you're writing your queries. Now I can have the ship name and the ship state.
So I can see the name and states of people. These are where we shipped the orders to, as opposed to where those people live. Now, when we say view the 10 most recent orders, notice how I'm not telling you how to do it in SQL, not telling you the code, because normal people in your company are going to give you normal questions.
They're going to say, can I see the 10 most recent orders or something like that? They're not going to say, oh, hey, there's an orders table. Can you give me the 10 rows sorted by… Normal people don't speak SQL. They speak in normal English.
So part of our job here is to translate normal questions into SQL. We're using SQL to answer that question. If I want to see the 10 most recent orders, what do I do? What do I put here? SELECT all the columns FROM orders.
LIMIT 10, yeah. LIMIT 10. So that is 10 orders.
But is that enough? No, because that's just 10 orders. I want the 10 most recent. Notice these are from 2019, because that just said without sorting them, show me the first 10 rows, but I want the 10 most recent.
So what do I add and where do I add it? ORDER BY created_at DESCENDING. And then LIMIT. Now I have to sort by.
So the LIMIT 10 was good, but I need that ORDER BY. If it helps, you can put these things on different lines. So you have kind of each part of the query.
It's okay to put on one line, multiple lines, however you want to break it down. But a lot of times I like to break my queries down, kind of each keyword on a different line; kind of makes the query easier to read. Also, once you're all done with a query, I always do what I call the sniff test.
I look around and like, does something smell off? Does something seem off? So, for example, when I just ran this part and here's a cool thing. If you select part of a query, you will run just that part of the query. If I don't select the order or limit, and I just run this, this is the whole table.
SQL only runs what you have selected. If you don't have a selection, it'll run the whole query, but if you select just one part, it'll just run that one part. When I first did this with the limits, let me just comment out this ORDER BY.
So here, these were from 2019. If you're looking at this data, if you're looking for the 10 most recent orders, you're like, nobody's ordered since 2019. Something smells wrong here.
Like, how's the business working if nobody's placed an order in the past like five years? That's a problem, right? Something smells off. So always double check your final result to make sure that it seems correct. In our case, the database only goes up to 2022.
At some point, we'll have to go and update all the dates, but yes, in this case, we top out at 2022. Now, we want to use DISTINCT to see which states people are from, not where we shipped orders to, because you have to be careful about the right kind of information. Orders are shipped to a state.
So users are from a state. If you think about people asking questions about states, do we care about where we're shipping orders to or do we care about where people are from? You have to make sure you're getting the right data. We're looking for where people live, not where they're shipping to.
So I just want a list of states where people live. So what do I put in here? SELECT DISTINCT. Yeah, I'm going to come back and add DISTINCT FROM users, right? States.
Okay, so let's say I SELECT all the stuff FROM users, right? I got all those columns. I'm really only interested in the state part of this. So here's the state.
If I put in user_state, now I'm just seeing that one column. What I like to do in this process here is I like you to understand what each piece of code does. When I see people, when I was first learning SQL, I would see experts just type in the whole query and they just hit right in the middle of it.
And it was like magic, like, wow, that works. How the heck do you know what each part does? So as I do things, I like to explain what each part does so you can actually understand what each piece of code does. And each piece of code essentially keeps filtering down more and more.
So right now, if I'm seeing every single user, every single user is from a state, right? I have 100 users because I have 100 rows. If I'm in the users table, keep in mind, I have 100 users, right? So every person is going to be from a state, correct? But people are from the same state multiple times. If I'm only showing the user_state, I'm seeing 100 states because I have 100 users.
But in the US, we only have 50 states. So clearly, some of these rows have to be duplicates of each other, correct? If I add DISTINCT, it says eliminate those duplicates. Now I'm down to 43 states.
So that means we have users from 43 out of the 50 states. Okay, so we don't have people living from every state. Although if I looked in the orders, if I looked in the orders table and I looked at our ship_state, and I looked at that, notice there's 50.
We have shipped to all 50 states, even though users are from only 43 of those 50 states. That's why it's very important to know which table we're querying because we're talking about different types of information here. Now, our little extra credit here.
We can use DISTINCT, which eliminates duplicate rows, even when you're showing multiple columns. So let's say we're in the orders table seeing which states each user has shipped to. So let's say we're viewing the ship_state, but if I'm looking at which states each user has shipped to, so let's think about this.
Let me get rid of this DISTINCT for a moment. And let's say I have in orders here, I have a user_id. That's the unique user ID, the unique number for each person.
And if I'm just looking at the user_id, so user number two, user number one, user number six, I don't have their name here because that's in the other table. I just have their unique user ID. Let's say I ORDER BY user_id for a moment here.
If I ORDER BY column number one, which would be the shorthand way to do it, or if I ORDER BY user_id, which I think is the better way to do it because I think it's more obvious. User number one has shipped to Vermont a lot, but they did ship to New Mexico. They probably live in Vermont, and once they shipped to somebody else, a friend or family or something.
So looking at this, is row number one a duplicate of row number two? Yes. If I were to add DISTINCT to this, DISTINCT eliminates duplicate rows. Now notice that there's only one row for Vermont because I don't need to see it over and over and over again that they keep shipping to Vermont.
I just want to see it once. Now, could user number two also ship to Vermont? Potentially, but that wouldn't be a duplicate if it was because user number one shipping to Vermont, I want to see that. User number one also shipped to these other states as well.
When you say DISTINCT, it says, is this row a duplicate of this row? And if they are, if you have entire duplicate rows, it'll eliminate those duplicates. DISTINCT is never applied to one column or another. DISTINCT very simply says, eliminate entire duplicate rows regardless of how many columns you see.
It looks at the entire row, and if the entire row is a duplicate of another entire row, that's when DISTINCT will eliminate those duplicates. You cannot apply DISTINCT to one column and not another. Think about the result if you did.
What if you only saw user number one once? How would you have three rows, but a user only once? Whenever we look row by row, it has to make sense. I need to see what user has that state. Oh, user number one shipped to that.
Or user number two shipped to that. What would it put there if we, let's say, applied DISTINCT to just user_id? What if we only had one once, two once? It doesn't make sense. We can't have some columns be shorter than other columns.
Now you might say, well, but in Excel I can. And I'm like, but are we in Excel? No, we're not, we're in SQL. So SQL has to have the integrity of row-type data.
So DISTINCT must only eliminate duplicate rows, not per column, just per row. Compare rows across each other. So the more columns you add, you're going to get fewer and fewer duplicates, right? Because as you add more data, there's more opportunities for one row to be different from another.
So the more rows you have, you're probably going to have fewer and fewer duplicates. So you're going to get more and more rows. So each thing does one very specific thing, only what you want.
And you have to understand what the SQL does versus what you think it should do. So, yeah, and we already did the sorting because I had the ORDER BY. Okay, I'll save this file.
Not going to come back to it, but I'll just save it just so I have it in case we want to come back to it.