Solution - Subqueries in SQL

Prefer joins over subqueries when both options are available.

Discover when to choose subqueries or joins in SQL for optimal database efficiency. Learn how to simplify your queries and enhance performance by selecting the right approach.

Key Insights

  • Prefer joins over subqueries when possible due to their simplicity, readability, and faster performance, as joins process data simultaneously rather than executing nested queries.
  • Use subqueries when necessary to compute aggregate values, such as finding products that cost more than the average price listed in the products table, since this scenario cannot be achieved through joins alone.
  • Recognize that primary and foreign keys are commonly used in joins, but not strictly required; joins can also match tables based on any shared, unique data without relying on defined key relationships.

Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.

This is a lesson preview only. For the full lesson, purchase the course here.

Now, one thing I'll repeat that I said earlier is sometimes some subqueries can be written as either a subquery or a join. I'm not saying all subqueries can be a join because if they could, we wouldn't need subqueries. But sometimes you can do it either way.

And if that's the case, I would just choose the join because normally joins are going to not only perform better, but they're going to be simpler, easier-to-understand code. So, don't try to use subqueries unless you use them when you need to, right? So my first choice would always be to go to a join if a join is going to work for me. If I need a subquery, I'll use it, but I'm not going to look for excuses to use it.

Kind of like wildcards, I don't try to use wildcards just for the sake of using wildcards. I use them when I have to. I only use subqueries when I have to.

So, select the products that cost more than the average. If we think about finding products, we think about, first of all, not being in game shows, so I need to switch my game shows database over to company data. Double-click on my public over here on the right.

Now that I'm back in, God bless you, back in company data here. I've got my products there. So, in here, I've got my products.

SQL Bootcamp: Live & Hands-on, In NYC or Online, Learn From Experts, Free Retake, Small Class Sizes,  1-on-1 Bonus Training. Named a Top Bootcamp by Forbes, Fortune, & Time Out. Noble Desktop. Learn More.

When I think about costing more than the average, there are two places where product prices are. I've got them here in the products table, of course, got price right here. I do also have in line items, I also have a price there.

The difference is products here, those are listed for sale, but I haven't necessarily sold them on our website or application. They're just available for sale. Line items, those are products that we have sold.

So even for thinking about calculating the average cost of a product, you'd have to think about, do you want the average price of products we've listed for sale, or do I want the average price of products that I've actually sold in line items? Because I could use either one in this situation, it would just depend on what you want to do. Let's say we're choosing the prices, we're listing them here. So let's go with the average price of the products that are just in the products table.

If I want to find products that cost more than that amount, essentially I want to be here in this products table saying where the price is greater than a certain amount. But how do I know what that amount is? Like if I just arbitrarily throw in a 50 there, that'll be everything over $50, but this amount here needs to be the average cost, which the average cost requires an aggregate function, which normally would go into the select. But if I throw an aggregate function, like let's say I do something separate here and I say, let me go into my products table.

And if I'm looking at price, there's all those prices of which I can look and say, what's the average price? And that gives me an amount. But I can't add anything else to this query because that's the point of this query. Oh, but that's the answer that I want.

So that's what I want to put here. So I can put parentheses there so that I can stick this query right there. And then now it will be prices more than 53.15 and so on.

So in this case, I need to use a subquery. There's no other way around it. I need to use a subquery for something like this.

There's no way I could do a join to do this because there's no table that has that amount that I can join in with it. I need to subquery for this. This one's a single value here.

I give it a single value. Find all the line items. So I'm looking for line items in my line items table.

So these are my sales. And I only want to find them if the price is for the cheapest priced product. Now, again, I could look at the cheapest price product in the products table.

That would be the cheapest product available for sale. There's no guarantee I've actually sold that, by the way. If I'm looking for the cheapest product that's sold, that would be in the line items table itself here.

So again, you have to think about, well, do I want to see the cheapest product and then find line items for that cheapest price point? Maybe that result set will be nothing because maybe I haven't sold those yet. Right? You always have to be prepared that a result set could be empty and that's not necessarily a mistake. That could be your answer is you haven't sold those and that could be an answer.

Let's say we're going to look in the products table and we want to find the cheapest product and we want to find line items for that cheapest price point. If separately, I said, Hey, let me go into the products table and out of all those prices, I want the cheapest one. That would be the min price.

That's the cheapest product that we currently have listed for sale. If I want up here to find products with that price, I could say where the price equals and in parentheses, put in this query. And so we have actually sold things with that price again, in this case, need to calculate that.

We need to use a subquery for something like that. Couldn't do that through a join. Find the oldest order made by someone with a Yahoo email address.

Now this is one where we could do it through either a join or subquery and I want to show you both and then you can see why, in my opinion, I think the join is the easier one to do. So let's take a look at how we would do it both ways. So let's say we're looking for orders because we want the oldest order and we're looking for users.

At least for me, a lot of times I like to make these separate queries and then kind of put them together. At least for me, that helps me. I'm a visual person, so I like to see the data.

I can start to imagine it more and find connections between them. Now you might think differently. It's just how I do it.

If I'm looking for the oldest order made by somebody with a Yahoo email, so or here in users, I can say, let me find the people who have a Yahoo email. So people who have something at yahoo.com. So these are my Yahoo people. Now, try to find some way to connect them to orders because I only want to see certain orders.

Here I have a user ID. In orders, I also have a user ID. That's really the only common data that's in both places.

And if I was looking for certain users in the orders table, I would have to say where that user ID. And if I was just looking for one person, I could just say one person. But I know there's multiple people with a Yahoo email, so I'm not just looking for one person.

So I can't use equals. I have to use in because in accepts a list of as many people as I want. So I can find many people.

If this is a list of numbers of user IDs over here, I've got to get this to be a list of the user IDs, which I can do by showing the user ID calling to focus on just that list of user IDs. Now, I've made my results that look like what I want to create over there. And so I just need to replace these actual hard-coded numbers with my query up there.

And now, 19,38, so on, 19,38, those are the correct people. The limitation of doing it this way is I'm only showing orders. So I can't see the person's email, right? I'm just only looking at the orders statement.

I know these are Yahoo people because I found them by their user ID, so I know it works. But one limitation of this is I can't see their email address. I can't see where that user lives compared to where they ship the stuff.

Because all I said is I just want to see the orders statement, and I want to filter it to only show rows that are appropriate. If you compare this to doing a join instead, we already saw that user ID was a connection between those two tables, which you can also see when you open up your tables and you look at your entity relationship diagram, you can see that user ID is a connection between those two tables. We could just join on that, right? If I take my users table and join my orders table, which is the two tables I was working with up here, I would join on that user ID.

In the users table, it's called user ID. And in the orders table, it's also called user ID. And this way, I would have all my user data here on the left.

I would have all my order data over here on the right. And if I've got all that in place, and I've got the email here, I can still do the same where that I had up here. And then I'm done.

Like I just joined the two and then filter. I've got all the user data, all the order data. I don't know about you, but I think that's simpler.

So it's less code. It's only one query. It should perform faster.

So in this situation, you could do it as either a subquery or a join. If that's the case, personally, I would always recommend doing the join instead. Cool question.

So for the joins, for example, we use the foreign key and primary key to find the common value. Is that kind of like the same thing with subqueries where we're using user ID with user ID again? Would it always be the same thing? In this case, yeah. Because it's the common tie that binds them together.

Yes. But there are cases where they might not be the same for subqueries. So you always, so let's back up a second.

So what you said is that we're using primary and foreign keys. Right. So for joins, don't ever think that you need primary and foreign keys.

Okay. They are commonly used because primary keys connect to foreign keys. They are literally connections.

If you have primary and foreign keys, we use them. But I just want to make it clear that we don't need primary and foreign keys to do a join. If you have primary and foreign keys, that is how you will join.

That is how you will usually make connections in a subquery. So it is correct that if you have a primary and foreign key, that's typically how you join and connect subqueries. But I just want to make it clear that because in the previous example, when we were doing that Jeopardy join where we created with a subquery, there were no primary and foreign keys in that transaction.

We were just looking for like data. Right. So primary and foreign keys are absolutely not required to do a join.

If you have them, that's typically how you will do a join. Right. But don't only think, oh, I must have a primary and foreign key to do a join.

That is not a requirement. That's just the most common situation. Right.

Thank you. You can still totally do joins. As long as you have data that's like each other, you can join on that.

And it's unique data that matches. And that's a good way to match rows. In terms of the last part, Zachary.

Do you want to just look for the thing that's created last? Oh, yeah. Right. Yes.

See, remember I said before, remember how I said when you're done, make sure you go back and reread your question because you can forget things. I say that from experience because sometimes you get caught up on doing something. I'm glad I did this in front of you guys because it happens.

You got to go back and like, oh, yeah, I just found the Yahoo people. You're like, yeah, I'm done. No, we wanted.

You're right. We wanted the oldest order made by that Yahoo person. OK, so.

So actually what we wanted. There. So let's see.

So oldest order made by a user. We have found the orders, all of them. So then how do I find the oldest one? Yeah.

Because I found them all. Now I just need to limit it to just the most oldest. So create that.

Right. Puts the oldest at the top. And then how do I just see that one? I just want to see the oldest like I just want to see one row.

We've done this a lot. Limits. Yes.

Just limit it to one row. Right. Just show me the top most row because I ordered it and I put it at the top.

So let's just limit it to one row. Which I can do here as well. Oh, created at is ambiguous.

Why? Because created at is in both users and in orders. So actually, this is a really good thing to see. It is very important as to which one I use.

So do I want created at? Do I want it from orders or users? Orders. Right. Because I want the oldest order.

I don't want the oldest user. I want the oldest order. So I need to make sure I say pull that from the orders table.

And so this should be. The same thing. Yep.

These are 19. 2020. Let's see.

  1. Oh, I'm looking at the wrong thing. That's the user date.

Where's the. Yeah. This one was the.

Got to go all the way over to the order. Yeah. So 2020.

  1. 1. 2020. 3. 1. Same where. Yeah.

Just when you look at this stuff, don't get confused. This is the. This is when the user was created.

This is when the order was created. Because this is the order data right here. This is the end of the user data.

So users have created at the end. So that last one. That's that's the tail end of the user data right there.

So. So how would you find the oldest order for each user? So you want to see a list of users. And the oldest order.

You want to see a list of things and you want to do an aggregate function. Per. Aggregate function.

How do we do that? Group by. Yeah. You do a group by.

Yeah. We do. We'd have to do a group by.

List the titles of products that were returned in quantities greater than four. So where are my titles? My titles are in products. Got titles.

But returned in quantities greater than four. That's over in. Line items.

That's where my sales stuff is. That's where my status. And my quantity is.

So I could say. Filter this down and only show me where. The status.

Is returned. Now I've got returns. And.

The quantity. Is greater than four. Okay.

I don't have the titles there. Because they're in products. Now, if I'm looking for certain products over here.

How can I kind of connect to these things? By product ID. By product ID. Yeah.

Because I got product ID and line items. And I got product ID here. You can always remind yourself if you look at tables.

If you have primary and foreign keys. You can see. How those things connect.

So. If I was looking for. Where.

Product ID. Am I looking for just one product ID? No, I'm going to be looking for. If they're in a list of multiples.

Which means over here. This needs to produce a list of numbers. Of the various product IDs.

So I would be just seeing the product IDs. And that I can put. Into here.

So those are the products. We got those 14 products. In that situation.

Now again. I know I did that through a subquery. Just so we can see how the subqueries work.

But like. Anytime you're starting to use those kind of primary key things. That's kind of a reminder of like.

Hey, wait a minute. I could probably just do this with a join, right? So. Let's say it through a join instead.

If I've got products. That I'm joining to line items. As we just saw.

We join these on product ID. That's the primary. It's a foreign key.

We join on. Product ID. In each table.

For me. Just for my own mental sanity, I just think. I'll start with the column in the first table.

And then I'll say. I'll do another column in the other table. Just so I remember each table.

The order of these two. They do not matter. But I just start with the first.

And then go to the second. Just for my own sanity. So I don't forget.

When they join. We've got all the product info. We've got all the line item info.

And so we can still do the status. And the quantity. Same filtering.

But now we've got all of the data. So. Again, when you can do it.

Through subquery or join. I would usually do things for joins. Rather than subqueries.

I think subqueries make things kind of needlessly complex. And also they don't perform quite as well. Because you're running a query.

And then running another query. Whereas this just runs all at once. So the join should work faster.

And I think is actually. Easier to understand. And you actually have more data.

Not that all subqueries can be joins. But when they can. Choose the join.

Instead. And. This would be a good time.

To break through our own. So come back in at two o'clock. And.

Continue on with our next subject. Can you just leave it up? Leave it up? Yeah, sure I can leave it up. Subquery.

Else. I'll just leave it up. There you go.

So have a good lunch. I'll see you back at two o'clock. These two queries.

One of them gives a 14 row answer. And one of them gives a 15. Now.

This. Results only looks at products. This.

Gives me products. And line items. But if I just want to compare the products part.

And I don't want all the line item. Information. If I just look at products.

So I include all of the columns from the products table. Only. When we look in here.

Let's say I were to. Order by. The product ID.

Notice how this one is listed twice. I've got a duplicate. If I add distinct.

I get to 14. So technically they're both the same. The only difference is.

The duplicate. If you don't limit it to just the product. If you allow the list item part to come in.

It won't work. To eliminate that. Because technically they are different list items.

So while the product information. Is the same. It's not.

If you scroll to the right. The list items because they were purchased. On two different list items.

Eliminating duplicates won't work. Because the list items are different. If you're only interested.

In the product information. We only have to show. The product information.

We could say P dot. To show all of the columns in. The products table.

And since we're only looking at products. Then we could have duplicate products. So that's the reason for the duplicate.

In that case it was just a duplicate. Yep. 33 was listed twice.

Yep. If you. Right.

Different line items. Exactly. If you just limit it to just showing.

The product columns. And then add distinct. Then that duplicate will go away.

You can't just add distinct. You have to put on just the product. So that you get rid of the line item information.

Because they are different line items. Those are not duplicates. But the product part.

Those are duplicates. If you add this to your query. Then.

Then you'll get the same exact answer.

photo of Dan Rodney

Dan Rodney

Dan Rodney has been a designer and web developer for over 20 years. He creates coursework for Noble Desktop and teaches classes. In his spare time Dan also writes scripts for InDesign (Make Book JacketProper Fraction Pro, and more). Dan teaches just about anything web, video, or print related: HTML, CSS, JavaScript, Figma, Adobe XD, After Effects, Premiere Pro, Photoshop, Illustrator, InDesign, and more.

More articles by Dan Rodney

How to Learn SQL

Master SQL with Hands-on Training. SQL is One of the Most In-demand Programming Languages and is Used Across a Variety of Professions.

Yelp Facebook LinkedIn YouTube Twitter Instagram