Solution- Inner Join in SQL

Use SQL JOINs to retrieve name and email information for users who purchased individual line items worth $700 or more.

Discover how to effectively join multiple database tables to retrieve meaningful insights, like identifying high-value customers. Learn practical SQL techniques to streamline data analysis challenges.

Key Insights

  • Understand the importance of joining tables through a shared column, such as using "order ID" to bridge "line items" and "orders," when no direct table connection is available.
  • Apply conditional filtering by calculating the product of price and quantity to isolate purchases of individual line items valued at $700 or more.
  • Use the DISTINCT keyword strategically to avoid duplicate customer records, ensuring a clean, unique list of names and emails.

Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.

This is a lesson preview only. For the full lesson, purchase the course here.

So real-life scenario here. Somebody says, I want to know the name and email of people who purchased an individual line item worth $700 or more. So let's think about where the data is name and email.

Where is that? Users, people who purchased a line item worth $700 or more. Where do I have to go for that? Orders is the value of a line item. Is that in orders? Do I have any pricing information in orders? No, but I have it in line items.

So line items have the price paid and the quantity. Now, if I need stuff from users and I need stuff from line items, I need to go through the orders to get there. So like, if you're going from one place to another, I can't say, go from New York to California without crossing the country.

If I want my data to go from users to line items, I need to go through orders. This is the bridge that I need to go through because there's no way to directly connect users to line items. I need to know that this user placed this order, and this order was associated with this line item, so I had to go through orders to get there.

So sometimes you need to involve tables that you don't technically need, but you need to use it to get to the other data. So I want this and this. So I got to go through orders.

SQL Bootcamp: Live & Hands-on, In NYC or Online, Learn From Experts, Free Retake, Small Class Sizes,  1-on-1 Bonus Training. Named a Top Bootcamp by Forbes, Fortune, & Time Out. Noble Desktop. Learn More.

So let's say I want to know the line item stuff. So I'm going to start with that because I can get the price and the quantity. Let's say I start there.

If that's my first table and I get everything from there, okay, I know the price and the quantity. Okay. So I've got that, but I don't know the name and email of the person who bought that.

So I'm going to need to join in, but I can't just join in users because I have to go through orders to get there. So I know I need to join because I need data from another table. And I know I need to go to orders because that's how it connects in.

The question is, how can I connect line items with orders? What's in common? Order ID. Yep. That lights up green.

So in both of those tables, I have order_id and when they're equal, we'll have a match that we associate line items with a certain order. If I have the on, what are we joining on in the line items table here? It's called line_items, order_id. So I have order_id.

And what does that equal in the other table? Orders order_id once again. So orders order_id, when those things match, we have a match that that line item goes with that order_id. So I know that this line item for this particular product, price, and quantity matches up with that order.

See, this line item is for order number one, order number one is for this person.

Ooh, now I have a connection to my next table because I know that the user_id here connects into the user_id over here. And I need that data. Now I can say, let's join in the next one, which is the users table on which two columns are equal.

Well, we just saw in the orders table, it's called user_id. So in orders, it’s called user_id. And that's equal to in the other table, which is users also called user_id.

So in users, it’s called user_id. And now I have all the data. Now I know that for line item one, it was for order number one, this price, this quantity. For order number one, which was placed by user two, and user number two is Elma with this email. So in one row, I have all the data I need.

Now I've got all the data there, but the question is, I want the name and email, but only for people who purchased an individual line item worth $700 or more. So I only want to see rows where, because a lot of these rows are for every single line item, but a lot of these rows are for cheap line items. So how do I know which one is $700 or more? Price times quantity, right? So where price times quantity is greater than or equal to 700.

  1. So this is going to get me a lot fewer rows. I'm down to 11 rows, right? I don’t need to see the column.

I can still filter the rows because I only want to see rows that are these expensive line items. I could add a column if I want to, that's up to me, but I don't have to add a column.

I can still filter it by that. Out of all of these columns, I don't need to see all this data. I only want to see their name and their email.

So how do I just show two columns, name and email? Yep. I just put those two columns because I don't need to see the rest. And now I see just those name and emails.

Ooh, Mr. Reginald Pfeffer. He’s done it twice. All right.

I don’t want to see people twice though. How could I clean this up? Use DISTINCT. Now I have a list of unique names.

So even if somebody has done it twice, I’ll only see them listed once. So 10 people have ordered an individual line item worth $700 or more. So these are people spending a lot on a single individual line item.

So, okay. So yes, Reginald had it twice. So let's say I don’t want to get rid of the distinct and I see that Reginald was listed twice.

Why? Because he had two different orders or maybe one order that had two different line items that were of high value. If I want to see how much those were, I could choose to add a column that would be price times quantity, and I'll label it as value if I want to see those values, because I might want to be seeing how much they spent on an individual line item, like how much more than $700. I don’t have to do that necessarily unless I want to see that information, but I might want to add that.

And I can see here that Reginald, on two different occasions, did that. If I want to see, let’s say for an order number, now order number is in there multiple times. Order number, like the order_id is in both tables.

So maybe I pull that from the orders table. If I want to see, were they from the same order or not, I could say from the orders table, let’s grab the order_id. Because I’m kind of curious, were they from the same order or not? And no, they were from two different orders.

So on two different occasions, he purchased a line item worth more than $700, but it could have been on the same order. He could have one line item worth $700 and he could have had another line item worth $700 on the same individual order.

Do you need to say which table things come from? The only time you need to specify the table is when there is ambiguity, like with name, email, price, or quantity.

So order_id, notice if I don't put it there, it says it’s ambiguous. Because order_id does not exist. Just once, it’s once in the orders table and it’s once in the line items.

And so it doesn’t know which table to grab it from. I know they’re equal to each other. So technically it doesn’t really matter in an inner join, but we do need to choose one.

My thought is if it’s an order_id, it’s order-related. So I’d rather grab it from the orders table, but you only need to do that if a name is repeated in multiple tables. As long as the name is unique and never repeated, like it is for all the rest, you do not have to say the table it comes from.

Can you? Sure. Do you have to? No. So could I say that this comes from the user table? Yes.

Could I say that that comes from the user table? Sure. Do I have to? No, I don’t have to do that. Some people like to do it to make sure it’s clear to everybody else where the data comes from.

But the only time you are required to do it is only in the case that your columns are repeated with the same name across multiple tables. That’s the only time you have to do that. So that lets us join all the data into one place, filter it to just get what we want.

Also, just a reminder for all the files we’ve had here, I always include a solutions file. If you ever want to see the solutions to all those challenge questions, there’s always a solutions file, which just goes through all the answers to the challenges we have down below. So in the future, if you ever want to check back in and if you forget a solution and you're like, Hey, what’s the answer, you can always look at the solutions file for each one.

So the numbers here connect you from the questions to the solutions in case you ever want to reference that in the future.

photo of Dan Rodney

Dan Rodney

Dan Rodney has been a designer and web developer for over 20 years. He creates coursework for Noble Desktop and teaches classes. In his spare time Dan also writes scripts for InDesign (Make Book JacketProper Fraction Pro, and more). Dan teaches just about anything web, video, or print related: HTML, CSS, JavaScript, Figma, Adobe XD, After Effects, Premiere Pro, Photoshop, Illustrator, InDesign, and more.

More articles by Dan Rodney

How to Learn SQL

Master SQL with Hands-on Training. SQL is One of the Most In-demand Programming Languages and is Used Across a Variety of Professions.

Yelp Facebook LinkedIn YouTube Twitter Instagram