Inner vs. Outer Joins in SQL - Free Video Tutorial

Discover how SQL joins seamlessly combine data from multiple tables, enabling deeper database insights and analytics. Learn the pivotal difference between inner joins and outer joins and when to effectively use each.

Key Insights

Understand that joins in SQL allow retrieval of data from multiple tables by linking rows through common columns, such as user IDs connecting users to their orders.
Distinguish between inner joins (which return only matching rows across tables) and outer joins (left, right, or full joins, which include non-matching rows) to identify scenarios like employees without assigned departments.
Recognize that left joins include all rows from the left table (placing nulls where no matching data exists), while a full join retrieves all records from both tables, revealing every outlier or missing relationship.

Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.

This is a lesson preview only. For the full lesson, purchase the course here.

All right, so for those of you who were here yesterday, at the end of the day we learned about joints and one kind of joint called an inner joint. This morning I want to review what an inner joint is and contrast that with outer joints. So there are different types of joints, and when we think about an inner joint, it is one kind of joint. It is the most common kind of joint. To back up even more, the reason for using joins is that if you were just doing a query with one table, you don't need joins.

But when you want to deal with data that's in multiple tables, and you want to put them into one query, you must join them into essentially one query. So joins are how we get data from multiple tables. When you're looking at your database, go ahead and launch DBeaver so I can browse the database.

When I open my server and take a look at the databases there, I go into our company data database schema, which is a grouping or relationship of tables. In this case, we only have one schema, but you might have multiple schemas. It's a way to group tables together. I'm just going to go into our public schema and see the tables there. If you have multiple schemas in the database you're connecting to, you'd go into the various groups and see the tables in those groups.

Here we have a users table, and the tables are connected through relationships. Users place orders, and those orders are made up of line items, and the line items refer to various products. One of the ways you can see those connections is by double-clicking on tables and looking at the ER diagram, which stands for entity relationship, so you can see the relationships that are here.

And I can zoom in using the plus down here to zoom in and show you those relationships. So I can see that users place orders, and we connect those on some common column. So there has to be a little bit of duplicate data to connect two different tables.

SQL Bootcamp: Live & Hands-on, In NYC or Online, Learn From Experts, Free Retake, Small Class Sizes, 1-on-1 Bonus Training. Named a Top Bootcamp by Forbes, Fortune, & Time Out. Noble Desktop. Learn More.

There should be some thread that ties them together. So in this case, it's the user ID. Here in the users table, it defines what a user is by creating a user ID, and then assigning information to it.

In the orders, it references that user ID that was created in the users table and says that a certain order was for a certain user based on their user ID. Then line items, which are the specific products that go together with an order, those are associated with an order by the order ID. In the orders table, it creates an order and defines the order ID.

However, the line items can potentially repeat that same order ID because you might purchase three different items on a single order. We would expect that this order ID could be repeated multiple times if there are multiple items on a certain order. Just like user ID might be repeated in the orders table if a user does repeat orders.

Now in the original definition, those user IDs are not repeated. So in the users table, you're only going to find one user number one. And that's never going to, you're never going to see another person with the same user ID because that would be confusing to have two people with the same ID.

That's how you uniquely identify people. And then finally, products. When you purchase a line item, it refers to a product by its ID, and there's another table over here that defines that product with its identification number, and it says things like the title, the original price, tags for finding products, when those products were created, and so forth.

So there's threads that tie these things together. So there's always a column in each that has the same kind of data, and you're connecting those two tables on that like data. So you're trying to find what is the like data.

In many cases, this will be defined by primary and foreign keys, but that is not a requirement for joins. Joins simply require two sets of data that you're trying to connect through some common information that's in both places. You're saying, hey, use this commonality to say, oh, user number one is user number one, user number two, user number two.

So you know that that order is for that person over there. You know that this line item is for that order over there. When we're doing that join, and we're combining information from multiple tables, because you might want to know the name and email of someone who purchased a specific product, but your product information is over here in line items, and your name information is over here.

We have to join these tables together. And in this case, we have to join them from users. We have to join into orders so that we know the order that the person placed.

And then we can figure out the product that they purchased. We have to, in this case, join three different tables. Sometimes you can join two, three, four.

You can join any number of tables, as long as there is a common thread. And you always just join one table to another, but then you can join another, and then you can join another, and you just keep doing join after join after join. Any join, any individual join is just simply joining two tables together on some common thread.

Then you can join another thing on its common thread, and then you can join the next thing on its common thread. So you just do join after join after join. The question is, when you're doing those joins, because you want all of that data, when you do an inner join, which is just simply called a plain join in SQL, that is going to give you matches.

Matches are a person who placed an order. But what about the non-matches? What about people who didn't place an order? That might be actually interesting to see who didn't place an order. So those are the outliers.

That's why we call them the outer joins. When you think about the inner joins, it's the matches, the people who did place an order. If I look at other tables here, for example, employees who work in departments, some of those employees work in a department, some don't.

If I look at the employees table and I look at its data, I can see that there's a few people who don't work in departments. They're the managers. They're not assigned a department.

If I do a regular join, I would be joining people who do work in a department. They have a department ID. But what about the people who don't work in departments, those outliers? That is why we do outer joins, to find those non-matches.

We have matches, people who do place an order, and the non-matches, people who didn't place an order. We have matches, people who do work in the department. We have non-matches, people who don't work in departments.

So those outer joins are for those outliers, those non-matches. So for outer joins, we actually have three different types of joins. And we can just leave off the word outer and inner.

So for an inner join, we just call it join. For an outer join, we typically just say left join, right join, and full join. So I never type in the words inner and outer, personally.

I just say join, left join, right join, or full join. But left, right, and full joins are all outer joins. So let's take a look at the difference here.

If we were to do a join on the employees and the department tables, I added Ds and Es just for the sake of understanding our data. In real life, these would just be numbers. They wouldn't have Ds and Es, but I just want you to understand that Ds represent departments and Es represent employees.

We're just going to be focusing on Ds for departments because I'm connecting employees to the departments that they're working in. So, for example, Amy Schumann works in department one. Department one is sales.

That gets joined together into one row. We get all the columns from this table. That gets put down here.

And then all the columns from this table, that puts down here. But we're connecting this row to this row. Now then when we get down to Elma, Elma also works in department one, which connects into this row.

So this row is going to be connected to two. We're going to see sales once and we're going to see sales again. We're trying to match a row in one table to a row in another table and saying that these are matches.

We know that that person works in that department. But what about the nulls? What about somebody who doesn't work in the department? Null means empty. But instead of just leaving it be empty, they write out the word null.

We know that that is actually empty. Null is not a string. It's not a piece of text.

It is just an indicator. It's a label that says that is an empty cell. There's nothing in it.

It is null. So for example, if a person's last name were null, it's not going to trick the system because null as a name would be a string and that is a different thing than being empty. That's full, right? So that wouldn't work.

This is just emptiness. So in the case of Chad Smith, normally in a regular join, Chad Smith would be left off because there's no match for no department. Bruce will be left off because there's no match for no department.

So in a regular join, we only get the matches and we ignore the non-matches. Those rows for Chad and Bruce would simply be left off of an inner join, the regular join. We only get the matches.

But with a left join, we're going to include all the matches and the non-matches. We're going to get them all. Now what can you put on the right-hand side for a department if there is no department? Nothing at all.

You simply fill in the blanks. So notice how more nulls are created over here on the right because nothing matches with nothing. So it's not going to make something up.

It's not going to make up a department to put that person in. It's just going to say they simply don't work in any department. They have no department ID, therefore they have no department name and it fills in nulls, but they are included.

Now the idea of the left join, what's the difference between the left, the right, and the full? The idea here is that when we write our code, we write from left to right as we type in our code. If this were all on one line, the employees and then the left join for departments, imagine that being on one line. The employees would be on the left of the code and the departments would be on the right of the code.

So there's two tables. One is on the left of our code and one is on the right of our code, literally because on one line. When we say a left join, we're saying we care about all of the rows in the left table, meaning give me all the rows in the left table, which in this case is employees because I put it on the left.

Give me all of those rows regardless of whether they're a match or not. Give me every single row. So you will get every employee, no matter what.

The only question is, what do they put on the right-hand side? If there's an actual connection, you'll get actual data. But if there is no connection, if there's nothing found, then it will just put in nulls. It'll fill in the blank.

A regular join would not do that. A regular join would leave off the nulls. A regular join would only give you matches and it would not also include the non-matches.

Said another way, all joins always give you matches. The question is, what happens with the non-matches, those outliers? That's where outer joins also include the outliers. That's why they call it outer joins.

All joins give you the matches, like an inner join. But the outer joins also give you the non-matches. But the question is, which table do you want all of the rows from? So for example, I want every employee.

And if I put employees on the left of my code, imagine this all being on one line, employees would be on the left. So I would get every row from the employees table. But I might be missing data from the right-hand side because I only said I want it from the left table.

I didn't say I also want the outliers from the right table. Notice that there's no department four or department five. Even though it's in my department's table.

Because I have no employees who work in support and administration. No wonder this company is going to go into the crapshoot. Nobody works in support and administration.

But notice they're not left. They're not put into my table here because I only said I want to do a left join. Give me all the rows from the left table only.

So even if there's outliers in the right table, the one that I put on the right of my code, it doesn't worry about those because I didn't say to worry about those. I didn't say to include those. If you look at these little Venn diagrams here, I know when people teach joins, they always show these Venn diagrams.

And until I understood what was actually happening, these Venn diagrams didn't really make a whole lot of sense to me. When I just looked at them innately, I was like, I don't know what this means. But when you look at it, the idea is that if you have two tables, there's some commonality between them.

And then there's some parts that don't have a common thread. So the inner part here, this is why we call it inner joins. The inner parts are the connected things, the matches.

The idea of the coloring is what parts are you including? Which rows are you including? Because joins are all about matching rows in different tables. I also, I kind of misunderstood when I was starting to get into joins. I thought it was all about columns.

And yes, you're getting all the columns from both tables. But it's really about matching rows across tables. Like this row for this employee matches this row for this department.

You're matching rows. And you're saying, do those rows connect to other rows? Does this user match this order? Does this employee match this department? And you're connecting rows. So what number of rows are you getting in your result set? We'll get more rows with outer joins because we're not just getting the inner parts, which are the matches.

We're also getting the outer parts, which are the non-matches. But notice in the table B, the one that's on the right, notice that that's not included. So the shaded parts are the included parts.

It's what we get. We get the matches, but we also get the non-matches, but only from the left table. If we also wanted them from the right table, we would need a full join, which would give the left and the right.

They could have called a full join. They could have called it a left and right join. Because we want everything from both the left and right tables, but they called it a full join.

We don't have to say left and right join. It's a lot less code, but do you want the fullness of your data? Do you want everything from both tables? Not leaving out any single row. That is a full join.

You'll get the fullness of your data. Everything that's there. So that kind of join, when you say full join, that'll give you every row from the left table and every row from the right table.

All joins give you the matches. The question is, what about those outliers? Those outer parts, the non-matches. A left join only gives you the non-matches from the left table.

And you define the left table by putting it on the left side of your code. When you say full join, the order here doesn't really matter because you're getting both the left and the right. So it'll do a left join essentially.

And then it'll say, did I miss anything from the right table? And in fact we did. Now, what's it going to put over here for employee data if nobody works in administration? Nothing. It has to just fill in nulls.

So depending on what kind of join you're doing, you're going to get these nulls when you're doing these outer joins where the join itself fills in nulls. We say, well, nobody works in those. We can't list any employees because nobody works there.

Now, I know, for example, that an employee ID, every employee must have an employee ID because if it's a primary key, every employee must have one. If there were a null here in the employee ID, I know the database itself could not have created that. I have to know that the join could have created that, but the database couldn't.

So part of the reason to do these outer joins is to find those outliers. If you want to find departments that have no employee, I could look for where this is null and I know that those departments have to be empty departments because I know that this could never have happened in the database. Same thing with department ID.

If there's a null in department ID, that can't be because if somebody was working in a department, like these are people who don't have departments. Of course, over here, if a person doesn't have a department, they don't have to be assigned a department. Anytime you find a null, you know that that person doesn't work in a department or there's a department that doesn't have an employee.

So in other words, searching for nulls here will help us to find those outliers, people who didn't have something, like people who didn't place an order. It's those did nots that we're looking for when we're doing those outer joins. Now, notice I didn't talk about the right join.

There's a saying in SQL, friends don't let friends do right joins. You choose the order that you want in your code and you choose which one you want to put on the right. If you just switch them around and put it on the left, you could do a left join because if you think about starting with a table, you probably start with the table you care about, the one that you want to include all the rows for.

So since you can choose to put it on the left, and we can then do a left join because you want all of the rows from that left table that you chose to put on the left. If you rearrange those two and you put it on the right, yes, you could include all of the rows from the right table. But why would you end with the thing that you care about? Why wouldn't you start with the thing that you care about? So since you can control the order of what you code, you could just put it on the left and just do a left join.

You don't need to do a right join. So sometimes people are like, oh, I did a right join. It's like, yeah, or you could just rearrange it like a normal person and do a left join.

I will say 99.99% of the time, you could just simply reorder your code and you could do a left join. So in the end, you really fundamentally have three joins, a regular inner join, which gives you matches. And then two outer joins, left and full.

Left says, give me all the matches, but also the non-matches in the first table, which I put on the left of my code. The full says, give me all the rows from both tables. I want everything.

Don't leave anything out whatsoever.

Key Insights

Dan Rodney

How to Learn SQL