CASE Exercise in SQL

Group and analyze data using SQL CASE statements and JOIN operations.

Discover how to efficiently categorize data using SQL CASE statements and joins to analyze regional orders and user behavior. Learn techniques to streamline your queries and enhance your data insights with practical SQL tips.

Key Insights

  • Learn to use SQL CASE statements for grouping data by regions (such as West Coast and East Coast) and by user attributes (such as account creation dates) to simplify analysis and reporting.
  • Gain insights into effective data aggregation by combining CASE statements with aggregate functions like COUNT(), enabling queries like counting total orders per region or user group.
  • Understand SQL joins clearly by merging data from multiple tables—for example, connecting users with orders based on user IDs—to enrich datasets and facilitate deeper analytical insights.

Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.

This is a lesson preview only. For the full lesson, purchase the course here.

We want to create categories for, let's say, West Coast and Other. Of course, you can create whatever number of coasts or regions you want. That's up to you.

We're not going to make you type out a whole lot here just for the sake of this example. So this is Orders, right? We're working in Orders. And we've got the states.

If I want to just watch what I'm doing and I want to put a state there, just kind of compare next to this CASE, I want to make sure they go into the right case. In the CASE that ship state is, and you could say it's equal to something, but I'd rather use IN so I can just have one line for a set of values.

So I'm going to use WHEN the ship state is IN. And then I can say it's California. And I'm just going to copy all these here, up here.

Copy them all. I don't need the OR. I'm going to select this.

SQL Bootcamp: Live & Hands-on, In NYC or Online, Learn From Experts, Free Retake, Small Class Sizes,  1-on-1 Bonus Training. Named a Top Bootcamp by Forbes, Fortune, & Time Out. Noble Desktop. Learn More.

Check this out. I don't know if you know this about a lot of code editors. When you highlight something—let's say you just highlight these—

You can hit quotes and it wraps it in quotes. You can select this and hit quote, and it wraps it in quotes. So you don't have to type it and then type it again.

You can simply select something, and when you hit quote, it quotes it for you. A good code editor will do that.

A bad editor won’t. So DBeaver does it, but Microsoft SQL Server does not. It's not smart enough.

But it depends on your code editor. All right. If I just do that—

Oh, I didn't say what. So THEN it is—I'm going to say "West Coast."

Right. We've got Oregon. Let's find another one—

California. Right. We've got that.

And you can do as many different codes as you want. If it's not that, we're just going to say ELSE. You can go through and type in lots of different ones.

You want to do East Coast now? Like, I want to copy this. Paste it.

All for East Coast. Start putting in some states. Whatever.

Yep. Go down the line. Create as many different regions as you want.

Right. And I can name this—let this column be known as 'region'.

So I've got my different regions here. The purpose for it is—

Maybe just for labeling. Maybe you just want to label yourself, but in this case, it's because I want to group them. So I want to GROUP BY my regions.

And so I can only show people the region. I can't show people other columns—except I do need to add a column for some sort of aggregate function.

I want to know how many orders were shipped. And since I’m in the orders table, how many orders were shipped?

So how do I do that as an aggregate function? How do I know the number of orders? I'm in the orders table.

We have things like SUM, MIN, MAX, AVERAGE—COUNT. Because if I COUNT rows, I’m counting orders. So I want to COUNT.

Now I know how many are broken down by each region. If you just did two regions, you should get 27 and 473.

Also, just to show you—that's the nice thing about using commenting. You can comment out a line to say, "Oh, I want to keep it, but I’m not quite ready to delete it yet." And then when you bring it back, it can change your code.

So any line that is commented out is just ignored when the code runs. So you can add little notes for yourself, or you could comment out a whole line temporarily.

We can make a column to divide people into three groups based on when they created their accounts. So this would be users—based on when they created their accounts.

So, I’ve got my users. There’s a created_at. So I want to create a new column for—

The created_at. So I’m going to create a CASE. And in the CASE that—

The year is 2019. So I want to extract the year from this.

And if it’s 2019 or earlier—now, if I’m just saying it’s equal to 2019 or equal to 2020,

I could do a simplified CASE. But if I want to say it’s 2019 or earlier—like maybe 2018,2017—if I want to use that kind of less-than-or-equal-to condition,

That’s not a simple equality. That’s a condition. So, in that case, I need to use a searched CASE where I move my condition down into the WHEN clause.

And I need just the part of the date that I want. So I’ll use DATE_PART because I don’t want the whole date—just the year portion.

From the created_at column. And when that is less than or equal to 2019—

THEN I want it to say “Early.” So I got many. The rest are NULL because—

If I want to see those, I can also add the created_at up here just to compare. So that’s nice for spot-checking.

Sniff test. Everything looks good. Everything seems fine.

These are all 2019. And as soon as we get to 2020—okay—

This one, I could copy and paste. This could be 2020. But now it’s equal to 2020.

Those are my middle group. And then for everybody else, that’s late.

Now I’ve got Early, Middle, and then the catch-all at the end is just—

For everybody else. That makes sense. So I’ve now got—

A type of user.

Right. So I’m going to call that 'user_type'.

The whole idea here is I might want to know how many orders each group has made. Like, are my early adopters ordering more? Or my late adopters?

So the idea is that I could GROUP BY that user type.

But I can only then show people the user type. So I can’t show the created_at anymore. And then—

Just like we did before, we’re COUNTING the number of—oh wait—I want to know the number of orders.

But we just figured out when users were creating their accounts. So—

This is where you have to pay close attention to your question. In this query, what are these numbers telling me? What am I COUNTING?

Okay—

We’re COUNTING users. Remember, I have 100 users. And that adds up to 100.

So it’s pretty evenly spread between Early, Middle, and Late. You know, a little more toward the Late, right? That’s the number of users in each group.

But I want to know the number of orders made by each of those groups. So I needed to create those groups because I needed to divide users

Based on when they created their accounts. But now I want to count orders, which is in another table.

Now—

Is there a way that I can deal with both user data and order data in the same query? JOIN. I can join them together, right? Because when I want data from two tables, I can put them together into one big table.

I call it the mega team. It's not an industry term, right? So I can take little tables and build them up into bigger tables. And if I look at my

Entity relationship diagram—is there a way that I can join or connect users to their orders? Sure. I know—

I know which user placed an order. I can join them based on a user ID. So right now, when you only have users,

Every row is a user. If I then say, well, that's going to be users—and I'm going to join the orders table

On two columns that contain similar data types—user_id and user_id.

So in my users table, it's called user_id. In my orders table, it's also called user_id.

Same column name, just in two different tables. When I try to run it, it says: ambiguous.

It wasn’t ambiguous just a moment ago. What does the term “ambiguous” mean in this situation? We've seen this error before. Yeah, the same name is used in two tables.

So you notice it says created_at. Created_at is in both tables. Because we know when the user created their account,

And we know when the order was created. So the question is, which created_at do I want? I want the user created_at because I'm dividing users into groups.

I don't care when the order was created. I care about when the user created their account. This was the user type, right? I was using the users table.

So I just want to go back and add "users." Because it was confused. It's like, hey, do you want when orders were created? Or do you want when users were created? And now, if you remember, we had 500 orders.

Sniff test says this seems right. It adds up to 500. And now we've still divided people based on when they signed up—

But I'm counting orders. Because when you join users to orders,

You still have all the orders, but now you know who placed that order. Because that's what joins do.

Joins are not just about adding columns of user data and order data—

You're getting all the rows from those tables.

And when you join users to orders with just a regular join, you're looking for matches. I don't care about users who never placed an order.

I only care about users who did place an order. And it will match up for every single order—it’s going to say who placed that order.

So I will know who placed the order and what the order was, right? I mean, not what was in the order because we didn’t include line items. I don’t need that. But I’ll get a row for every single order.

I know who placed the order. Now we’re counting a much bigger dataset.

Because when you do joins, it’s not just about how many columns you have—

It also affects how many rows you have.

Because if you're getting the entire orders table with all those 500 orders—

And you start with 100 users—

Well, those 100 users are spread across all those 500 orders. They’re just repeated. But you’re going to get all 500 orders.

When you’re counting, you’re now counting orders. If you wanted to see the join, we probably should have done the join first. Because remember the order of execution—

If you want to see your data as long as possible to be able to envision what it’s doing—

When you write your code, if you can plan ahead, do it in the order of execution so you can view the transformations step by step.

Because if we did the FROM and the JOIN, you would have been able to see the join. Then we could have done the GROUP BY after that. But we would have seen the join happen.

Now, you can always go back and add the join just like I did. You just don’t see—the join itself wasn’t visible, but you saw that it clearly made an effect.

But you didn’t see what was going on because the GROUP BY hides all that intermediate detail. If it all seems kind of mysterious,

And you’re like, “Oh, I really wish I could have seen that join, ” do the join first. Then come back in and do the GROUP BY.

It might sound like that’s such an obvious thing. But I can’t tell you—when I was first learning SQL, I struggled with how to see what was happening. And I finally realized, hey, if I just do things in the order of execution—

The whole point of this is it’s a funnel—each step filters the data progressively. And once I realized that if I just follow the order, I’ll be able to see things as long as possible. That’s not an obvious thing that everybody figures out.

I was happy once I finally figured that out. Because now I could understand what was going on behind the scenes—step by step by step.

But that wasn’t immediately obvious.

photo of Dan Rodney

Dan Rodney

Dan Rodney has been a designer and web developer for over 20 years. He creates coursework for Noble Desktop and teaches classes. In his spare time Dan also writes scripts for InDesign (Make Book JacketProper Fraction Pro, and more). Dan teaches just about anything web, video, or print related: HTML, CSS, JavaScript, Figma, Adobe XD, After Effects, Premiere Pro, Photoshop, Illustrator, InDesign, and more.

More articles by Dan Rodney

How to Learn SQL

Master SQL with Hands-on Training. SQL is One of the Most In-demand Programming Languages and is Used Across a Variety of Professions.

Yelp Facebook LinkedIn YouTube Twitter Instagram