Master SQL string manipulations to improve data consistency, clean up databases, and extract information more effectively. Learn powerful functions like UPPER, LOWER, SUBSTRING, and SPLIT_PART to handle common text-based data challenges seamlessly.
Key Insights
- Understand how case conversion functions (UPPER, LOWER) ensure data consistency, such as standardizing email addresses or state abbreviations.
- Implement SUBSTRING function to flexibly extract specific portions of a string, useful for managing complex zip codes or retrieving segments from standard formats.
- Apply SPLIT_PART function to simplify parsing strings based on delimiters, such as easily separating zip codes into main and extended parts without counting characters manually.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
This is a lesson preview only. For the full lesson, purchase the course here.
Let's talk about string functions. So string functions are a way to manipulate strings or texts. We've seen aggregation functions.
We've seen date functions. And these string functions are just a way to manipulate strings. We did see one string function yesterday.
We saw left which gives us from the left a certain number of characters. So let's see some other examples. So anytime you want to change strings that are currently in your database for your query here, for example, let's say tags.
Here I've got all these tags. I could make them uppercase tags and I'll convert them all to uppercase. Or you could do a lower for lowercase, right? That could be a way that if you have weird inconsistencies with upper and lowercase you know, I know sometimes you can do things like I like to be insensitive to case but you can also change the case and then, you know, like for example emails.
It might just want to be like, hey, let's lowercase all of that. And then we always know it's always in lowercase. So case is not a problem.
Same thing with states. What if people can type in states and they type in lowercase and y? Get uppercase. Zip codes.
Yesterday we saw with zip codes we could get the left part of it But substring is actually a pretty cool thing that you can do Uh, so let's say from my orders here In my orders, I have a zip code And I'm going to list the zip code twice The second time over here I'm going to truncate it to just the first five characters There's a couple ways you can do it Yesterday I said you could do the left Five characters Starts from the left And goes five characters That would be one way you could do it But another way which is a little more flexible in some ways You could say give me a portion of a string a substring and you tell it two numbers One is the first number The first character that you're starting with So think about if you count your characters You got character number one character number two character number three. God bless you don't think about what the character is just slot number one slot number two slot number three and so Starting with the first slot. Whatever the first character is give me five total characters This would give you the first five characters Now I think if that's all you want, I think left is easier But the advantage of this is I could start with the second character So I skip the first and I start with the second Or if I keep moving over to the third character now I'm starting with the third character Eventually if I get up to the fifth character And then the sixth character And I get to the seventh the seventh is the dash That was the one after that.
Sorry Because you got five The dash is six Then if you get to seven If you go from seven and then you want how many characters I want four characters It's okay if that number is more than you need But starting with the seventh character if you want four characters that will give you the last four In the 10 digit zip codes, so that would be a way to get the last four of a zip code if there are Like this one doesn't have the last four. We don't want any We don't want anything for that So substring can give you a portion This is not the last character. This is how many total characters you're going to get So what's your starting position? And how many characters are you getting in total? And again, it's okay if you have too many It's not going to make up characters.
If you go too far, that's okay. That won't hurt you.
That gives you the last four of those The substring always starts from the left So it counts characters from the left So this is the seventh character from the left always starting from the left Now split_part is a really cool one um If you want to split the zip code into two parts So, uh, this is again in orders. So let's say I see that zip code again So there's my zip code. I'm going to do another column here If I do split_part to split it into parts I could split it on a dash character Now if there's a dash character, it'll have part one and part two I want to get the first part The part that comes before the dash Or part two the part that comes after the dash Oh that that's way easier than counting characters, isn't it? Now counting characters can be useful But to be able to split things into parts And then it counts the number of parts part one part two Starting from the left All right.
Do you want the first part? Do you want the second part splitting it on the dash you can split it on any part that you want? So that's pretty cool Um Now the left that we saw yesterday where we said starting from the left give me five characters If you just want the left five characters, that's the easiest way to do it simplest way Now when people start to think well if I want the left most five If I want the right most four, let's say I want the stuff after the dash Couldn't I just say the right four? See the problem here I'm always getting the right most four characters Even if there are only five characters I don't always want the right most five or the right most four I only want it when I have a big zip code. I don't want it when I want a little zip code That's where things like the substring or the split_part would be better now what would be perfect here is if this were if this were a social security number People are always saying give me the last four of your social right Or a phone number yeah, or more like a credit card the last four of a credit card Because they're always the same length for those right then that would be useful But in this because I have different lengths, it's not useful in this situation. But that would be totally great for certain cases. That's why I mentioned it.
This is more of a thought process; I would not use this code in production in any way. And to save time, I'm just going to copy and paste this here. Please do not use this. This is more. I just want you to think a certain way in this. I want you to think that CASE is not always as simple as I made it seem the first time. And if you think about CASE as more of an IF-ELSE, you can use functions to accomplish tasks. Length is a function that returns how many characters something has. When we think about the length of a string, we think about the number of characters it has.
If I create a column that says when the number of characters in the zip code is greater than five, then use the RIGHT function to give me the rightmost four characters. See how that's more like IF-ELSE logic because I'm using functions to perform tasks. I know the first examples we did with CASE were very basic, like testing a string and outputting a string. But you can also include string functions to make them more interesting. Now I could demonstrate this with a simpler example, which we did earlier.
There are better ways to do this. But since we just spent considerable time on this, I wanted you to start thinking differently about CASE. That it is kind of an IF-ELSE situation, and you can put string functions into there to do things. For example, if a condition is met, then perform an action; otherwise, perform something else. But this particular example is primarily for illustration. Please don't use it in production. The SPLIT_PART function would be better, or the LEFT function or, you know, one of these other ones would be better. But I just wanted to demonstrate that CASE can be more versatile than previously indicated.