Understand how Tableau's Data Interpreter can streamline your workflow by automatically cleaning up common spreadsheet issues. Learn how to review its changes and prepare your data for more accurate visualizations.
Key Insights
- Data Interpreter is an automated feature in Tableau that identifies and removes non-analytical elements such as titles, footers, and empty cells, helping users isolate clean data fields without manual intervention.
- Once enabled by checking a box, Data Interpreter offers a Review Results link that opens a separate Excel file highlighting its changes with color codes: peach for headers, green for usable data, and red borders for excluded content.
- Noble Desktop explains that Data Interpreter is not a full data preparation tool and is best used for resolving simple spreadsheet formatting issues before performing more advanced data cleaning outside of Tableau.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
So, the Data Interpreter is a feature that Tableau gives users to help fix some very simple data issues. It's not a full data prep tool, but it can handle some very simple issues with the spreadsheets that you're bringing into Tableau. So, let me go to the next slide.
Data Interpreter is an automated data cleaner included with Tableau. It detects common problems with your data source when you connect to it. It can detect things like titles, notes, footers, empty cells, and so on, and bypass them to identify the actual fields and values in your dataset.
It can even detect additional tables and sub-tables, so that you can work with a subset of your data independently of the other data. So, this basically means if you have multiple tables on one sheet, it can detect that they're separate tables. After Data Interpreter has done its magic, you can check its work and make sure it captured the data that you wanted and that it identified it correctly.
Then, you can make any necessary adjustments. The link below will provide you further details about it. The screenshot on the right is an example of the Data Interpreter at work.
We're going to do this. You're going to see this in Tableau very shortly.
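To make the idea on that slide a little more concrete, here is a minimal Python sketch of the kind of detection just described: skipping titles and notes, finding the header row, and treating blank rows as separators between tables. This is not Tableau's actual logic, which isn't shown in this lesson. The helper names (`is_blank`, `split_tables`, `to_records`), the `min_filled` threshold, and the sample values are all assumptions made up for illustration.

```python
from typing import Dict, List, Optional

Cell = Optional[object]
Row = List[Cell]


def is_blank(row: Row) -> bool:
    """A row is blank when every cell in it is empty."""
    return all(cell in (None, "") for cell in row)


def split_tables(rows: List[Row]) -> List[List[Row]]:
    """Split a sheet into blocks of consecutive non-blank rows."""
    tables: List[List[Row]] = []
    current: List[Row] = []
    for row in rows:
        if is_blank(row):
            if current:
                tables.append(current)
                current = []
        else:
            current.append(row)
    if current:
        tables.append(current)
    return tables


def to_records(block: List[Row], min_filled: int = 3) -> List[Dict[str, Cell]]:
    """Use the first row with at least `min_filled` filled cells as the header.

    Assumed heuristic: title and note rows only fill one or two cells,
    while a real header row fills most of the columns.
    """
    for index, row in enumerate(block):
        filled = [cell for cell in row if cell not in (None, "")]
        if len(filled) >= min_filled:
            header = [str(cell) for cell in row]
            return [dict(zip(header, data)) for data in block[index + 1:]]
    return []  # no header found: treat the block as titles or notes


# Made-up sample shaped like a human-readable spreadsheet (values are placeholders).
sheet = [
    ["Data Source", "World Development Indicators", None, None],
    ["Last Updated Date", "2021", None, None],
    [None, None, None, None],
    ["Country Name", "Country Code", "1960", "1961"],
    ["Country A", "AAA", 50.0, 50.5],
    ["Country B", "BBB", 60.0, 60.5],
]

tables = [to_records(block) for block in split_tables(sheet)]
tables = [table for table in tables if table]  # drop blocks with no header
```

In this sketch the two title rows form their own block, no header is found in it, and it gets dropped, so only the real table survives.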
In general, before you use Tableau, your data must be cleaned and prepared. Tableau automatically assumes that's the case. Tableau Desktop and Tableau Public have no features for adjusting data sources.
That must be performed outside of the application. Data Interpreter is only intended to fix common errors that occur when spreadsheets are prepared and made human-readable. What we mean by human-readable is it's easy for someone who's a human being to look at the data and understand what it's trying to say.
That is not the same for Tableau. Tableau likes information structured in the way the data would be structured to work with an SQL Server. Just columns and rows, no additional information, no context, just the raw data in an organized format.
Data Interpreter cannot be manually enabled. You have no control over this. There is not a menu option that you can click on to turn on Data Interpreter.
If Tableau detects that cleanup is needed, it will give you the option to use this feature. The only choice that you have is whether or not to click a checkbox. That's it.
So, let's actually use Data Interpreter. Let me just see if this is where my instructions for this are. Okay, yeah.
So, the first step is to open the data that we're going to work with and just take a look at it. We'll see what the issues are before we bring it into Tableau. I'm going to go to the folder, Tableau Level 2, and then I'm going to head over to Datasets.
When you go to Datasets, you're going to see multiple folders. Now, the folder that we have been using is the Corporate Superstore Sales Data, but this time we're going to choose a folder that has a file that has issues with it, and that is World Bank Datasets. Now, there's a couple of files in here.
The one we're looking for is WorldBankLifeExpectancy.xls. We want to review the data before we bring it into Tableau, so I'm going to double-click on it to open it.
It's going to open up in Excel. It may not open up on the first tab, so I'm going to click Data, and when I go to Data, this is the main information we're interested in, all this information on the Data tab. This is actually the information that we used when we were working with the visualization for life expectancy for all the different countries.
This is the very same data. In fact, we'll get an opportunity to bring this in and recreate the same visualization, and you'll see it's not really that hard. It's actually pretty simple.
This information is human-readable. What do we mean by human-readable? Well, the data source is identified as being from World Development Indicators. The last updated date, that's useful.
I'd like to read that. 2021. Okay, maybe we should update that.
This is the list of countries. These are the country codes, the indicator name, indicator code, and the most important information, the life expectancy for each of the years from 1960 to 2020. Well, we don't have anything for 2020, so it's null.
All this is empty, and so this is very easy to understand. All the countries are listed once, and if I look across, I can see the life expectancy from 1960 up to 2020. This is not Tableau-friendly, so this is something we're going to need to fix.
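The lesson doesn't show at this point how the wide, one-column-per-year layout gets fixed. One common approach is to reshape it so each row holds one country and one year. Here is a minimal Python sketch of that reshaping, purely as an illustration. The function name `wide_to_long`, the output field names `Year` and `Life Expectancy`, and the sample values are assumptions, not something taken from the workbook.

```python
from typing import Dict, List, Optional


def wide_to_long(
    rows: List[Dict[str, Optional[object]]],
    id_fields: List[str],
    value_name: str = "Life Expectancy",
) -> List[Dict[str, Optional[object]]]:
    """Turn one row per country into one row per country and year."""
    long_rows = []
    for row in rows:
        ids = {field: row[field] for field in id_fields}
        for key, value in row.items():
            if key in id_fields:
                continue  # identifier columns are copied, not pivoted
            # Every remaining column is assumed to be named after a year.
            long_rows.append({**ids, "Year": int(key), value_name: value})
    return long_rows


# Made-up sample: one country, three year columns, the last one empty.
wide = [
    {"Country Name": "Country A", "Country Code": "AAA",
     "1960": 50.0, "1961": 50.5, "2020": None},
]

long_rows = wide_to_long(wide, id_fields=["Country Name", "Country Code"])
```

The single wide row becomes three rows, one per year, and the empty 2020 value stays empty rather than being filled in.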
What Tableau will handle is this top part here. It's not like the headers and data that we have down here. This is actually unnecessary.
Tableau doesn't need this. All right, so I'm just going to close the file, and I'm going to head back to… So, we reviewed the Excel spreadsheet for obvious issues and common errors: header titles and multiple tables in a single document. Let's bring this information into Tableau.
So, I'm going to go to Tableau. I'm going to resize this. I'm going to do what I like to do, which is drag and drop the information and just bring it in.
I'll go to World Bank datasets. You can do that on your end as well. Hopefully, you're following along.
Again, if you have any questions, let me know. I'm going to take World Bank life expectancy. I'm going to drag it right over here and wait a little while, and there it is.
I'll minimize this, and I'll open this up so pictures are being taken of me. It would be great if the audio was taken as well. Well, the audio is being taken in this video, so maybe they compare the audio and the video.
It would look weird. My mouth wouldn't be moving, but people could get a sense of my voice, which is probably my best feature. All right, so I brought the information in.
This is all pretty familiar to you. There's something new that I see here, and you should see it in Tableau as well. There's this thing here that says Use Data Interpreter.
Data Interpreter might be able to clean your Microsoft Excel workbook. Now, if I want to preview my information before I run Data Interpreter, I'm going to go here and click this button. It's the same button that we clicked on to take a look at the information before we connected the tables.
Here's the issue. Data source last updated. Null, null, null.
Null is bad, especially in SQL. There's the actual information. It's like there's a gap between, yep, the title and the information here.
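Why do all those nulls show up? Here is a small Python sketch of what happens when the very first row of a sheet is treated as the header, title rows and all. It only imitates the effect seen in the preview and is not how Tableau reads the file. The `naive_read` and `count_nulls` helpers, the `F3`-style placeholder names for unnamed columns, and the sample values are assumptions for illustration.

```python
from typing import Dict, List, Optional

Cell = Optional[object]


def naive_read(rows: List[List[Cell]]) -> List[Dict[str, Cell]]:
    """Treat the very first row as the header, with no cleanup at all."""
    header = [
        str(cell) if cell not in (None, "") else f"F{position}"
        for position, cell in enumerate(rows[0], start=1)
    ]
    return [dict(zip(header, row)) for row in rows[1:]]


def count_nulls(records: List[Dict[str, Cell]]) -> int:
    """Count empty values across all records."""
    return sum(
        1 for record in records for value in record.values()
        if value in (None, "")
    )


# Made-up sample with title rows above the real table (values are placeholders).
sheet = [
    ["Data Source", "World Development Indicators", None, None],
    ["Last Updated Date", "2021", None, None],
    [None, None, None, None],
    ["Country Name", "Country Code", "1960", "1961"],
    ["Country A", "AAA", 50.0, 50.5],
    ["Country B", "BBB", 60.0, 60.5],
]

records = naive_read(sheet)
null_count = count_nulls(records)
```

In this sketch the title row becomes the field names, the real header row ends up as an ordinary data record, and the gap rows contribute nothing but nulls.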
This needs to be cleaned up. I'm going to close this. Let's take a look at this.
This also might need some work, but we're not really interested in that information, but it's there. We could potentially use it, but I want to actually use data interpreter. That's what the goal of this is.
Let me go back here. We did the preview button thing. We connected to the World Bank life expectancy data.
We reviewed the file in Tableau. Notice the header is missing the top three rows. We did that.
Now we're going to turn on Data Interpreter. How do you turn on Data Interpreter? So simple. Just click a checkbox.
Okay. Then what? It's done. What? It's done.
It just cleaned your data. Really? Yeah. See the new message here? It says data interpreter removed some data.
Well, can you show me what you removed? Data interpreter says yes. If you click this blue link that says review results, I will show you the changes I made. I'll click here.
It's going to open up Excel and give me a little bit of a progress report. There's a little color key here that describes what happened to the information based on the colors that are applied to the data. So anything that's this pinkish color or peach color is data that's interpreted as column headers.
Green is data that is interpreted as values for your data source. Red, a red border, is data that has been excluded from your data source. Okay.
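The color key can be thought of as a label on every row: header, value, or excluded. Here is a minimal Python sketch that assigns those three labels to the rows of a small sheet. It is only an illustration of the key, not how Tableau builds its results file. The `label_rows` helper, the `min_filled` threshold, and the sample values are assumptions.

```python
from typing import List, Optional

# The three categories from the color key in the results workbook.
COLOR_KEY = {
    "header": "peach fill",
    "value": "green fill",
    "excluded": "red border",
}


def label_rows(rows: List[List[Optional[object]]], min_filled: int = 3) -> List[str]:
    """Label each row as 'header', 'value', or 'excluded'.

    Assumed heuristic: the first row with at least `min_filled` filled
    cells is the header, non-empty rows after it are values, and
    everything else is excluded.
    """
    labels = []
    header_seen = False
    for row in rows:
        filled = sum(1 for cell in row if cell not in (None, ""))
        if not header_seen and filled >= min_filled:
            labels.append("header")
            header_seen = True
        elif header_seen and filled > 0:
            labels.append("value")
        else:
            labels.append("excluded")
    return labels


# Made-up sample with three rows above the real table (values are placeholders).
sheet = [
    ["Data Source", "World Development Indicators", None, None],
    ["Last Updated Date", "2021", None, None],
    [None, None, None, None],
    ["Country Name", "Country Code", "1960", "1961"],
    ["Country A", "AAA", 50.0, 50.5],
    ["Country B", "BBB", 60.0, 60.5],
]

labels = label_rows(sheet)
report = [f"row {number}: {COLOR_KEY[label]}" for number, label in enumerate(labels, start=1)]
```

A sketch this simple would also mark a footer note below the table as a value, so it should not be read as a complete rule.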
So show me. I could read more, but you can read this yourself. I'm going to click the data tab.
There we go. These are the headers. This is the data.
Anything that's not colored with this peach color and the green data color is not included. Not included. We excluded the first three rows.
I can take a look at the other data. Okay. So yeah.
This looks like it's going to be the final result of this. And then this is the final result for that. And this is the final result for that.
So this file just gets produced for you to review it. There's nothing else I need to do with this. I'll just close it.
Okay. So let me check the data. I'll click this button here.
Hey, look at that. That is much cleaner. It actually did the work.
And now I just see the country code. I see country name, indicator code. The headers are now displaying as headers.
I'm going to drag this in because now it's cleaned. I can drag it in and look at my data even closer. These are the fields that are available in the table and there are the columns.
We turned on Data Interpreter by clicking the checkbox. When the work is done, you can click on the link to review the results. We did that.
Pink is data interpreted as headers. Pink or peach. Green is data that is read as values.
And red is data that has been excluded. And it excluded additional information as well. So Tableau can fix these issues.
It looks for column headers in the first row. Tableau is looking for well-structured data. We previewed the data again.
We added the data sheet to the table canvas area.