When you are dealing with perfectly curated data like Tableau’s Sample Superstore, data preparation is easy. A few clicks and you are on your way. But as we all know, preparing real world data for reporting tools like Tableau is not that simple.
What is easy is bringing data into Tableau. Tableau Desktop alone can connect to over 70 different data sources. But making sure the data from all those sources is intuitive and Tableau-friendly is another story. It needs to make sense, be easy to understand and – most important – be accurate.
Turning data into a Tableau-friendly format typically requires transforming and preparing the data prior to use. Once properly formatted, building Tableau dashboards and enabling self-service can be a piece of cake!
There are many options for formatting data for use in Tableau, from built-in Tableau tools to third-party data prep platforms. In this blog we look at some of the available options and discuss the five most important aspects to consider when determining which one is best suited for your project.
Options for prepping data for use in Tableau
The recent 2018.1 launch of Tableau gives users more choices for making data Tableau friendly. Both Tableau Desktop and the newly released Tableau Prep offer tools for data preparation. These tools are fine for simple use cases but they may not suffice for other transformations and they may not support all the input and output options you need.
For complex data preparation, a third-party data prep platform may be necessary. There are hundreds of these extract, transform and load (ETL) tools available, each with different capabilities and strengths. It would be impossible to cover the features of every one, so we’ll limit the discussion here to the tools available with the Tableau Creator license and commonly used third-party platforms such as Alteryx, Informatica and Microsoft SSIS.
With so many options outside of what we cover here, it can be difficult to determine which is the best tool for the job at hand. If you need help combing through the options and matching them to your company’s needs, please get in touch with us. We can help you build the best foundation for your reporting.
5 factors to consider when picking a data prep tool
- Data source requirements for input and output
- The types of “clean and prepare” steps that will be necessary
- Ease of use and technical knowledge required to use the tool
- Current and future needs for advanced capabilities (i.e. predictive analytics, advanced mapping, statistical functions and data modeling, ETL job scheduling and data testing)
- Price and total cost of ownership
1. Data source requirements for input and output. Every ETL tool can connect to different types of files for input and output. These sources can be Excel or CSV files, databases such as Microsoft SQL Server or applications such as Salesforce.
Tableau Desktop lets you join different data sources and do simple transformations. You can save the transformed data as a Tableau data source, embed it in a packaged workbook or output it to a CSV file.
Tableau Prep works similarly to Desktop, but (at least with the first release) accepts fewer data sources than Desktop (26 vs. 70). Like Tableau Desktop, the cleaned and prepared data can be saved as Tableau data extracts or CSV files. If you are only generating data for Tableau, this limited functionality may be sufficient.
To get the ease of use and performance you need, this step ideally entails building a well thought out data model that is specifically designed for reporting. At Senturus, we have experts to help you build this type of architecture.
2. Types of “clean and prepare” steps that will be necessary. Tableau Desktop gives some good options for cleaning and preparing data. You can easily convert data to geospatial or Tableau date formats, join data, rename fields, hide unused columns and even pivot data in Excel and CSV files.
Tableau Prep offers even more options. See our blog about Tableau Prep features. Tableau Prep lets you pivot database columns, group and aggregate results and edit data. It also gives you a visual representation of data distribution and join results.
Third-party data prep platforms usually have more capabilities in the areas of predictive modeling, advanced statistical analysis and mapping. If you need this type of cleaning and preparing, these tools may be worth the additional investment.
This image below gives you an idea of a few of the options available in Alteryx:
3. Ease of use and technical knowledge required to use the tool. Tableau Desktop is easy to use for simple data prep. You do not need to be technical to use it and it comes with the Creator license.
Tableau Prep, which also comes with the Creator license, provides the next level of complexity. It lets you consolidate cleaning and preparing functions and build a multi-step, repeatable flow that can produce Tableau-friendly data sources. While Tableau Prep is fairly intuitive, it does require some technical knowledge. This is especially true when it comes to things like joining disparate data. To join data properly and get accurate results you need to know your data and what columns to join. In addition, you need to understand the granularity of your data files and how joins behave. If you are not familiar with these concepts, you might start with a 10,000 row file (like Sample Superstore), join it to a small table and then end up with a Tableau data source that has 100,000 rows or even more.
A misguided join can really mess up your numbers/amounts, producing inaccurate and duplicated data that gives you incorrect numbers when you build your Tableau worksheets. To avoid this messy scenario real expertise is required to competently prepare data that requires complex joins.
See below for a multi-step data flow in Tableau Prep that includes a join:
Alteryx, Informatica and Microsoft SSIS are more complex products than Tableau Prep and require more technical knowledge. While similar to Tableau Prep in regards to presentation and workflows, they offer more options for cleaning and preparing data. It takes time and (some) technical knowledge to learn them well and become proficient. Informatica PowerCenter allows functions like data testing, data archiving, complex data integration and metadata management. Alteryx has many complex algorithms such as: advanced spatial and predictive analysis, prescriptive tools, testing /validation and advanced statistical functions.
4. Current and future needs for additional capabilities such as predictive analytics, advanced mapping, statistical functions and data modeling, ETL job scheduling and data testing. Tableau Desktop and Tableau Prep both have some great features when it comes to geospatial (mapping) and analytics, data conversion and filtering. For example, in Tableau Desktop you can convert city, state, aip data to a geospatial data type that automatically generates latitude and longitude. Once defined as geospatial, you can display the data on a map by double clicking on it. But if you need to see a data diagram, do AB testing, predictive analytics or data modeling you may need another tool.
See below for data modeling in Informatica PowerCenter:
5. Price and total cost of ownership. Tableau Desktop and Tableau Prep both come with Tableau Creator license at no additional cost. If you are on a limited budget and need more capabilities, there are free open source ETL tools. But keep in mind that training people to use new tools can be time consuming and expensive.
Some databases, like Microsoft SQL Server, come with tools that can be used for preparing data. Microsoft SSIS is a popular development tool for Microsoft databases (i.e. SQL Server) and is often used by developers for ETL. SSIS used in conjunction with the Tableau tools may be all that is required if you are dealing with Microsoft databases.
But if you are dealing with other databases, such as Oracle or Amazon Redshift, you may need a third-party tool such as Alteryx and Informatica that can work with many databases. The additional functionality may justify the higher initial cost. This is especially true if you already have in-house expertise for a particular tool.
Another great feature about Alteryx and Informatica is their popularity. That means there are a lot of resources built around them (i.e. technical support, user forums, training, libraries with packaged flows, etc.). Using these resources can save valuable development time, reducing the overall cost of the project.
Bottom line
Preparing data for Tableau is easier said than done. Often, many steps are required to make data Tableau-friendly. While Tableau 2018.1 offers some great options for preparing data for Tableau, sometimes the tools will not meet your requirements. If this is the case, you’ll want to consider the ETL tools that come with your database or third-party data prep platforms such as Alteryx and Informatica. When determining which way to go, keep in mind these five considerations to help determine the right data preparation tool for your project. If you need help selecting the right tool, contact us.