By Bill McLellan, Senior Consultant, BI Solution Architect
Recently, current and prospective clients have asked me for ideas about improving extract performance in Tableau Server. A couple of the first options to consider are hiding unused columns and using an incremental load instead of a full refresh. You may also want to scale out to a 2-node configuration with the additional node devoted to extracts.
But here is the most important thing you can ever do to improve Tableau extract performance:
Stop creating giant custom SQL or views in your data warehouse for Tableau to read. Coders gotta code. But stop! Really.
Use Tableau Desktop’s Data Source tab as much as possible. Flattening a star schema’s fact and dimension tables into a single physicalized view doesn’t offer the performance for extracts that it does for live connections. It usually leads to longer extract time in Tableau. Of course, you should use columnstore indexes in the database where they speed reads, but the compression they offer doesn’t matter at all. The data gets uncompressed anyway to be sent across your network, and then it gets recompressed in Tableau. You don’t need that kind of flat structure in your database.
Last week I attended Tableau Conference in New Orleans with a couple of colleagues from Onebridge. Tableau announced a new feature that reinforces this method of splitting up data sources into several pieces and joining them in Tableau rather than SQL. With version 2018.3, Tableau has added a feature that stores extracts as multiple tables rather than one large dataset. It’s still just one hyper file. Multiple-table hyper files greatly improve extract performance, and while they potentially compromise dashboard responsiveness Tableau has managed to keep viz queries very fast. Sometimes the smaller file size even improves dashboard query times. Extract performance gets the biggest boost when row level security blows up the number of rows in your data source.
Regardless of whether your hyper file stores a single large data source or keeps the tables separate to be joined at runtime, you should still keep your facts and dimensions separate. Hit your data warehouse or similar source directly with Tableau. Actually, Microsoft Power BI forces this multi-table approach by limiting the size of each table in a data model to 10 GB uncompressed. Tableau doesn’t enforce any limit per table, but the principle applies because the technology is similar: each table and view should come across from the database to Tableau as narrow as possible.
It may seem counter intuitive not to push heavy joins down to your beefy database but bringing the data piecemeal across to Tableau allows the program to use some of its own parallel processing during the extract. That means drag and drop tables individually and define the joins in the Tableau Desktop UI. For complex joins, you should still try to write expressions for the joins in Tableau. Only make limited exceptions for very complex joins. Then, don’t oversteer off the road into the giant custom SQL query. Instead, break up smaller complex subqueries into several data warehouse views so that each bubble you drag into Tableau represents tall and narrow data, with as few columns as possible.
For more information on using multiple tables to improve extract performance, see these blog posts or contact us: