sábado, 21 de março de 2015

Tableau Data Extract #1 - Aggregation

 Why to use the Aggregation option?

If you decide to extract the data instead of use live connection, there is an important option, called Aggregation, which can save disk space, reduce extract time and improve your visualization performance.
When you use Aggregation option, you are telling Tableau to perform an aggregated extract, which means that all the measures will be summarized (avg, min, max, depending of what you choose as default aggregation) by the visibles dimensions.

In a simple way, if you do not choose to use aggregation, this will be the query used to extract the data:

SELECT 
DIMENSION_1, 
DIMENSION_2, 
...DIMENSION_N, 
MEASURE_1, 
MEASURE_2, 
...MEASURE_N
FROM TABLE

But, if you choose to use the aggregation, this will be the query:

SELECT 
DIMENSION_1, 
DIMENSION_2, 
...DIMENSION_N, 
SUM|MAX|MIN|AVG(MEASURE_1), 
SUM|MAX|MIN|AVG(MEASURE_2),
...SUM|MAX|MIN|AVG(MEASURE_N)
FROM TABLE
GROUP BY DIMENSION1, DIMENSION_2, ...DIMENSION_N

Example

We will extract the below table (sales table), which has 31 records and 2 fields: Employee and Sales.
Each employee may have more than one record.
Our extract will need both fields, and we will perform the extract with and without the aggregation option.
#1 - Not using the aggregation option.

The process extracted 31 rows (all the table rows).
Query used:
{....
"cols":3,"query":"SELECT 1 AS \"number of records\",\n  \"sales\".\"employee\" AS \"employee\",\n  \"sales\".\"sales\" AS \"sales\"\nFROM \"financial\".\"sales\" \"sales\"","rows":31,"elapsed":0.339}
}

#2 - Using the aggregation option
The process extracted only 16 rows.
Query used:
{...
"cols":3,"query":"SELECT \"sales\".\"employee\" AS \"employee\",\n  SUM(CAST(1 AS BIGINT)) AS \"number of records\",\n  SUM(\"sales\".\"sales\") AS \"sales\"\nFROM \"financial\".\"sales\" \"sales\"\nGROUP BY 1","rows":16,"elapsed":0.149}
}


In that example we used a small table with only 31 records.

Probably, in the real world, your tables may have millions or even billions of rows. If that is your case, you should consider to use that option.

But be aware: when you aggregate your measures, you will not be able to create calculations in the most granular level of data anymore.
In other words, your row level calculation will be different.

For example, if you have an extract without aggregation you can perform both calculations:
[Sales]/[Profit] and SUM([Sales])/SUM([Profit]) - which will give you different results.

When you use an aggregated extract, you will no longer have the granular data, so didactically, you will no longer be able to perfom [Sales]/[Profit], just SUM([Sales])/SUM([Profit]).


For more information, visit Tableau KB: Aggregated Extracts.

Nenhum comentário:

Postar um comentário