Additional Settings
Besides the filter, aggregation, and functions in Formula Editor, there are also three additional functionalities available in Dataset Builder. These three functions "Show data", "LOCF" and "AutoRefresh" are described in detail in the sections below.
The Column only for filtering - "Show data"
In the same section as aggregation and grouping functions also the setting Show data can be found. If you are disabling this option all defined filters and aggregations defined for this column are applied to the query but the resulting data for this column will not be returned.
Filtering data instead of the result
On SF Platform all filters defined in a dataset are applied to the result of the query (HAVING clause in SQL terms). But there is a little trick to be able to apply filters on data before grouping and aggregation (WHERE clause in SQL terms).
As you maybe already expected the Show data option is the solution. Usually, if there is a grouping or aggregation applied on one of the columns all other columns also have to apply one of these. But this doesn't hold true for columns that do not return data. So for these columns filters are applied to the result if there is any grouping or aggregation defined, or it is applied to the data itself if it is not.
If you want to have the result of a column, but you also want to filter on the data instead of the result after grouping/aggregation, then just add this column two times and use one for computation (Show data- enabled + grouping/aggregation) and one for filtering (Show data disabled).
Example
For a better understanding let's have a look at an example. In this example, we want to analyze the weekly compressed data volume inserted and the number of messages which are inserted within this month. But we want to do this analysis only for messages which contained exactly 1 event.
In the picture below you can see the dataset we prepared for this analysis. At this stage, it only includes all necessary columns and is filtered for this month.
You can see that most of the messages only include 1 event, but there are also some messages which contain many events (see row 11).
Let's apply all groupings and aggregations. These are grouping the Timestamp column by "Week", summing up all values in "Compressed size" and counting all rows in "Inserted events". The result for this you can see below.
But in this case, we did the analysis of overall messages and not only for those containing only 1 event.
If we are adding now an additional filter "= 1" to the "Inserted event" column then we do not see any result. The reason for this is that the filter is applied to the result of the aggregation. And as we can see in the previous picture none of these results (19, 2718, 3) are fulfilling this condition.
To change the dataset according to our requirements we have to add an additional column for "Inserted events". Here we have to disable the "Show data" option and do not apply any grouping or aggregation. So the filters defined for this column are applied to the data before grouping and aggregation. We remove the filter from the previous "Inserted events" column and add it to the new one.
Now we get the desired results. In this analysis now only rows with "Inserted event = 1" are used for calculating the aggregations.
Gap filling - "LOCF" (Timescale only)
Another very useful feature in our Dataset Builder is gap filling. If you are combining multiple columns (especially if they are from different events) it will happen that there are entries in the datasets which are empty.
In some cases, it makes sense to fill up these gaps within a column. e.g. if your machines only send changes of their state. Because these attributes do not change in between and are therefore not sent in a frequent manner. But for some analyses or visualisations you maybe want to have entries in each row.
To do this gap-filling you have to enable the LOCF (Last Observation Carried Forward) option in the Dataset Builder. This fills up the empty entries with the previous entry in respect of time. So sorting doesn't have any influence on the gap filling. In the picture below you can see our previous dataset using the gap-filling feature.
This feature is only available in environments using the Timescale database.
When is gap-filling applied?
One thing to know about the gap-filling in combination with filters, aggregations, and groupings is the order of applying these things to the data.
Usually, groupings and aggregations are applied first, then all filters (except you are doing the trick with disabling Show data; see Filtering data instead of results above), and then the gap-filling is applied.
Previous value
Some datapoints are not sent in a frequent manner and then it can be the case that a dataset doesn't have an initial value, or even no value for the selected timeframe. To give a solution for this problem we added the possibility to use "Previous Value". When this feature is activated for a datapoint, the first occurrence of a value, before the datasets timeframe is used.
The "Previous Value" can also be combined with gap-filling in order to have a value for each row of the dataset (provided that the datapoint ever received data).
If the selected datapoint never received data before the timeframe of the dataset "No data" will still be shown.
It is strongly suggested to use a timestamp column with a filter when using this feature!
Auto refresh in dataset editor
In the Dataset editor, you have the possibility to disable the auto-refresh behavior through the "AUTOREFRESH" toggle button. When the auto-refresh is deactivated, the data will not be reloaded when different actions are made at the dataset level like (columns, filters, or aggregations updates). But you can manually trigger a refresh from the button available at the top-right of the dataset editor . This button will be colored in red when the auto-refresh is deactivated.