Incremental Refresh¶
During the Incremental refresh, only the ‘Changed’ and ‘New’ partitions are loaded into the data lake. If a Partition has changed, then it will be loaded/overwritten in the data lake. If a Partition is new, then the parquet file will be created.
In a scenario where there is a partition in which no data is available anymore (which was Loaded and Refreshed previously), data of the related partition will be removed from the lake and the data set.
How it works:¶
- During round 1 of Loading /Refreshing, partition A (YYYYMM) data is Loaded and Refreshed. And a .parquet file for partition A (YYYYMM) is available in the Power BI data set for the related YYYYMM.
- In round 2 when partition A (YYYYMM) has no data in Oracle anymore, during Loading/Refreshing, this partition will be removed in the Power BI dataset.
- This is done by passing the partitions to be deleted, as detected by the Detect Changes Logic, which removes the .parquet file for the partition A (YYYYMM).
Configure Power BI Table when using Partitioning
During the creation of a Parquet Data Source, incremental Load details are specified.
When incremental refresh is turned on (Partition Column has a value) this column contains _{0}. It is a placeholder where the value of the partition column is placed.
For the Incremental Refresh, the below formats are supported, and the example shows how the data must be formatted in the partition column.
Granularity | Format in table | Parquet file name | Notation | Power BI Partition Name |
---|---|---|---|---|
Year | 2023 | *YYYY | YYYY | 2023 |
Quarter | 2023Q4 | *YYYY-QQ | yyyy-QQ | 2023Q4 |
Month | 2023-11 | *YYYY-MM | yyyy-MM | 2023Q411 |
Day | 2023-11-25 | *YYYY-MM-DD | YYYY-MM-dd | 2023Q41125 |
The Last Value for a partition is stored in the database. This is determined from the Detect Changes Column. A query like below is executed against the database:
SELECT period_id_key AS partition_key, MAX(last_updated_date) AS last_value, count(*) as row_count
FROM <table_name>
GROUP BY period_id_key
Based on the outcome of the previous query, it is determined whether the last_value has changed, or the row_count has changed (Row Count is taken into account when the ‘Detect Changes Row Count’ is set to true).
Below is a sample query of an Incremental refreshing model's partition format [datasourcename]_YYYYMMDD.parquet.
Parameters¶
Parameters below can be defined in the Manage Parameters dialog in Power Query Editor.
Parameter | Details |
---|---|
RangeStart RangeEnd |
During the Incremental Data Refresh, these two parameters enable defining the date range of the data to be loaded into the Power BI Desktop model. The range parameters must be used in the m-expression, therefore values can be set accordingly. It is not a must to have values for each of these parameters during Upload/ Publish into Power BI. If only one range parameter is used, this will not indicate any errors when refreshing via Power BI Desktop. But refreshing via the Analysis Models page will throw an error. Additionally, in Power BI desktop, it is important to set the values for RangeStart and RangeEnd parameters correctly based on actual values (parquet files for the given range should be available). Otherwise, it will indicate that the table is empty. |
IfsDataLakeContainerPath | Connection string to the relevant container inside the ADLSG2. Example: https://examplestorageaccount.dfs.core.windows.net/container_name |
Transform File function¶
This Power Query function is used to apply consistent transformations to multiple files.
The Transform File function can be created as below.
1.In the Query window, go to 'New Source'.
2.Select 'Blank Query'.
3.Open 'Advanced Editor'.
4.Add the following m-expression.
= (Parameter1) => let
Source = Parquet.Document(Parameter1, [Compression=null, LegacyColumnNameEncoding=false, MaxDepth=null])
in
Source
Examples of Incremental Load¶
Example 1 - Scenario with Row Count¶
Incremental Loading scenario for the fact= FACT_ABSENCE_PERIOD (note that the column names are hypothetical)
- Partition column = month_key
Since the group by function is added automatically to this query, this fact table is grouped by the month_key.
-
Detect Changes column = entry_date
-
Detect Changes function = max
-
Row count included
SELECT month_key as partition_key, max(entry_date) as last_value , count(*) as row_count
FROM FACT_ABSENCE_PERIOD
GROUP BY month_key
Example 2¶
Incremental Loading scenario for the fact= FACT_ABSENCE_PERIOD (note that the column names are hypothetical)
-
Partition column = month_key
-
Detect Changes column = amount
-
Detect Changes function = sum
-
Row count excluded
SELECT month_key as partition_key, sum(amount) as last_value
FROM FACT_ABSENCE_PERIOD
GROUP BY month_key
Power Bi M-expression examples for Partitioning¶
Loading partition data based on the Day¶
let
Source = AzureStorage.DataLake(IfsDataLakeContainerPath & "/Shared/content/FACT_AM_PARQ_LOG_TAB"),
FirstFileName = "FACT_AM_PARQ_LOG_TAB_" & DateTime.ToText(RangeStart, "yyyy-MM-dd") & ".parquet",
LastFileName = "FACT_AM_PARQ_LOG_TAB_" & DateTime.ToText(RangeEnd, "yyyy-MM-dd") & ".parquet",
IncRefreshFilter = Table.SelectRows(Source, each [Name] >= FirstFileName and [Name] <= LastFileName),
RemoveIncRefresh = Table.SelectColumns(IncRefreshFilter, {"Content", "Name"}),
FilteredHiddenFiles1 = Table.SelectRows(RemoveIncRefresh, each [Attributes]?[Hidden]? <> true),
InvokeCustomFunction1 = Table.AddColumn(FilteredHiddenFiles1, "Transform File", each #"Transform File"([Content])),
RenamedColumns1 = Table.RenameColumns(InvokeCustomFunction1, {"Name", "Source.Name"}),
RemovedOtherColumns1 = Table.SelectColumns(RenamedColumns1, {"Source.Name", "Transform File"}),
ExpandedTableColumn1 = Table.Combine(RemovedOtherColumns1[#"Transform File"])
in
ExpandedTableColumn1
Sample .pbit file with M-expression for Partitioning (Partitioned Days) can be downloaded from here for further reference.
Loading Quarterly based partition data¶
let
Source = AzureStorage.DataLake(IfsDataLakeContainerPath & "/Finance/content/AM_PARQ_FACT_QUARTERS_TEST"),
FirstFileName = "AM_PARQ_FACT_QUARTERS_TEST_" & DateTime.ToText(RangeStart, "yyyy") & "Q" & Number.ToText(Date.QuarterOfYear(RangeStart)) & ".parquet",
LastFileName = "AM_PARQ_FACT_QUARTERS_TEST_" & DateTime.ToText(RangeEnd, "yyyy") & "Q" & Number.ToText(Date.QuarterOfYear(RangeEnd)) & ".parquet",
IncRefreshFilter = Table.SelectRows(Source, each [Name] >= FirstFileName and [Name] <= LastFileName),
RemoveIncRefresh = Table.SelectColumns(IncRefreshFilter, {"Content", "Name"}),
FilteredHiddenFiles1 = Table.SelectRows(RemoveIncRefresh, each [Attributes]?[Hidden]? <> true),
InvokeCustomFunction1 = Table.AddColumn(FilteredHiddenFiles1, "Transform File", each #"Transform File"([Content])),
RenamedColumns1 = Table.RenameColumns(InvokeCustomFunction1, {"Name", "Source.Name"}),
RemovedOtherColumns1 = Table.SelectColumns(RenamedColumns1, {"Source.Name", "Transform File"}),
ExpandedTableColumn1 = Table.Combine(RemovedOtherColumns1[#"Transform File"])
in
ExpandedTableColumn1
Sample .pbit file with M-expression for Partitioning (Partitioned Quarters) can be downloaded from here for further reference.