Skip to content

Incremental Refresh

During the Incremental refresh, only the ‘Changed’ and ‘New’ partitions are loaded into the data lake. If a Partition has changed, then it will be loaded/overwritten in the data lake. If a Partition is new, then the parquet file will be created.

In a scenario where there is a partition in which no data is available anymore (which was Loaded and Refreshed previously), data of the related partition will be removed from the lake and the data set.

How it works:

  • During round 1 of Loading /Refreshing, partition A (YYYYMM) data is Loaded and Refreshed. And a .parquet file for partition A (YYYYMM) is available in the Power BI data set for the related YYYYMM.
  • In round 2 when partition A (YYYYMM) has no data in Oracle anymore, during Loading/Refreshing, this partition will be removed in the Power BI dataset.
  • This is done by passing the partitions to be deleted, as detected by the Detect Changes Logic, which removes the .parquet file for the partition A (YYYYMM).

Configure Power BI Table when using Partitioning

During the creation of a Parquet Data Source, incremental Load details are specified.

When incremental refresh is turned on (Partition Column has a value) this column contains _{0}. It is a placeholder where the value of the partition column is placed.

For the Incremental Refresh, the below formats are supported, and the example shows how the data must be formatted in the partition column.

Granularity Format in table Parquet file name Notation Power BI Partition Name
Year 2023 *YYYY YYYY 2023
Quarter 2023Q4 *YYYY-QQ yyyy-QQ 2023Q4
Month 2023-11 *YYYY-MM yyyy-MM 2023Q411
Day 2023-11-25 *YYYY-MM-DD YYYY-MM-dd 2023Q41125

The Last Value for a partition is stored in the database. This is determined from the Detect Changes Column. A query like below is executed against the database:

SELECT period_id_key AS partition_key, MAX(last_updated_date) AS last_value, count(*) as row_count
FROM <table_name>
GROUP BY period_id_key

Based on the outcome of the previous query, it is determined whether the last_value has changed, or the row_count has changed (Row Count is taken into account when the ‘Detect Changes Row Count’ is set to true).

Below is a sample query of an Incremental refreshing model's partition format [datasourcename]_YYYYMMDD.parquet.

Parameters

Parameters below can be defined in the Manage Parameters dialog in Power Query Editor.

Parameter Details
RangeStart
RangeEnd
During the Incremental Data Refresh, these two parameters enable defining the date range of the data to be loaded into the Power BI Desktop model.
The range parameters must be used in the m-expression, therefore values can be set accordingly. It is not a must to have values for each of these parameters during Upload/ Publish into Power BI.
If only one range parameter is used, this will not indicate any errors when refreshing via Power BI Desktop. But refreshing via the Analysis Models page will throw an error.
Additionally, in Power BI desktop, it is important to set the values for RangeStart and RangeEnd parameters correctly based on actual values (parquet files for the given range should be available). Otherwise, it will indicate that the table is empty.
IfsDataLakeContainerPath Connection string to the relevant container inside the ADLSG2.
Example: https://examplestorageaccount.dfs.core.windows.net/container_name
Transform File function

This Power Query function is used to apply consistent transformations to multiple files.

The Transform File function can be created as below.

1.In the Query window, go to 'New Source'.

2.Select 'Blank Query'.

3.Open 'Advanced Editor'.

4.Add the following m-expression.

= (Parameter1) => let
        Source = Parquet.Document(Parameter1, [Compression=null, LegacyColumnNameEncoding=false, MaxDepth=null])
    in
        Source

Examples of Incremental Load

Example 1 - Scenario with Row Count

Incremental Loading scenario for the fact= FACT_ABSENCE_PERIOD (note that the column names are hypothetical)

  1. Partition column = month_key

Since the group by function is added automatically to this query, this fact table is grouped by the month_key.

  1. Detect Changes column = entry_date

  2. Detect Changes function = max

  3. Row count included

SELECT month_key as partition_key, max(entry_date) as last_value , count(*) as row_count 
FROM FACT_ABSENCE_PERIOD
GROUP BY month_key

Example 2

Incremental Loading scenario for the fact= FACT_ABSENCE_PERIOD (note that the column names are hypothetical)

  1. Partition column = month_key

  2. Detect Changes column = amount

  3. Detect Changes function = sum

  4. Row count excluded

SELECT month_key as partition_key, sum(amount) as last_value
FROM FACT_ABSENCE_PERIOD
GROUP BY month_key

Power Bi M-expression examples for Partitioning

Loading partition data based on the Day

let
    Source = AzureStorage.DataLake(IfsDataLakeContainerPath & "/Shared/content/FACT_AM_PARQ_LOG_TAB"),

    FirstFileName = "FACT_AM_PARQ_LOG_TAB_" & DateTime.ToText(RangeStart, "yyyy-MM-dd") & ".parquet",
    LastFileName = "FACT_AM_PARQ_LOG_TAB_" & DateTime.ToText(RangeEnd, "yyyy-MM-dd") & ".parquet",
    IncRefreshFilter = Table.SelectRows(Source, each [Name] >= FirstFileName and [Name] <= LastFileName),

    RemoveIncRefresh = Table.SelectColumns(IncRefreshFilter, {"Content", "Name"}),

    FilteredHiddenFiles1 = Table.SelectRows(RemoveIncRefresh, each [Attributes]?[Hidden]? <> true),
    InvokeCustomFunction1 = Table.AddColumn(FilteredHiddenFiles1, "Transform File", each #"Transform File"([Content])),
    RenamedColumns1 = Table.RenameColumns(InvokeCustomFunction1, {"Name", "Source.Name"}),
    RemovedOtherColumns1 = Table.SelectColumns(RenamedColumns1, {"Source.Name", "Transform File"}),    
    ExpandedTableColumn1 = Table.Combine(RemovedOtherColumns1[#"Transform File"])
in
    ExpandedTableColumn1

Sample .pbit file with M-expression for Partitioning (Partitioned Days) can be downloaded from here for further reference.

Loading Quarterly based partition data

let    
    Source = AzureStorage.DataLake(IfsDataLakeContainerPath & "/Finance/content/AM_PARQ_FACT_QUARTERS_TEST"),

    FirstFileName = "AM_PARQ_FACT_QUARTERS_TEST_" & DateTime.ToText(RangeStart, "yyyy") & "Q" & Number.ToText(Date.QuarterOfYear(RangeStart)) & ".parquet",
    LastFileName = "AM_PARQ_FACT_QUARTERS_TEST_" & DateTime.ToText(RangeEnd, "yyyy") & "Q" & Number.ToText(Date.QuarterOfYear(RangeEnd)) & ".parquet",

    IncRefreshFilter = Table.SelectRows(Source, each [Name] >= FirstFileName and [Name] <= LastFileName),
    RemoveIncRefresh = Table.SelectColumns(IncRefreshFilter, {"Content", "Name"}),

    FilteredHiddenFiles1 = Table.SelectRows(RemoveIncRefresh, each [Attributes]?[Hidden]? <> true),
    InvokeCustomFunction1 = Table.AddColumn(FilteredHiddenFiles1, "Transform File", each #"Transform File"([Content])),
    RenamedColumns1 = Table.RenameColumns(InvokeCustomFunction1, {"Name", "Source.Name"}),
    RemovedOtherColumns1 = Table.SelectColumns(RenamedColumns1, {"Source.Name", "Transform File"}),   
    ExpandedTableColumn1 = Table.Combine(RemovedOtherColumns1[#"Transform File"])
in
    ExpandedTableColumn1

Sample .pbit file with M-expression for Partitioning (Partitioned Quarters) can be downloaded from here for further reference.