Handling Incremental Load

Use this page to learn more about the Data Mart source type Incremental, i.e. where a snapshot table is incrementally updated based on changes in referenced tables. Incremental load means that only new and modified records are transferred.

Incremental loading makes it possible to schedule a refresh task that runs frequently, keeping the target tables up-to-date on a regular basis, if not instantly.

In the target table most of the attributes are calculated and ready to use, meaning that no costly calculations have to be performed once the incremental refresh has completed. In this respect this Data Mart source is equivalent to Materialized Views, i.e. the target table contains calculated/evaluated values. The difference is that a Materialized View only supports full/complete refresh, while the incremental target table is updated with small portions of data.
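The difference between the two refresh strategies can be sketched as follows. This is illustrative pseudologic only, not the actual framework code: a "table" is modeled as a dict keyed by record identity, and a `changed` field stands in for the change tracking the framework performs via Materialized View logs.

```python
# Illustrative sketch of full vs. incremental refresh (not the actual
# Data Mart framework implementation).

def full_refresh(target, source):
    """Replace the whole target with the current source snapshot."""
    target.clear()
    target.update(source)

def incremental_refresh(target, source, last_refresh):
    """Transfer only rows created or modified since the last refresh."""
    for key, row in source.items():
        if row["changed"] > last_refresh:   # new or modified row
            target[key] = row

source = {
    1: {"amount": 100, "changed": 5},
    2: {"amount": 200, "changed": 12},   # modified after last refresh
    3: {"amount": 300, "changed": 15},   # new row
}
target = {1: {"amount": 100, "changed": 5},
          2: {"amount": 150, "changed": 4}}

incremental_refresh(target, source, last_refresh=10)
print(sorted(target))       # [1, 2, 3]
print(target[2]["amount"])  # 200
```

Only rows 2 and 3 are transferred; row 1 is untouched, which is what keeps frequent scheduled refreshes cheap.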


Note: The Data Mart specific features and functionality handle all types of Data Mart sources. The incremental framework, however, is purpose-built to handle incremental sources only.

Note: The incremental framework is only as accurate as the described dependencies to the referenced tables allow. There can still be cases that cannot be handled!

 


Entities Supporting Incremental Load

The page Entity Info - Incremental Load can be used to view entities, facts and/or dimensions, that support incremental load.

This overview page contains the following:

Column Description
Entity ID Identity of dimension or fact supporting incremental load
Fact Entity Defines if entity is a fact
Active Defines if the incremental entity is marked as active
Refresh In Progress Defines if refresh is ongoing
Supports Delete Defines if the entity supports delete, i.e. if it is possible to also remove transactions from the incremental table that are no longer available in the referenced source tables.
Last Refresh Date Timestamp for last refresh
Refresh Mode Can have one of the following values:
  • Full

    Each refresh will be a full/complete refresh. If a criteria identity has been defined for the entity, the refresh is full but considers the filters defined by the criteria conditions.

  • Incremental

    Each load will be performed incrementally, i.e. only updating the Incremental Table with new or updated records.

Refresh Status Current status, one of:
  • UNUSABLE

    The entity has not been activated, i.e. the first snapshot has not been created

  • STALE

    The entity is not up-to-date, i.e. some of the referenced sources have changed

  • FRESH

    The entity is up-to-date

Navigation can be made to the detail page Entity Info - Incremental Load Detail.

The detail page has the following information:

Column Description
Entity ID Identity of dimension or fact supporting incremental load
Fact Entity Defines if entity is a fact
Active Defines if the incremental entity is marked as active
Refresh In Progress Defines if refresh is ongoing
Supports Delete Defines if the entity supports delete, i.e. if it is possible to also remove transactions from the incremental table that are no longer available in the referenced source tables.
Refresh Status Current status, one of:
  • UNUSABLE

    The entity has not been activated, i.e. the first snapshot has not been created

  • STALE

    The entity is not up-to-date, i.e. some of the referenced sources have changed

  • FRESH

    The entity is up-to-date

Refresh Mode Can have one of the following values:
  • Full

    Each refresh will be a full/complete refresh. If a criteria identity has been defined for the entity, the refresh is full but considers the filters defined by the criteria conditions.

  • Incremental

    Each load will be performed incrementally, i.e. only updating the Incremental Table with new or updated records.

Last Refresh Date Timestamp for last refresh
Storage Information
Incremental Table Name Name of the target snapshot table that is incrementally updated
Base View Name Name of base view that defines how to fetch attributes for a given entity
Source View Name Name of the source view that is used when fetching data from the base view to the incremental table.
Created Date Column Name Name of the column that stores the created date (timestamp) for each transaction in the incremental table
Identity Column Name Name of the column in the incremental table that represents a unique key
Refresh by Criteria
Criteria ID Identity of a criteria definition that can be used to filter out transactions to be transferred to the incremental table
Incremental Fetch Statement
  The statement used by the framework when doing the fetch of new/modified records to the incremental table
Notes
  Contains specific information about the entity (if any)

 

Available Operations

The entity pages support the following commands:

 

 

Criteria Definition for Incremental Entities

The page Entity Criteria - Incremental Load Detail supports definition of filter conditions, associated with a specific criteria identity, that can be used to filter the transactions transferred to the incremental table.

A criteria can be used in two ways:

  1. Use the field Criteria ID for a specific entity in the Entity Info - Incremental Load Detail page.

    This means that the source view will be modified to also consider the conditions defined for the specified criteria.

  2. Performing a refresh of the entity based on a criteria.

    This possibility is available as a command Refresh Using Criteria in either the Entity Info - Incremental Load page or the Entity Info - Incremental Load Detail page. The main purpose is to perform a sub-set refresh, where only a smaller part of all transactions are transferred to the incremental table.

Note: It is not possible to modify a criteria definition unless its state has been set to In Progress.

Column Description
Entity ID Identity of dimension or fact supporting incremental load
Fact Entity Defines if entity is a fact
Criteria ID Identity of the criteria
State Current state
  • In Progress

    The criteria is only editable in this state

  • Ready

    The criteria definition is done, no editing allowed

--- Details ---  
Condition No Sequence number of defined conditions
Column Name Is an Expression Selected if the Column Name contains an expression rather than a table column name.
One example is
EXTRACT(YEAR FROM DATE_ENTERED)
which is an expression based on the table column DATE_ENTERED
Column Name Name of table column that the condition applies to
Data Type Data type, one of Number, String, Date
Operator Operator used for the condition
Condition Value Value to be used for the condition
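The columns above describe how one condition row is composed. How a set of condition rows could be combined into a filter is sketched below; `build_where` is a hypothetical helper for illustration, and the framework's actual SQL generation is not shown in this documentation.

```python
# Sketch: combining criteria condition rows into a filter clause.
# Hypothetical helper; not the framework's actual SQL generation.

def build_where(conditions):
    parts = []
    for cond in conditions:
        value = cond["value"]
        if cond["data_type"] == "String":
            value = f"'{value}'"          # quote string literals
        parts.append(f"{cond['column']} {cond['operator']} {value}")
    return " AND ".join(parts)

conditions = [
    {"column": "ACCOUNTING_YEAR_KEY", "data_type": "Number",
     "operator": ">=", "value": 2016},
    {"column": "EXTRACT(YEAR FROM DATE_ENTERED)", "data_type": "Number",
     "operator": "=", "value": 2021},     # expression, not a plain column
]
print(build_where(conditions))
# ACCOUNTING_YEAR_KEY >= 2016 AND EXTRACT(YEAR FROM DATE_ENTERED) = 2021
```

Note how the second condition uses an expression in place of a column name, matching the Column Name Is an Expression flag described above.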

 

Log of Incremental Refresh

The page Log - Incremental Load logs all steps performed during incremental load.

Column Description
Log ID Identity. All steps during incremental load of one entity get the same log identity
Line Number Log step number
Entity ID Identity of entity that the log applies to
Label Log label
Application Message Log message for current label
Number of Rows Number of rows affected by the log step
Time (seconds) Time used in seconds, typically for INSERT, DELETE and CREATE operations
User Sign Identity of user performing the incremental load
Last Updated Timestamp when the log step occurred

The log can be used to investigate what happens during incremental load.

Note: By default the log is disabled. To enable it, either open the Business Reporter & Analysis - System Parameters page and set the value for parameter Log incremental load actions to TRUE or use the global command Enable under Incremental Load Logging.

The following commands are available:

  1. Remove All Log Entries

    This option clears the log completely. Available on global level.

  2. Remove All Log Entries for Selected Log ID

    This option can be used to remove all log lines related to one specific log identity.

There is a scheduled task that cleans up the log once a week.

Refresh Info for Incremental Entities

The page Refresh Info - Incremental Load lists all Materialized Views used to track changes in referenced source tables.

Column Description
Name of MV Tracking Source Changes Identity of dimension or fact supporting incremental load
Last Max Change Date Timestamp for last refresh
Refresh Status Status of the Materialized View
  • UNUSABLE

    The MV has not been activated. As soon as a refresh of the entity is performed, the MV will be automatically activated.

  • STALE

    The source table has changed

  • FRESH

    The source table has not changed since the last refresh

Available Commands

The page supports the following commands:

Deleted Rows Archive

During the incremental processing, the last step is to remove rows in the incremental table (MVT) that no longer exist in the core tables. This is only done for entities that support delete, i.e. Fact/Dimension entities that refer to core entities where rows might be removed as part of ordinary functionality.

When the incremental framework deletes rows the following happens:

  1. Identities of rows in the incremental table (MVT) that no longer exist in the MVB view, will be stored in an archive table.
  2. Identities in the archive table will be removed if they exist in the incremental table (MVT), i.e. because previously removed rows have been inserted again.
  3. Stored identities in the archive table are used to remove rows in the incremental table (MVT).
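The three steps above can be sketched with set operations. This is an illustrative model only (tables as dicts and sets, not SQL), and the function name is hypothetical; in step 2, an archived identity is considered re-inserted when it appears in both the MVT and the MVB view again.

```python
# Illustrative sketch of the delete-archive steps (not the framework's code).
# mvt: incremental table keyed by identity; mvb_ids: identities currently
# visible in the MVB view; archive: deleted rows archive.

def archive_and_delete(mvt, mvb_ids, archive):
    # 1. Archive identities of MVT rows that no longer exist in the MVB view.
    archive |= set(mvt) - mvb_ids
    # 2. Remove previously archived identities that exist in MVT again,
    #    i.e. rows that were deleted earlier but have been re-inserted.
    archive -= set(mvt) & mvb_ids
    # 3. Use the archived identities to remove the rows from MVT.
    for key in archive & set(mvt):
        del mvt[key]

mvt = {1: "row1", 3: "row3", 4: "row4"}  # 4 was re-inserted after a delete
mvb_ids = {1, 4}                          # 3 has been removed from core
archive = {4}                             # 4 was archived in a previous run

archive_and_delete(mvt, mvb_ids, archive)
print(sorted(mvt))       # [1, 4]  (row 3 deleted)
print(sorted(archive))   # [3]     (4 dropped, 3 kept for the DW)
```

Identity 3 stays in the archive so that the information about the deleted core row can later be transferred to the Analysis Models DW.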

The deleted rows archive table mainly exists to support the Analysis Models, i.e. it provides a way to transfer information about deleted core rows to SQL Server, ensuring that the DW can be kept in sync with core.

If core transactions are deleted but never replaced, the removed identities will be kept in the archive table. The only way to clean up the stored information is to use the Clear Incremental Load Deleted Rows Archive assistant.

Use the assistant only once it has been verified that the archived identities have been considered in the DW related to the Analysis Models.

The assistant provides two delete options:

  1. Delete archived rows individually per entity.
  2. Delete all archived rows regardless of entity.

To use the first option, enable editing in the Deletion Options group for each entity for which archived rows should be removed:

  1. Indicate that rows should be removed by using the Delete Rows column drop-down.
  2. Select if all archived rows should be removed by using Delete All Rows or specify that archived rows with a rowversion prior to the Delete Rows Prior To date should be removed.
  3. Save changes.
  4. Click on Delete Rows.

 

To use the second option, toggle Delete All and then click on Delete Rows.

 

Configuration Guidelines

Refresh Scenarios

Getting started with incremental loading requires a first refresh whose purpose is to create the first snapshot. If Refresh Mode is Full, all data will be transferred on each refresh. An attached criteria definition will be considered; the purpose of using one is to limit the amount of data transferred. This mode is non-incremental and should only be used if there are flaws in the incremental loading.

The standard Refresh Mode is Incremental. When the first snapshot is created using the Activate command, all data is transferred to the target table. If a criteria definition is attached to the entity, it is considered during the initial load as well as during all subsequent refresh events. As soon as the first snapshot has been successfully created, subsequent loads are handled incrementally.

If the entity represents a large current data volume, the initial load can take a long time. For sources with large volumes it is recommended to do the initial load in steps, a sub-set load. The following can serve as a guideline:

  1. Start by investigating the source volumes. Try to find a way to divide the load into chunks. Assume that there is data from 2010 to 2021. What data is really needed for analysis? Assume that we find that only data from 2016 to 2021 is of interest. If the data volume per year seems reasonable, a year-based load is a good option.
  2. Now create the necessary criteria definitions for the entity. This is done in the Entity Criteria - Incremental Load page. Assume that we create only one criteria definition. For, e.g., FACT_GL_TRANSACTION the column to use is ACCOUNTING_YEAR_KEY, the operator is "=" and the first conditional value is 2016. Make sure that the criteria is in state Ready.
  3. Go to the Entity Info - Incremental Load page.
    1. Select the entity.
    2. Use the command Refresh Using Criteria.
    3. Choose the criteria in the drop-down and click OK to start the first transfer.
    4. When the refresh is ready, data for 2016 has been transferred. The Staleness will be Stale.
    5. Now continue with the other years: first modify the criteria definition and then run the refresh.
    6. When all years have been transferred, the initial data load is almost completed.
  4. Next we need to make sure that the Staleness of the entity is correct.
    1. For this purpose, in page Entity Criteria - Incremental Load create a new criteria definition that has a condition that will guarantee that 0 rows are transferred, e.g. year = 2040.
    2. In the Entity Info - Incremental Load Detail page, attach the criteria to the entity in the Refresh by Criteria group.
    3. Run the refresh from the detail page using the Refresh command.
    4. When done the Staleness normally goes to Fresh. It might also end up in Stale state if someone has added new transactions.
  5. Final actions
    1. Remove all criteria definitions that are not needed anymore.
    2. Remove the criteria attached to the entity in the Refresh by Criteria group.
    3. Create a criteria with a condition like year >= 2016 and attach it to the entity in the Refresh by Criteria group. This criteria makes sure that no transactions from years previous to 2016 are transferred.
    4. Next refresh will be an incremental refresh. It can be executed manually or as part of a scheduled refresh.
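The year-chunked initial load described above can be sketched as a simple driver loop. The helper name and data shape are hypothetical, purely for illustration; in practice each iteration corresponds to one manual Refresh Using Criteria command with the criteria's conditional value updated to the next year.

```python
# Sketch of the chunked initial load (hypothetical helper names; the real
# transfer is run through the Refresh Using Criteria command per year).

SOURCE = {"FACT_GL_TRANSACTION": [
    {"year": y, "amount": 10 * y} for y in range(2010, 2022)
]}

def refresh_using_criteria(entity, year, target):
    """Simulate transferring one year's transactions to the target table."""
    target.extend(row for row in SOURCE[entity] if row["year"] == year)

target = []
for year in range(2016, 2022):           # one sub-set refresh per year
    refresh_using_criteria("FACT_GL_TRANSACTION", year, target)

print(len(target))                       # 6 rows, one per year 2016-2021
print(min(r["year"] for r in target))    # 2016
```

Rows from 2010-2015 are never transferred, which is why the final step attaches a criteria like year >= 2016 to keep subsequent incremental refreshes consistent with the initial load.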