Handling Incremental Load

Use this page to learn more about the Data Mart source type Incremental, i.e. where a snapshot table is incrementally updated depending on changes in referenced tables. The incremental load means that only new and modified records will be transferred.

Incremental loading makes it possible to schedule a refresh task that runs rather frequently, thus keeping the target tables up-to-date, not instantly but on a regular basis.

In the target table most of the attributes are calculated and ready to use, meaning that no costly calculations have to be performed once the incremental refresh is ready. This Data Mart source is equal to Materialized Views in this respect, i.e. the target table contains calculated/evaluated values. The difference is that a Materialized View only supports full/complete refresh while the incremental target table is updated with small portions of data.

More information about incremental load can be found here:

Note: The Data Mart specific features and functionality handles all types of Data Mart sources. The incremental framework is however purpose built to handle incremental sources only.

Note: The incremental framework is as accurate as it is possible to describe dependences to referenced tables. Still there can be cases that can not be handled!

 

Contents

 

Entities Supporting Incremental Load

The form Entity Info - Incremental Load in IEE client can be use to view entities, facts and/or dimensions, that support incremental load.

This overview form contains the following:

Column Description
Entity ID Identity of dimension or fact supporting incremental load
Fact Entity Selected if entity is a fact
Active Selected if the incremental entity is marked as active
Last Refresh Date Timestamp for last refresh
Refresh Status Status
  • UNUSABLE

    The entity has not been activated, i.e. the first snapshot has not been created

  • STALE

    The entity is not up-to-date, i.e. some of the referenced sources have changed

  • FRESH

    The entity is up-to-date

Refresh In Progress Selected if refresh is ongoing
Refresh Mode Can heave on of the following values:
  • Full

    Each refresh will be a full/complete refresh. If a criteria identity has been defined for the entity, it means that the refresh if full but considering filters as defined by the criteria conditions.

  • Incremental

    Each load will be performed incrementally, i.e. only updating the Incremental Table with new or updated records.

Incremental Table Name Name of the target snapshot table that is incrementally updated
Base View Name Name of base view that defines how to fetch attributes for a given entity
Source View Name Name of the source view that as used when fetching data from the base view to the incremental table.
Created Date Column Name Name of the column that stores the created date (timestamp) for each transaction in the incremental table
Identity Column Name Name of the column in the incremental table that represents a unique key
Criteria ID Identity if a criteria definition that will affect the source view
Supports Delete Selected if the entity supports delete, i.e. if it is possible to also remove transactions from the incremental table that are no longer available in the referenced source tables.
Incremental Fetch Statement The statement used by the framework when doing the fetch of new/modified records to the incremental table
Notes Can be used to add some specific information about the entity

 

Available Operations

The form support the following RMB options

 

 

Criteria Definition for Incremental Entities

The form Entity Criteria - Incremental Load supports definition of filter conditions, belonging to a specific criteria identity, that can be used in order to control the transactions moved via the base view to the incremental table.

A criteria can be used in two ways:

  1. Defined in the column Criteria ID for a specific entity in the Entity Info - Incremental Load form.

    This means that the source view will be modified to also consider the conditions defined for the specified criteria.

  2. Performing a refresh of the entity based on a criteria.

    This possibility is available as a RMB option in the Entity Info - Incremental Load form. The main purpose is to be able to do a so called sub-set refresh, where only a smaller part of all transactions are transferred to the incremental table.

Note: It is not possible to modify a criteria definition unless the states has been modified to In Progress

Column Description
Entity ID Identity of dimension or fact supporting incremental load
Fact Entity Selected if entity is a fact
Criteria ID Identity of the criteria
State Current state
  • In progress

    The criteria is only editable for this state

  • Ready

    The criteria definition is done, no editing allowed

--- Details ---  
Condition No Sequence number of defined conditions
Column Name Is an Expression Selected if the Column Name contains an expression and not a table column name.
One example is
EXTRACT(YEAR FROM DATE_ENTERED)
This is an expression based on the table column DATE_ENTERED
Column Name Name of table column that the condition applies to
Data Type Data type, one of Number, String, Date
Operator Operator used for the condition
Condition Value Value to be used for the condition

 

Log of Incremental Refresh

The form Log - Incremental Load logs all steps performed during incremental load.

Column Description
Log ID Identity. All steps during incremental of one entity gets the same log identity
Line Number Log step number
Entity ID Identity of entity that the log applies to
Label Log label
Application Message Information
Number of Rows Log message for current label
Time (seconds) Time used in seconds, typically for INSERT, DELETE and CREATE operations
User Sign Identity of user performing the incremental load
Last Updated Timestamp when the log step occurred

The log can be used to investigate what happens during incremental load.

Note: By default the log is disabled. To enable it, open the Business Reporter & Analysis - System Parameters window and set the value for parameter Log incremental load actions to TRUE

There are two RMB options

  1. Remove All Log Entries

    This option clears the log completely

  2. Remove All Log Entries for Selected Log ID

    This option can be used to remove all log lines related to one specific log identity

There is a scheduled task that cleans up the log once a week.

Refresh Info for Incremental Entities

The form Refresh Info - Incremental Load lists all Materialized Views used to track changes in referenced source tables.

Column Description
Name of MV Tracking Source Changes Identity if dimension or fact supporting incremental load
Last Max Change Date Timestamp for last refresh
Refresh Status Status of the Materialized View
  • UNUSABLE

    The MV has not been activated. As soon a refresh of the entity is performed the MV will be automatically activated.

  • STALE

    The source table has changed

  • FRESH

    The source table has not changed since last refresh

Available Operations

The form support the following RMB options:

 

Configuration Guidelines

Refresh Scenarios

To get started with incremental loading requires that a first refresh is done, with the purpose to create a first snapshot. If Refresh Mode is Full then all data will be transferred for each refresh. An attached criteria definition will be considered and the purpose to use one would be to limit the amount of data transferred. This option is a non-incremental option and should only be used if there are some flaws with the incremental loading.

The standard Refresh Mode is Incremental. When the first snapshot is created, using RMB Activate, then all data will be transferred to the target table. If a criteria definition is attached to the entity then it will be considered during the initial load but also for all subsequent refresh events. As soon as the first snapshot has been successfully created, then subsequent loads will be handled incrementally.

If the entity represents a large current data volume, the initial load can take a long time. For sources with large volumes it is recommended to do the initial load in steps, a sub-set load. The following can serve as a guideline:

  1. Start by investing the source volumes. Try to find a way to divide the load in chunks. Assume that there is data from 2010 to 2021. What data is really needed when it comes to analysis? Assume that we find out that only data from 2016 to 2021 is of interest. If the data volume per year seems reasonable, a year based load is a good option.
  2. Now create necessary criteria definitions for the entity. This is done in Entity Criteria - Incremental Load. Assume that we create only one criteria definition. For e.g. FACT_GL_TRANSACTION the column to use is ACCOUNTING_YEAR_KEY, the operator is "=" and the first conditional value is 2016. Make sure that the criteria is in state Ready .
  3. Go to Entity Criteria - Incremental Load.
    1. Select the entity.
    2. Use the RMB Refresh Using Criteria .
    3. Choose the criteria in the drop-down and click OK to start the first transfer.
    4. When the refresh is ready, data for 2016 has been transferred. The Staleness will be Stale.
    5. Now continue with the other years. First modify the criteria definition and the run the refresh.
    6. When all years have been transferred, the initial data load is almost completed.
  4. Next we need to make sure that the Staleness of the entity is correct.
    1. For this purpose, in Entity Criteria - Incremental Load create a new criteria definition that has a condition that will guarantee that 0 rows are transferred, e.g. year = 2040.
    2. In Entity Info - Incremental Load, attach the criteria to the entity in the Criteria ID column.
    3. Perform refresh using RMB Refresh.
    4. When done the Staleness normally goes to Fresh. It might also end up in Stale state if someone has added new transactions.
  5. Final actions
    1. Remove all criteria definitions that are not needed anymore.
    2. Remove the criteria attached to the entity in the Criteria ID column.
    3. Create a criteria with a condition like year >= 2016 and attach it to the entity in the Criteria ID column. This criteria makes sure that no transactions from years previous to 2016 are transferred.
    4. Next refresh will be an incremental refresh. It can be executed manually or as apart of a scheduled refresh.