Handling Incremental Load¶
Use this page to learn more about the Data Mart source type Incremental, i.e. where a snapshot table is incrementally updated based on changes in referenced tables. Incremental load means that only new and modified records are transferred.
Incremental loading makes it possible to schedule a refresh task that runs rather frequently, thus keeping the target tables up-to-date, not instantly but on a regular basis.
In the target table most of the attributes are pre-calculated and ready to use, meaning that no costly calculations have to be performed once the incremental refresh is complete. In this respect this Data Mart source is equivalent to Materialized Views, i.e. the target table contains calculated/evaluated values. The difference is that a Materialized View only supports full/complete refresh, while the incremental target table is updated with small portions of data.
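As a hedged illustration, an incremental refresh conceptually merges only rows whose tracked change timestamp is newer than the previous refresh, instead of recreating the whole table. A minimal sketch, assuming hypothetical names MVT_FACT_SALES (incremental table), MVB_FACT_SALES (base view), ID (identity column) and CREATED_DATE (created date column):

```sql
-- Minimal sketch of an incremental fetch (all object names hypothetical).
-- Only rows created/modified after the last refresh are merged into the
-- incremental target table; a full refresh would transfer everything.
MERGE INTO mvt_fact_sales t
USING (SELECT id, amount, created_date
       FROM   mvb_fact_sales
       WHERE  created_date > :last_refresh_date) s
ON (t.id = s.id)
WHEN MATCHED THEN
  UPDATE SET t.amount = s.amount, t.created_date = s.created_date
WHEN NOT MATCHED THEN
  INSERT (id, amount, created_date)
  VALUES (s.id, s.amount, s.created_date);
```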
Note: The Data Mart specific features and functionality handle all types of Data Mart sources. The incremental framework is, however, purpose-built to handle incremental sources only.
Note: The incremental framework is only as accurate as the description of the dependencies to referenced tables. There can still be cases that cannot be handled!
Entities Supporting Incremental Load¶
The page Entity Info - Incremental Load can be used to view entities, facts and/or dimensions, that support incremental load.
This overview page contains the following:
Column | Description |
---|---|
Entity ID | Identity of dimension or fact supporting incremental load |
Fact Entity | Defines if entity is a fact |
Active | Defines if the incremental entity is marked as active |
Refresh In Progress | Defines if refresh is ongoing |
Supports Delete | Defines if the entity supports delete, i.e. if it is possible to also remove transactions from the incremental table that are no longer available in the referenced source tables. |
Last Refresh Date | Timestamp for last refresh |
Refresh Mode | Can have one of the following values: Full or Incremental (see Refresh Scenarios below) |
Refresh Status | Status of the refresh, e.g. UNUSABLE until the entity has been activated |
From the overview it is possible to navigate to the detail page Entity Info - Incremental Load Detail.
The detail page has the following information:
Column | Description |
---|---|
Entity ID | Identity of dimension or fact supporting incremental load |
Fact Entity | Defines if entity is a fact |
Active | Defines if the incremental entity is marked as active |
Refresh In Progress | Defines if refresh is ongoing |
Supports Delete | Defines if the entity supports delete, i.e. if it is possible to also remove transactions from the incremental table that are no longer available in the referenced source tables. |
Refresh Status | Status of the refresh, e.g. UNUSABLE until the entity has been activated |
Refresh Mode | Can have one of the following values: Full or Incremental (see Refresh Scenarios below) |
Last Refresh Date | Timestamp for last refresh |
Storage Information | |
Incremental Table Name | Name of the target snapshot table that is incrementally updated |
Base View Name | Name of base view that defines how to fetch attributes for a given entity |
Source View Name | Name of the source view that is used when fetching data from the base view to the incremental table. |
Created Date Column Name | Name of the column that stores the created date (timestamp) for each transaction in the incremental table |
Identity Column Name | Name of the column in the incremental table that represents a unique key |
Refresh by Criteria | |
Criteria ID | Identity of a criteria definition that can be used to filter out transactions to be transferred to the incremental table |
Incremental Fetch Statement | |
The statement used by the framework when fetching new/modified records to the incremental table | |
Notes | |
Contains specific information about the entity (if any) |
Available Operations¶
The entity pages support the following commands:
- Activate Performs the first refresh, creating the first snapshot, and marks the entity as active. Only available if Refresh Status is UNUSABLE
- Refresh Performs a refresh of the entity according to the Refresh Mode.
- Refresh Using Criteria Supports refresh based on a criteria, i.e. a set of conditions is applied on the Base View to refresh the incremental table. A typical use case is when it is believed that some transactions in the incremental table are not 100% accurate. Then it is possible to define a criteria that e.g. leads to a fetch of all transactions created this year. After this the ordinary incremental load can run.
- Remove Non-Existing Source Rows If the entity supports delete of records then the option Remove Non-Existing Source Rows will be available. In most cases the removal of non-existing source rows from the incremental load table is handled automatically by the incremental framework. This option can be used if there is a need to remove non-existing source rows before a scheduled refresh is executed.
- Reset Refresh Status If the refresh fails, e.g. because the session is killed for some reason, an entity might still be considered as being in progress. This is indicated by Refresh In Progress. If it is clear that the refresh is not running, the Reset Refresh Status command can be used to reset the refresh status and make it possible to run a new refresh.
- Refresh Info Opens the Refresh Info - Incremental Load page that lists all special tracking Materialized Views (MVs) used to track changes in referenced source tables.
- Explore Data Mart Entities Navigates to the Explore By Information Source page and shows all Data Mart sources referenced by the current entity
- Recreate Source View Recreates the source view if it is suspected that the current definition is incorrect. Do not use this command unnecessarily.
Criteria Definition for Incremental Entities¶
The page Entity Criteria - Incremental Load supports definition of filter conditions, associated to a specific criteria identity, that can be used to filter transactions to be transferred to the incremental table.
A criteria can be used in two ways:
- Use the field Criteria ID for a specific entity in the Entity Info - Incremental Load Detail page. This means that the source view will be modified to also consider the conditions defined for the specified criteria.
- Performing a refresh of the entity based on a criteria. This possibility is available as a command Refresh Using Criteria in either the Entity Info - Incremental Load page or the Entity Info - Incremental Load Detail page. The main purpose is to perform a sub-set refresh, where only a smaller part of all transactions are transferred to the incremental table.
- Start by defining a new criteria, specifying the entity and the identity of the criteria to be created.
- After saving, define the conditions, one per row in the detail table
- The State will be In Progress
- When the definitions are ready, use the command Ready for Deployment to change the state to Ready.
- If the criteria was already attached to the entity, changing the status means that the source view of the entity will be modified.
Note: It is not possible to modify a criteria definition unless the state has been set to In Progress
Column | Description |
---|---|
Entity ID | Identity of dimension or fact supporting incremental load |
Fact Entity | Defines if entity is a fact |
Criteria ID | Identity of the criteria |
State | Current state, one of In Progress or Ready |
--- Details --- | |
Condition No | Sequence number of defined conditions |
Column Name Is an Expression | Selected if the Column Name contains an expression and not a table column name. One example is EXTRACT(YEAR FROM DATE_ENTERED), which is an expression based on the table column DATE_ENTERED |
Column Name | Name of table column that the condition applies to |
Data Type | Data type, one of Number, String, Date |
Operator | Operator used for the condition |
Condition Value | Value to be used for the condition |
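Conceptually, each condition row becomes a predicate in the entity's source view (the framework generates the actual statement). A sketch under that assumption, reusing the EXTRACT example above with hypothetical view names:

```sql
-- Sketch: one criteria condition row translated into the source view.
-- Condition: expression EXTRACT(YEAR FROM DATE_ENTERED), Data Type Number,
-- Operator >=, Condition Value 2016 (names and values hypothetical).
CREATE OR REPLACE VIEW mvs_fact_gl_transaction AS
SELECT *
FROM   mvb_fact_gl_transaction
WHERE  EXTRACT(YEAR FROM date_entered) >= 2016;
```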
Log of Incremental Refresh¶
The page Log - Incremental Load logs all steps performed during incremental load.
Column | Description |
---|---|
Log ID | Identity. All steps during incremental load of one entity get the same log identity |
Line Number | Log step number |
Entity ID | Identity of entity that the log applies to |
Label | Log label |
Application Message | Log message for the current label |
Number of Rows | Number of rows affected by the logged operation |
Time (seconds) | Time used in seconds, typically for INSERT, DELETE and CREATE operations |
User Sign | Identity of user performing the incremental load |
Last Updated | Timestamp when the log step occurred |
The log can be used to investigate what happens during incremental load.
Note: By default the log is disabled. To enable it, either open the Business Reporter & Analysis - System Parameters page and set the value for parameter Log incremental load actions to TRUE or use the global command Enable under Incremental Load Logging.
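Once enabled, all steps of one specific run can be inspected with an ordinary query, assuming a hypothetical log view named INC_LOAD_LOG exposing the columns above:

```sql
-- List all steps of one incremental load run (view name hypothetical).
SELECT line_number, entity_id, label, application_message,
       number_of_rows, time_seconds
FROM   inc_load_log
WHERE  log_id = :log_id
ORDER BY line_number;
```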
The following commands are available:
- Remove All Log Entries This option clears the log completely. Available on global level.
- Remove All Log Entries for Selected Log ID This option can be used to remove all log lines related to one specific log identity.
There is a scheduled task that cleans up the log once a week.
Refresh Info for Incremental Entities¶
The page Refresh Info - Incremental Load lists all Materialized Views used to track changes in referenced source tables.
Column | Description |
---|---|
Name of MV Tracking Source Changes | Name of the Materialized View used to track changes in a referenced source table |
Last Max Change Date | Timestamp of the most recent change captured from the source table |
Refresh Status | Status of the Materialized View, e.g. Fresh or Stale |
Available Commands¶
The page supports the following commands:
- Connected Entities Opens the Entity Info - Incremental Load page populated with the entities that are using one specific Tracking MV
Deleted Rows Archive¶
During the incremental processing, the last step is to remove rows in the incremental table (MVT) that no longer exist in the core tables. This is only done for entities that support delete, i.e. Fact/Dimension entities that refer to core entities where rows might get removed as part of ordinary functionality.
When the incremental framework deletes rows, the following happens (see the sketch after the list):
- Identities of rows in the incremental table (MVT) that no longer exist in the MVB view will be stored in an archive table.
- Identities in the archive table will be removed if they exist in the incremental table (MVT), i.e. because previously removed rows have been inserted again.
- Stored identities in the archive table are used to remove rows in the incremental table (MVT).
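A minimal sketch of this sequence, assuming hypothetical names MVT_FACT_SALES (incremental table), MVB_FACT_SALES (base view), DELETED_ROWS_ARCHIVE (archive table) and ID (identity column):

```sql
-- 1. Archive identities of MVT rows that no longer exist in the MVB view.
INSERT INTO deleted_rows_archive (entity_id, id, rowversion)
SELECT 'FACT_SALES', t.id, SYSTIMESTAMP
FROM   mvt_fact_sales t
WHERE  NOT EXISTS (SELECT 1 FROM mvb_fact_sales b WHERE b.id = t.id);

-- 2. Drop archive entries from earlier runs whose rows have been inserted
--    again (the run-start filter is an assumption, added here to keep the
--    identities archived in step 1).
DELETE FROM deleted_rows_archive a
WHERE  a.entity_id = 'FACT_SALES'
AND    a.rowversion < :current_run_start
AND    EXISTS (SELECT 1 FROM mvt_fact_sales t WHERE t.id = a.id);

-- 3. Use the remaining archived identities to remove rows from the MVT.
DELETE FROM mvt_fact_sales t
WHERE  EXISTS (SELECT 1 FROM deleted_rows_archive a
               WHERE  a.entity_id = 'FACT_SALES' AND a.id = t.id);
```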
The deleted rows archive table is mainly there to support the Analysis Models, i.e. it provides a way to transfer information about deleted core rows to SQL Server to ensure that the DW can be kept in sync with core.
If core transactions are deleted but never replaced, the removed identities will be kept in the archive table. The only way to clean up the stored information is to use the Clear Incremental Load Deleted Rows Archive assistant.
Use the assistant only when it is certain that the archived identities have been considered in the Analysis Models related DW.
The assistant provides two delete options:
- Delete archived rows individually per entity.
- Delete all archived rows regardless of entity.
To use the first delete option, enable editing in the Deletion Options group for each entity for which archived rows should be removed:
- Indicate that rows should be removed by using the Delete Rows column drop-down.
- Select if all archived rows should be removed by using Delete All Rows or specify that archived rows with a rowversion prior to the Delete Rows Prior To date should be removed.
- Save changes.
- Click on Delete Rows.
To use the second delete option, toggle Delete All and then click on Delete Rows. Both options are sketched below.
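Assuming a hypothetical archive table DELETED_ROWS_ARCHIVE with ENTITY_ID and ROWVERSION columns, the two options correspond to deletes like:

```sql
-- Option 1: per entity, optionally limited by the Delete Rows Prior To date.
DELETE FROM deleted_rows_archive
WHERE  entity_id = 'FACT_GL_TRANSACTION'
AND    rowversion < :delete_rows_prior_to;

-- Option 2: all archived rows regardless of entity (Delete All).
DELETE FROM deleted_rows_archive;
```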
Configuration Guidelines¶
- Before using an incrementally loaded entity, it must be activated. This is performed via a command in one of the following pages:
  - Entity Info - Incremental Load page
  - Data Mart Sources page
- Figure out how incrementally loaded Data Mart fact entities are to be used:
  - Data Mart access via clients such as IFS Business Reporter (BR)
  - Serving as sources for IFS Analysis Models
- Data Mart access via BR
  - Fact entities supported by incremental load.
  - Dimension entities supported by Materialized Views.
  - Data Mart facts and dimensions together define a star schema.
  - Incremental facts can be set up to refresh rather frequently, e.g. every hour or every second hour. Define the facts in a special Refresh Category and schedule the refresh of the category.
  - Dimensions can be set up to be refreshed once or twice a day.
  - How to set up a Refresh Category is described in the Refresh Category documentation.
- Data Mart access via Analysis Models
  - Fact entities supported by incremental load. It is important to use incremental load in this case to enable faster load and refresh of the Data Warehouse.
  - Dimensions are normally loaded via the On Line access version. This means that refresh of dimensions does not have to be considered at all in this case.
  - Incremental facts can be set up to refresh rather frequently, e.g. every hour or every second hour. Define the facts in a special Refresh Category and schedule the refresh of the category.
  - How to set up a Refresh Category is described in the Refresh Category documentation.
- The Entity Info - Incremental Load form can be used to activate and manually refresh an entity supporting incremental load. The form can also be used to perform a so-called sub-set load, where an entity criteria definition is supplied with the purpose to load a sub-set of all transactions, mainly because the incremental framework might not be able to track all changes of referenced tables. The sub-set load then gives the possibility to update all transactions meeting the defined conditions, e.g. all created the current year, all created the current month etc.
- A criteria definition can be added to an entity in order to make sure that the number of transferred transactions is limited. One example can be in Financials, where it is of little interest to look at Balances more than a few years back. In this case a condition like
ACCOUNTING_YEAR>2011
makes sure that only transactions from year 2012 and onwards are transferred, as illustrated below.
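Since an attached criteria modifies the source view (see Criteria Definition for Incremental Entities above), the effect of the condition can be pictured as follows, with hypothetical view names:

```sql
-- Sketch: the attached criteria ends up as a predicate in the source view,
-- so only accounting years after 2011 reach the incremental table.
CREATE OR REPLACE VIEW mvs_fact_balance AS
SELECT *
FROM   mvb_fact_balance
WHERE  accounting_year > 2011;
```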
Refresh Scenarios¶
Getting started with incremental loading requires a first refresh, with the purpose of creating a first snapshot. If Refresh Mode is Full, all data will be transferred at each refresh. An attached criteria definition will be considered, and the purpose of using one would be to limit the amount of data transferred. The Full mode is a non-incremental option and should only be used if there are flaws in the incremental loading.
The standard Refresh Mode is Incremental. When the first snapshot is created, using the Activate command, all data is transferred to the target table. If a criteria definition is attached to the entity, it will be considered during the initial load as well as during all subsequent refresh events. As soon as the first snapshot has been successfully created, subsequent loads are handled incrementally.
If the entity represents a large data volume, the initial load can take a long time. For sources with large volumes it is recommended to do the initial load in steps, a sub-set load. The following can serve as a guideline:
- Start by investigating the source volumes. Try to find a way to divide the load into chunks. Assume that there is data from 2010 to 2021. What data is really needed when it comes to analysis? Assume that we find out that only data from 2016 to 2021 is of interest. If the data volume per year seems reasonable, a year-based load is a good option.
- Now create the necessary criteria definitions for the entity. This is done in the Entity Criteria - Incremental Load page. Assume that we create only one criteria definition. For e.g. FACT_GL_TRANSACTION the column to use is ACCOUNTING_YEAR_KEY, the operator is "=" and the first conditional value is 2016. Make sure that the criteria is in state Ready.
- Go to the Entity Info - Incremental Load Detail page.
- Select the entity.
- Use the command Refresh Using Criteria.
- Choose the criteria in the drop-down and click OK to start the first transfer.
- When the refresh is ready, data for 2016 has been transferred. The Staleness will be Stale.
- Now continue with the other years. First modify the criteria definition and then run the refresh.
- When all years have been transferred, the initial data load is almost completed.
- Next we need to make sure that the Staleness of the entity is correct.
- For this purpose, in page Entity Criteria - Incremental Load create a new criteria definition that has a condition that will guarantee that 0 rows are transferred, e.g. year = 2040.
- In the Entity Info - Incremental Load Detail page, attach the criteria to the entity in the Refresh by Criteria group.
- Run the refresh from the detail page using the Refresh command.
- When done the Staleness normally goes to Fresh. It might also end up in Stale state if someone has added new transactions.
- Final actions
- Remove all criteria definitions that are not needed anymore.
- Remove the criteria attached to the entity in the Refresh by Criteria group.
- Create a criteria with a condition like year >= 2016 and attach it to the entity in the Refresh by Criteria group. This criteria makes sure that no transactions from years previous to 2016 are transferred.
- The next refresh will be an incremental refresh. It can be executed manually or as a part of a scheduled refresh. The complete sequence is summarized below.
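The criteria sequence of the walkthrough can be summarized as follows (FACT_GL_TRANSACTION example, all values assumed):

```sql
-- Chunked initial load, one Refresh Using Criteria run per year:
--   ACCOUNTING_YEAR_KEY = 2016, 2017, ... 2021
-- Staleness fix, transfers 0 rows:
--   ACCOUNTING_YEAR_KEY = 2040
-- Finally, the criteria attached in Refresh by Criteria limits all future
-- refreshes; sketch of the resulting source view predicate:
SELECT *
FROM   mvb_fact_gl_transaction
WHERE  accounting_year_key >= 2016;
```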