Load Data Source¶
Loading a Data Source based on the defined Data Lake Destination¶
A Parquet Data Source can be loaded into the relevant destination Data Lake (Self-Hosted Data Lake, IFS.ai Platform Data Lake). Loading copies the data from the Oracle database to ADLS Gen 2 by creating .parquet files in the defined folders. Loading can be triggered via an Analysis Model, a Workload Job Definition, or an explicit trigger to load the Parquet Data Source.
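To make the copy step concrete, here is a minimal sketch of reading a table from Oracle and writing it as a .parquet file, assuming the python-oracledb and pyarrow libraries. The connection details, table name, and output path are illustrative, and the real data pump uploads the file to ADLS Gen 2 rather than writing it locally.

```python
# Minimal sketch: copy one Oracle table into a local .parquet file.
# Illustrative only; the actual data pump writes to ADLS Gen 2 and
# handles partitioning, incremental loads, and scheduling.
import oracledb
import pyarrow as pa
import pyarrow.parquet as pq

def load_data_source(user: str, password: str, dsn: str,
                     table: str, output_path: str) -> None:
    with oracledb.connect(user=user, password=password, dsn=dsn) as conn:
        with conn.cursor() as cursor:
            cursor.execute(f"SELECT * FROM {table}")  # table name assumed trusted
            columns = [d[0] for d in cursor.description]
            rows = cursor.fetchall()
    # Pivot row tuples into columns and write them out as a Parquet file.
    column_data = {name: list(values) for name, values in zip(columns, zip(*rows))}
    pq.write_table(pa.table(column_data), output_path)
```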
The Parquet Data Source destination Data Lake can be defined during the creation of a New Data Source. Otherwise, it is determined based on the context, as described in the table below. The resolved value is referred to as the 'Effective Destination'; a sketch of the resolution logic follows the table.
How it works:
Context | How the Effective Destination is determined |
---|---|
Workload Job Definition | If the Workload Job Definition has a destination, the Effective Destination is the destination of the Workload Job Definition. If the Workload Job Definition has no destination and it is a system-defined Workload Job Definition, the Effective Destination is the IFS.ai Platform Data Lake; otherwise it is the Self-Hosted Data Lake. |
Analysis Model | The Effective Destination is always Self-Hosted. |
Parquet Data Source | If the Data Source has a destination, the Effective Destination is the destination of the Data Source; otherwise it is the Self-Hosted Data Lake. |
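As a rough illustration of the rules in the table, the resolution logic could look like the following sketch. The Destination enum, the context strings, and the parameter names are assumptions made for illustration, not the actual implementation.

```python
# Hypothetical sketch of the Effective Destination rules from the table
# above; enum values and field names are illustrative assumptions.
from enum import Enum

class Destination(Enum):
    SELF_HOSTED = "Self-Hosted Data Lake"
    IFS_AI_PLATFORM = "IFS.ai Platform Data Lake"

def effective_destination(context: str,
                          defined_destination: Destination | None = None,
                          system_defined: bool = False) -> Destination:
    if context == "workload_job_definition":
        if defined_destination is not None:
            return defined_destination
        # System-defined definitions default to the IFS.ai Platform.
        return (Destination.IFS_AI_PLATFORM if system_defined
                else Destination.SELF_HOSTED)
    if context == "analysis_model":
        return Destination.SELF_HOSTED  # always Self-Hosted
    if context == "parquet_data_source":
        return defined_destination or Destination.SELF_HOSTED
    raise ValueError(f"Unknown context: {context}")
```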
Once the Effective Destination is determined, the need to load the Data Source is evaluated as follows (a sketch of this decision appears after the list):
- If the Data Source has no Last Load History, it is loaded into the Effective Destination determined above.
- If the Data Source has a Last Load History and that history's destination includes the Effective Destination, the Max Age is used to determine whether the Data Source is outdated and should be reloaded.
- If the Data Source has a Last Load History but that history's destination does not include the Effective Destination, the Data Source is reloaded.
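The reload decision above can be summarized in a short sketch. The shape of the LoadHistory record and the Max Age comparison are assumptions made for illustration.

```python
# Hypothetical sketch of the reload decision; field names are assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class LoadHistory:
    destinations: set[str] = field(default_factory=set)
    last_loaded: datetime | None = None

def needs_load(history: LoadHistory | None,
               effective_destination: str,
               max_age: timedelta,
               now: datetime) -> bool:
    if history is None or history.last_loaded is None:
        return True  # no Last Load History: always load
    if effective_destination not in history.destinations:
        return True  # never loaded into this destination: reload
    # Loaded into this destination before: reload only when outdated.
    return (now - history.last_loaded) > max_age
```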
Explicit Load¶
A Parquet Data Source can also be loaded explicitly. When loading is initiated via IFS Cloud Web using the Load option, the Explicit Load flag is set to true. A Parquet Data Source load job is then triggered the next time the scheduler runs, regardless of the Max Age or the usage of the Parquet Data Source in any model/Workload Job Definition. The Effective Destination Data Lake is determined as described in the section above.
An Explicit Load performs a full load for Dimensions and an Incremental Load for Facts, where applicable; see the sketch below.
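As a hypothetical illustration of that rule (the enum and flags are assumptions, not actual IFS APIs):

```python
# Hypothetical sketch of how an explicit load might choose the load type.
from enum import Enum

class LoadType(Enum):
    FULL = "Full"
    INCREMENTAL = "Incremental"

def explicit_load_type(is_fact: bool, supports_incremental: bool) -> LoadType:
    # Dimensions are always fully reloaded; Facts load incrementally
    # when the data source supports it.
    if is_fact and supports_incremental:
        return LoadType.INCREMENTAL
    return LoadType.FULL
```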
Loading a Parquet Data Source during creation:
Once all the columns are selected, loading can be initiated in one of two ways:
- Select Yes at the Load Data Source prompt and exit the assistant.
- Exit the New Data Source assistant and load later. This requires selecting the data source(s) to be loaded on the Parquet Data Source page and then clicking the Load button.
An Explicit Load can also be performed on any existing Parquet Data Source that already has a Refresh History (from an Analysis Model refresh or a Workload Run).
The Explicit Load sets the Explicit Load flag to Yes. While loading is in progress, the Parquet Data Source status transitions as shown below.
Read more about the Refreshing schedule.
Parquet Data Source Status Transition
Starting Status | Transitioned Status | Context |
---|---|---|
Detecting Changes | Finished Detecting Changes | Change detection starts for incremental data sources: the Scheduler pod adds detect-changes jobs to the queue. The data pump picks up the job, runs the detect-changes query, and marks the applicable partitions to be loaded. |
Detecting Changes | Error | An error occurred while detecting changes for incremental data sources. |
Finished Detecting Changes | Job Queued | The Scheduler pod adds jobs to the queue to load data into the Data Lake. |
Job Queued | Loading | The Data pump pods pick up the jobs from the queue and start processing them, reading the data from the Fact and Dimension tables and transforming it into Parquet files. |
Loading | Success | The loading job completes: the Parquet file is uploaded to ADLS Gen 2, and the Data pump pod marks the job as completed by changing the status to Success and updating the refresh info in the Parquet Data Source Load History. |
Loading | Error | An error occurred while loading. The Parquet file is not uploaded to ADLS Gen 2, and the Data pump pod sets the job status to Error in the Parquet Data Source Load History. |
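The transitions in the table form a small state machine. The following sketch encodes them; the enum names and the allowed-transition map are illustrative assumptions, not the actual scheduler or data pump implementation.

```python
# Hypothetical sketch of the status transitions described in the table.
from enum import Enum

class Status(Enum):
    DETECTING_CHANGES = "Detecting Changes"
    FINISHED_DETECTING_CHANGES = "Finished Detecting Changes"
    JOB_QUEUED = "Job Queued"
    LOADING = "Loading"
    SUCCESS = "Success"
    ERROR = "Error"

ALLOWED_TRANSITIONS: dict[Status, set[Status]] = {
    Status.DETECTING_CHANGES: {Status.FINISHED_DETECTING_CHANGES, Status.ERROR},
    Status.FINISHED_DETECTING_CHANGES: {Status.JOB_QUEUED},
    Status.JOB_QUEUED: {Status.LOADING},
    Status.LOADING: {Status.SUCCESS, Status.ERROR},
}

def transition(current: Status, new: Status) -> Status:
    # Reject any transition the table does not define.
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"Illegal transition: {current.value} -> {new.value}")
    return new
```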