Data Lake Service

Data Lake Service is a REST API hosted within the IFS.ai Platform. The Service runs as a Managed Identity on the IFS.ai Platform, and that Managed Identity has access to the relevant Data Lake.

Data Lake Service provides the following actions via its endpoints (a sketch of typical calls follows the list):

1. Upload or download the specified files to/from Cloud Storage - currently supporting only Azure Data Lake Storage.

2. Add, update, and get metadata-related details in Cloud Storage - currently supporting only Azure Data Lake Storage.

3. List the storage hierarchy for a given container, and optionally for a given path, within Cloud Storage - currently supporting only Azure Data Lake Storage.
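
As a rough illustration of these actions, the sketch below calls the service with Python's requests library. The base URL, routes, and parameter names are assumptions for illustration only, not the service's actual API specification.

```python
import requests

# Hypothetical base URL and routes; the actual paths and parameter names
# are defined by the Data Lake Service's API specification.
BASE_URL = "https://<ifs-ai-platform-host>/data-lake-service"
HEADERS = {"Authorization": "Bearer <token issued by the IFS.ai Platform>"}

# 1. Download a file from a container (Azure Data Lake Storage behind the scenes).
resp = requests.get(
    f"{BASE_URL}/files",
    headers=HEADERS,
    params={"container": "documents", "path": "reports/report.pdf"},
)
resp.raise_for_status()
with open("report.pdf", "wb") as f:
    f.write(resp.content)

# 2. Get metadata-related details for a stored file.
metadata = requests.get(
    f"{BASE_URL}/metadata",
    headers=HEADERS,
    params={"container": "documents", "path": "reports/report.pdf"},
).json()

# 3. List the storage hierarchy for a container, optionally narrowed to a path.
entries = requests.get(
    f"{BASE_URL}/list",
    headers=HEADERS,
    params={"container": "documents", "path": "reports/"},
).json()
```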

How the Service works (Generic):

To call an API endpoint, a Token is required.

Token request

Data Lake Service and Data Pipeline Service can be used based on the relevant Azure Tenant ID, which is obtained via the Token.

  • The service user requires a Token from the IFS.ai Platform to consume the Data Lake Service (see the request sketch after this list).
  • The subject ID must be included in the Token request as a Token claim.
  • The Token is validated by the IFS.ai Platform and, once received, is passed to the Data Lake Service.
  • The Data Lake Service first verifies the Token and then passes it to the Tenant Information Service in the IFS.ai Platform.
  • Based on the Tenant information, the Data Lake Service retrieves the relevant Azure Tenant ID.
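
A minimal sketch of the Token request is shown below, assuming a client-credentials style flow. The token endpoint URL, grant type, and the way the subject ID is supplied are assumptions; the real values come from your IFS.ai Platform configuration.

```python
import requests

# Hypothetical token endpoint; the real URL, grant type, and claim names
# come from your IFS.ai Platform configuration.
TOKEN_URL = "https://<ifs-ai-platform-host>/oauth/token"

resp = requests.post(
    TOKEN_URL,
    data={
        "grant_type": "client_credentials",
        "client_id": "<client-id>",
        "client_secret": "<client-secret>",
        # The subject ID must be carried as a claim in the issued Token;
        # how it is supplied here is an assumption for illustration.
        "subject_id": "<subject-id>",
    },
)
resp.raise_for_status()
token = resp.json()["access_token"]

# The Token is then sent on every call to the Data Lake Service.
headers = {"Authorization": f"Bearer {token}"}
```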

After the Azure Tenant ID is received:

  • This Azure Tenant ID is used to get the relevant Data Lake storage and container information from the IFS.ai Platform.
  • Based on the storage and container information, the Data Lake Service directs the files to the relevant location (file Upload scenario).
  • During the Upload, it is possible to add a Metadata Dictionary; the metadata size limit is 2 KB (see the upload sketch after this list).
  • Metadata attributes can be set on the blobs so that vectors in Azure AI Search can be enriched with those attributes during the indexing step of the documents.
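
The sketch below shows an Upload carrying a Metadata Dictionary and checking the documented 2 KB limit before sending. The route, the field names, and the way the dictionary is transported alongside the file are illustrative assumptions.

```python
import json
import requests

BASE_URL = "https://<ifs-ai-platform-host>/data-lake-service"  # hypothetical
HEADERS = {"Authorization": "Bearer <token>"}

# Metadata Dictionary to attach to the blob during Upload; these attributes
# can later enrich the vectors produced during Azure AI Search indexing.
metadata = {"department": "finance", "doc_type": "invoice"}

# The documented limit for metadata is 2 KB, so check the size before sending.
if len(json.dumps(metadata).encode("utf-8")) > 2048:
    raise ValueError("metadata exceeds the 2 KB limit")

with open("invoice.pdf", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/files/upload",  # illustrative route
        headers=HEADERS,
        params={"container": "documents", "path": "invoices/invoice.pdf"},
        files={"file": f},
        data={"metadata": json.dumps(metadata)},  # assumed transport for the dictionary
    )
resp.raise_for_status()
```
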
Example Use cases:
  1. Data Pump is one of the containers that uses the endpoints provided by the service to perform the necessary actions.
  2. ESG uses Data Lake Service to upload external files via the file storage (an interface in IFS Cloud Web that calls the Data Lake Service) and also to read information back into IFS Cloud (e.g., snapshot creation based on KPI calculations).
  3. Copilot uses Data Lake Service to upload PDF documents that need to be indexed (via the file storage interface in IFS Cloud Web) to the required Data Lake storage (documents for Copilot). Once the documents are uploaded, Azure AI Search triggers indexing of the documents.