CSV Files Import and data processing
Documents as a staging area to magage ETL status'
Last updated
Documents as a staging area to magage ETL status'
Last updated
A staging area is an extremely useful function for ETL processes (extract-transform-load), and loading data from sFTP to update a target system with the information from each single line is a good example.
Your process reads one or more files from an SFTP which were dropped by your customer. Each file contains information about invoices and there is one column names Invoiceid
that cats as a unique key for each invoice coming from the customer.
If the same invoice is contained more than once (or in different files), the invoice (the line) must be rejected. Rejected lines are shown in a service portal.
For performance and system utilization reasons this business process of Updating Invoices should be split into multiple technical processes, which run separately and aynchronously.
Load all files from sFTP
Set File status' to IMPORTING
Extract each line and set Invoice status to IMPORTED If Invoice alread exists, set status to REJECTED
Pre-Validate each IMPORTED Invoice (Zip code, Street etc.)
Set Invoice status to READY, on success. Set Invoice status to ERROR, on failure, reco th error message
Set File status' to SUCCESS, when all Invoices are READY Set File status' to ERROR, when any Invoice is not READY
At the end of this process we have imported all Invoices from all sFTP files into a Staging Area (some kind of a buffer or parking area, where the imported records reside for `some` time). Invoices have got the status IMPORTED, REJECTED or ERROR. The files on sFTP can be moved to a different directory (or deleted) - they are completely obsolete, because we tracked the import status.
The second process iterates through all READY Invoices in the Staging Area.
It is important to note that the process does not care when/how an Invoice entered the Statging Area. This allows us to use a service portal which enables us to manage rejected or errored Invoices. If the service team corrects an Invoice it will automatically get the READY state and it will be picked up in the next run.
Once a READY Invoices has been loaded into the target system, it gets the COMPLETED state. In case of any error, for example, the target system "complained" about an unknown customer, the Invoice gets the ERROR status' with a note about the "unknown customer" so that clearing can resolve that issue.
Thinking about the use-cases it is quite obvious how Documents can help:
Each files' status is tracked in a SFTP.Import.Invoice
document
Each Invice is managed in a STAGE.INVOICES
document
that refers to its source (the file) using the File.Id as the Context
.