CSV Files Import and data processing
Documents as a staging area to magage ETL status'
A staging area is an extremely useful function for ETL processes (extract-transform-load), and loading data from sFTP/file-system to update a target system with the information from each single line is a good example.
Use-Case
Your process reads one or more files from an sFTP which were dropped by your customer. Each file contains information about invoices and there is one column named Invoiceid
that represents a unique key for each invoice coming from the customer.

If the same invoice is contained more than once (or in different files), the invoice (the line) must be rejected. Rejected lines are shown in a service portal.
For performance and system utilization reasons this business process of Updating Invoices should be split into multiple technical processes, which run separately and asynchronously.
Process 1 - File processing
Load all files from sFTP
Set File status' to IMPORTING
Extract each line and set Invoice status to IMPORTED If Invoice already exists, set status to REJECTED
Pre-Validate each IMPORTED Invoice (Zip code, Street etc.)
Set Invoice status to READY, on success. Set Invoice status to ERROR, on failure, recoed the error message
Set File status' to SUCCESS, when all Invoices are READY Set File status' to ERROR, when any Invoice is not READY
At the end of this process we have imported all Invoices from all sFTP files into a Staging Area (some kind of a buffer or parking area, where the imported records reside for `some` time). Invoices have got the status IMPORTED, REJECTED or ERROR. The files on sFTP can be moved to a different directory (or deleted) - they are completely obsolete, because we tracked the import status.
Process 2 - Load Invoices into target system
The second process iterates through all READY Invoices in the Staging Area.
Service Portal - Clearing
It is important to note that the process does not care when/how an Invoice entered the Staging Area. This allows us to use a service portal which enables us to manage rejected or errored Invoices. If the service team corrects an Invoice it will automatically get the READY state and it will be picked up in the next run.
Once a READY Invoices has been loaded into the target system, it gets the COMPLETED state. In case of any error, for example, the target system "complained" about an unknown customer, the Invoice gets the ERROR status' with a note about the "unknown customer" so that clearing can resolve that issue.
Using Documents
Thinking about the use-cases it is quite obvious how Documents can help:
Each files' status is tracked in a
SFTP.Import.Invoice
documentEach Invoice is managed in a
STAGE.INVOICES
document that refers to its source (the file) using the File.Id as theContext
.

Last updated