CSV Files Import and data processing

Documents as a staging area to magage ETL status'

A staging area is an extremely useful function for ETL processes (extract-transform-load), and loading data from sFTP/file-system to update a target system with the information from each single line is a good example.

Use-Case

Your process reads one or more files from an sFTP which were dropped by your customer. Each file contains information about invoices and there is one column named Invoiceid that represents a unique key for each invoice coming from the customer.

Example: Invoices in a CSV file on sFTP

If the same invoice is contained more than once (or in different files), the invoice (the line) must be rejected. Rejected lines are shown in a service portal.

For performance and system utilization reasons this business process of Updating Invoices should be split into multiple technical processes, which run separately and asynchronously.

Process 1 - File processing

  1. Load all files from sFTP

    1. Set File status' to IMPORTING

  2. Extract each line and set Invoice status to IMPORTED If Invoice already exists, set status to REJECTED

  3. Pre-Validate each IMPORTED Invoice (Zip code, Street etc.)

  4. Set Invoice status to READY, on success. Set Invoice status to ERROR, on failure, recoed the error message

  5. Set File status' to SUCCESS, when all Invoices are READY Set File status' to ERROR, when any Invoice is not READY

At the end of this process we have imported all Invoices from all sFTP files into a Staging Area (some kind of a buffer or parking area, where the imported records reside for `some` time). Invoices have got the status IMPORTED, REJECTED or ERROR. The files on sFTP can be moved to a different directory (or deleted) - they are completely obsolete, because we tracked the import status.

Process 2 - Load Invoices into target system

The second process iterates through all READY Invoices in the Staging Area.

Service Portal - Clearing

It is important to note that the process does not care when/how an Invoice entered the Staging Area. This allows us to use a service portal which enables us to manage rejected or errored Invoices. If the service team corrects an Invoice it will automatically get the READY state and it will be picked up in the next run.

Once a READY Invoices has been loaded into the target system, it gets the COMPLETED state. In case of any error, for example, the target system "complained" about an unknown customer, the Invoice gets the ERROR status' with a note about the "unknown customer" so that clearing can resolve that issue.

Using Documents

Thinking about the use-cases it is quite obvious how Documents can help:

  • Each files' status is tracked in a SFTP.Import.Invoice document

  • Each Invoice is managed in a STAGE.INVOICES document that refers to its source (the file) using the File.Id as the Context.

The Documents (Statging Area) after file processing (process 1)

Last updated