Metadata-driven data extraction
Big data problem solved with a dynamic data extraction process driven by metadata.
The hospital system held a large volume of clinical and non-clinical data, and extracting each table required developing its own pipeline.
We reviewed the problem to see how we could cut down the time and resources involved in bringing in new data or making changes to existing extracts. We then designed a solution that only required some basic metadata to be updated in order to bring in new data, which was loaded in small, real-time transactions.
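A minimal sketch of the metadata-driven idea, assuming each source table is described by one row of a control table (name, primary key, a last-modified "watermark" column, and the columns to pull) and that one generic routine builds an incremental query from that row. The names `TableMetadata`, `build_incremental_query` and `extract_increment`, the `ward` table, and the use of SQLite as a stand-in source are all hypothetical illustrations, not the original implementation.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMetadata:
    """One row of a hypothetical control table; adding a table means adding a row here."""
    source_table: str
    primary_key: tuple[str, ...]  # kept for downstream de-duplication
    watermark_column: str  # e.g. a last-modified timestamp maintained by the source
    columns: tuple[str, ...]


def build_incremental_query(meta: TableMetadata) -> str:
    """Build a parameterised query that selects only rows changed after a watermark.

    Table and column names come from trusted metadata, so they are interpolated;
    the watermark value itself is passed as a bound parameter.
    """
    cols = ", ".join(meta.columns)
    return (
        f"SELECT {cols} FROM {meta.source_table} "
        f"WHERE {meta.watermark_column} > ? "
        f"ORDER BY {meta.watermark_column}"
    )


def extract_increment(conn: sqlite3.Connection, meta: TableMetadata, last_watermark):
    """Pull one small batch of changed rows and return it with the new watermark."""
    rows = conn.execute(build_incremental_query(meta), (last_watermark,)).fetchall()
    wm_index = meta.columns.index(meta.watermark_column)
    # Rows are ordered by the watermark, so the last row holds the new high-water mark.
    new_watermark = rows[-1][wm_index] if rows else last_watermark
    return rows, new_watermark


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ward (ward_code TEXT PRIMARY KEY, ward_name TEXT, modified_at TEXT)")
    conn.executemany(
        "INSERT INTO ward VALUES (?, ?, ?)",
        [("W1", "Cardiology", "2024-01-01T09:00"), ("W2", "Oncology", "2024-01-03T14:30")],
    )
    # The only per-table work is describing the table in metadata.
    metadata = [
        TableMetadata("ward", ("ward_code",), "modified_at", ("ward_code", "ward_name", "modified_at")),
    ]
    for meta in metadata:
        rows, watermark = extract_increment(conn, meta, "2024-01-02T00:00")
        print(meta.source_table, rows, watermark)
```

The design choice here is that no code changes when a table is added: a new `TableMetadata` row is enough, and running the loop repeatedly with the returned watermark keeps each transaction small.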
We also built in checks between the source system and the data lake so that each table would be automatically corrected or rebuilt if the differences between the two drifted above a threshold set for that table. Data integrity was maintained by utilising the existing primary keys in the source system to avoid duplicating data.
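The checks above can be sketched as follows, assuming "data differences" are measured as the relative difference in row counts per table and that loads are keyed on the source primary key so a replayed batch updates rows instead of duplicating them. The function names, the 1% default threshold, and the use of SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` (SQLite 3.24+) as the lake's merge are hypothetical stand-ins; the actual rebuild step is not shown.

```python
import sqlite3


def drift_ratio(source_count: int, lake_count: int) -> float:
    """Relative difference between source and lake row counts for one table."""
    if source_count == 0:
        return 0.0 if lake_count == 0 else 1.0
    return abs(source_count - lake_count) / source_count


def tables_to_rebuild(source_counts: dict[str, int], lake_counts: dict[str, int],
                      thresholds: dict[str, float], default_threshold: float = 0.01) -> list[str]:
    """Return the tables whose drift exceeds their per-table threshold."""
    flagged = []
    for table, src in source_counts.items():
        limit = thresholds.get(table, default_threshold)
        if drift_ratio(src, lake_counts.get(table, 0)) > limit:
            flagged.append(table)
    return sorted(flagged)


def upsert(conn: sqlite3.Connection, table: str, primary_key: tuple[str, ...],
           columns: tuple[str, ...], rows: list[tuple]) -> None:
    """Insert rows, updating in place when the source primary key already exists.

    Keying on the source system's primary key means replaying a batch cannot
    create duplicates. Requires SQLite 3.24+ for ON CONFLICT ... DO UPDATE.
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    updates = ", ".join(f"{c} = excluded.{c}" for c in columns if c not in primary_key)
    action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
    sql = (
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders}) "
        f"ON CONFLICT ({', '.join(primary_key)}) {action}"
    )
    conn.executemany(sql, rows)


if __name__ == "__main__":
    lake = sqlite3.connect(":memory:")
    lake.execute("CREATE TABLE ward (ward_code TEXT PRIMARY KEY, ward_name TEXT)")
    cols = ("ward_code", "ward_name")
    upsert(lake, "ward", ("ward_code",), cols, [("W1", "Cardiology"), ("W2", "Oncology")])
    # Replaying an overlapping batch updates W2 and adds W3 without duplicating rows.
    upsert(lake, "ward", ("ward_code",), cols, [("W2", "Oncology East"), ("W3", "Renal")])
    print(lake.execute("SELECT COUNT(*) FROM ward").fetchone()[0])  # 3
    print(tables_to_rebuild({"ward": 3, "episode": 1000}, {"ward": 3, "episode": 900}, {"episode": 0.05}))
```

A row-count comparison is the cheapest reconciliation; a stricter variant could compare per-key checksums, at the cost of scanning both sides.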
This saved hundreds of hours across new and existing development work, freeing up time to build data products that added more value.