
Research and Implementation of a Data Provenance Tracing System Based on SSIS

Posted on: 2011-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: T Wang
GTID: 2198330338489845
Subject: Management Science and Engineering
Abstract/Summary:
ETL (extract, transform, load) is a common and mature data integration technology, and many ETL tools exist. These tools extract data from its sources, transform it, and load the results into the target, but they do not address the auditing or traceability of that data. This causes several problems for data integration. First, the integrated data may be unreliable or even useless because no auditing or tracing measures are in place. Second, it is difficult to locate the source of a data integration problem, even for domain experts. Finally, when ordinary users find a mistake in some part of the data, they come to doubt the other data processed in the same way. Because there are no auditing or tracing measures, users then have to rerun the whole process for all the data, and the cost of this work is high enough to undermine the value of the integration. Motivated by these challenges, we design and develop an SSIS-based system that combines ETL with auditing and tracing functions. Users can trace the source of objects at different layers of the target data, which makes the ETL process traceable. The system provides these auditing and tracing functions to ensure the quality of data integration.

First, the key problems are studied in detail. They fall into two parts: the metadata of the system, and the decomposition and definition of the objects at each layer of the transformation process together with the tracing algorithms. The metadata itself also has two parts. One is the metadata of the transformation process, which is the basis of tracing. This metadata is a description file of the transformation, also known as a transformation package. Based on SSIS, we redesign this package and analyze each part of it. The other is the metadata of tracing, which covers serializing the packages, designing a global object index cache, and designing the metadata of the inverse function library.
The transformation process is divided into three layers: transformation, mapping, and operation. We classify the objects in each layer and design inverse functions for the bottom layer, on which the inverse algorithm of each layer is built. According to the layer of the object being traced, we design the algorithms for provenance tracing and presentation.

Then, the structure of the overall framework is designed. By comparing current systems from different domains, we analyze use cases and activities according to the ETL functions. Based on these results, we design the framework, partition the system into functional modules, and analyze the tracing layers and the tracing flow.

Finally, the system is implemented. It provides two main functions: an ETL function and a tracing function. The effect of the system is then illustrated with an example.
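The layered idea above (operations with inverse functions at the bottom, mappings built from operations, and transformations built from mappings) can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the thesis's algorithm or any SSIS API: the class names `Operation`, `Mapping`, and `Transformation`, the table and column names, and the two sample operations are all hypothetical, and every operation is assumed to be a one-to-one function of a single column so that an exact inverse exists.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Operation:
    """Bottom layer (hypothetical): one invertible step with its inverse."""
    name: str
    forward: Callable[[float], float]
    inverse: Callable[[float], float]


@dataclass
class Mapping:
    """Middle layer (hypothetical): one source column to one target column."""
    source_column: str
    target_column: str
    operations: List[Operation]

    def apply(self, value: float) -> float:
        # Run the operations in order, as an ETL step would.
        for op in self.operations:
            value = op.forward(value)
        return value

    def trace(self, target_value: float) -> float:
        # Undo the operations in reverse order to recover the source value.
        for op in reversed(self.operations):
            target_value = op.inverse(target_value)
        return target_value


@dataclass
class Transformation:
    """Top layer (hypothetical): a set of mappings between two tables."""
    source_table: str
    target_table: str
    mappings: List[Mapping]

    def run(self, row: dict) -> dict:
        return {m.target_column: m.apply(row[m.source_column])
                for m in self.mappings}

    def trace(self, target_column: str,
              target_value: float) -> Tuple[str, str, float]:
        # Find the mapping that produced the column, then trace through it.
        for m in self.mappings:
            if m.target_column == target_column:
                return (self.source_table, m.source_column,
                        m.trace(target_value))
        raise KeyError(target_column)


# Assumed example data: convert a price to cents, then add a fixed fee.
to_cents = Operation("to_cents", lambda x: x * 100, lambda y: y / 100)
add_fee = Operation("add_fee", lambda x: x + 5, lambda y: y - 5)
transformation = Transformation(
    source_table="src.prices",
    target_table="dw.prices",
    mappings=[Mapping("price", "price_cents", [to_cents, add_fee])],
)

loaded = transformation.run({"price": 12})
origin = transformation.trace("price_cents", loaded["price_cents"])
```

Here `origin` identifies the source table, the source column, and the recovered source value for a target cell. Operations that are not one-to-one (aggregations, lookups, filters) have no exact inverse in this sketch; tracing them would presumably rely on stored tracing metadata of the kind the abstract describes.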
Keywords/Search Tags: Data Provenance, Tracing System, Transformation Package, Inverse Function, Metadata