
Theories And Approach Of Data Lineage Tracing In Data Warehouse Environment

Posted on: 2003-08-15  Degree: Doctor  Type: Dissertation
Country: China  Candidate: C F Dai  Full Text: PDF
GTID: 1118360092498854  Subject: Management Science and Engineering
Abstract/Summary:
The exact history of a given warehouse data item, including the complete description of its acquisition, transformation, and integration, is termed its data lineage. Data lineage includes two parts: (1) the set of source data items that exactly produces the warehouse data item; (2) the processes that contribute to the set of source data items. Identifying the data lineage of a given warehouse data item is termed data lineage tracing. As one of the most advanced research problems in data warehouse systems, data lineage tracing may play an important role in in-depth data analysis and helps validate the source data, cleaning rules, and transformation rules, thereby improving the quality of the data warehouse.

Beginning with the formal definition of the derivation set, this thesis finds the general laws of derivation sets, proves theorems about them, proposes an approach to tracing data lineage by weak inversion and verification based on attribute mapping, gives a series of algorithms for data lineage tracing, describes the basic processes of data lineage tracing, and thereby forms a systematic theory and approach. The primary work and contributions of this thesis are as follows.

First, the concepts of data lineage tracing are completed and refined, and formal definitions of the derivation set and the supplementary set are provided. These definitions form the basis for derivation set tracing and also serve as the criterion for verifying the result of tracing. The thesis then proves five theorems about the derivation set, which define the relationships between transformation and attribute mapping, between derivation set and attribute mapping, and between derivation set and contribution set, as well as the correlation of the supplementary sets of a transformation. These theorems are the basis and guideline for constructing and verifying the weak derivation set according to the invertibility of attribute mapping, and thus improve the basic theory of data lineage tracing.

Next, this thesis presents a data lineage tracing approach, Wivern (Weak Inversion and VErification of attRibute mappiNg), which can calculate the (attribute-level) derivation set of an attribute mapping. The thesis then analyzes the invertibility of transformations, presents the formal definition of a weakly invertible transformation, and calculates the (tuple-level) derivation set of a transformation by one-dimension merging and multi-dimension merging of the weak derivation sets resolved by weak inverse attribute mapping. The thesis also proves the uniqueness and solution theorems for the derivation sets of the basic relational operators.

Then, this thesis presents the formal definition of the derivation set of a transformation diagram, proves the derivation set transitivity theorem, and shows the basic processes for tracing a transformation diagram. For the construction of the weak derivation set, the thesis presents the concept of continuing traceability and provides a decision algorithm for the continuing traceability of a transformation sequence and a filtering algorithm for the continuing traceable weak inverse attribute mapping. For the verification of the weak derivation set, the thesis gives a series of verification algorithms based on the best property of the attribute mapping or transformation.

Finally, in order to validate the theories and approach, this thesis conducts data lineage tracing experiments with relational queries Q2 and Q12 of TPC Benchmark H and compares the tracing performance with the query-process tracing approach presented by Dr. Cui. The results show that the Wivern approach performs much better than Cui's approach in terms of tracing time, storage cost, and precision of the tracing result.
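
As a rough illustration of the two-step idea summarized above (first invert an attribute mapping to collect candidate source items, then verify the candidates by re-applying the transformation), the following Python sketch traces the lineage of one aggregated warehouse row. It is not the thesis's own algorithm: the table `SOURCE`, the per-region sum in `transform`, and the helper names `candidate_lineage`, `verify`, and `trace` are all hypothetical and chosen only to make the concept concrete.

```python
from collections import defaultdict

# Hypothetical source table: (order_id, region, amount).
SOURCE = [
    (1, "north", 120),
    (2, "south", 80),
    (3, "north", 50),
    (4, "east", 200),
]


def transform(rows):
    """Toy transformation (an assumption): total amount per region."""
    totals = defaultdict(int)
    for _order_id, region, amount in rows:
        totals[region] += amount
    return sorted(totals.items())


def candidate_lineage(rows, item):
    """Inversion step: the warehouse attribute `region` maps back to the
    source attribute `region`, so every source row with the same region
    is a candidate contributor."""
    region, _total = item
    return [row for row in rows if row[1] == region]


def verify(candidates, item):
    """Verification step: the candidates are accepted only if re-applying
    the transformation to them reproduces exactly the traced item."""
    return transform(candidates) == [item]


def trace(rows, item):
    """Return the source rows that produce `item`, or raise if the
    candidates fail verification (e.g. the item is not in the warehouse)."""
    candidates = candidate_lineage(rows, item)
    if not verify(candidates, item):
        raise ValueError(f"no verified lineage for {item!r}")
    return candidates


if __name__ == "__main__":
    warehouse = transform(SOURCE)
    for item in warehouse:
        print(item, "<-", trace(SOURCE, item))
```

In this toy setting the inverted mapping already yields the exact contributing rows; the verification step matters when the inversion only narrows the source down to a superset, or when the item being traced does not actually exist in the warehouse.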
Keywords/Search Tags: Data Warehouse, Data Lineage, Derivation Set, Contribute Set, Transformation, Attribute Mapping, Weak Inverse Attribute Mapping