Lineage tracing in data warehouses

Posted on: 2003-09-11
Degree: Ph.D.
Type: Thesis
University: Stanford University
Candidate: Cui, Yingwei
Full Text: PDF
GTID: 2468390011989645
Subject: Computer Science
Abstract/Summary:
Data warehousing systems collect data from multiple distributed data sources and store integrated and summarized information in local databases for efficient data analysis and mining. When analyzing data at a warehouse, it is sometimes useful to “drill down” and investigate the source data from which certain warehouse data was derived. For a given warehouse data item, identifying the exact set of source data items that produced it is termed the data lineage problem.

This thesis presents our research results on tracing data lineage in a warehousing environment: (1) formal definitions of data lineage for data warehouses defined as relational materialized views over relational sources, and for warehouses defined using graphs of general data transformations; (2) algorithms for lineage tracing, again considering both relational and transformational warehouses, along with a suite of optimization techniques; (3) performance evaluations through simulations, and a lineage tracing prototype developed within the WHIPS (WareHousing Information Processing System) project at Stanford; (4) an application of data lineage techniques to obtain improved algorithms for the well-known database view update problem.
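
As a rough illustration of the data lineage problem stated in the abstract, the sketch below builds a small select-and-aggregate view and then recovers the source rows behind one view row. It is not taken from the thesis: the table contents, the MIN_AMOUNT filter, and the names build_view and trace_lineage are all hypothetical, and lineage is taken here to mean the source rows that pass the view's filter and fall in the same group as the view row.

```python
from collections import defaultdict

# Hypothetical source table: (order_id, region, amount).
SALES = [
    (1, "west", 120),
    (2, "east", 80),
    (3, "west", 45),
    (4, "east", 200),
    (5, "west", 300),
]

# Hypothetical selection predicate used by the view definition.
MIN_AMOUNT = 50


def build_view(source):
    """Materialize the view: total amount per region over rows >= MIN_AMOUNT."""
    totals = defaultdict(int)
    for _order_id, region, amount in source:
        if amount >= MIN_AMOUNT:
            totals[region] += amount
    return dict(totals)


def trace_lineage(source, region):
    """Return the source rows that contributed to the view row for `region`.

    Assumption: a row contributes if it passes the view's filter and
    belongs to the same group as the view row being traced.
    """
    return [row for row in source if row[1] == region and row[2] >= MIN_AMOUNT]


view = build_view(SALES)
west_lineage = trace_lineage(SALES, "west")

# Re-aggregating the traced rows reproduces the traced view row.
assert sum(row[2] for row in west_lineage) == view["west"]
```

Here the row with order_id 3 is excluded from the lineage of the "west" total because it fails the filter and so never influenced the view.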
Keywords/Search Tags: Lineage, Data warehouses