Research on XML-Based Provenance Calculation and Provenance Storage

With the development of science and technology, XML has become a standard for data representation and data exchange on the Web. XML data is semi-structured. Owing to its extensibility, platform independence, and openness, it is widely used in fields such as scientific computing, electronic commerce, and data integration.

However, the rise of Web technology has changed the way data flows. Data is replicated, processed, and converted ever more frequently. As data moves, its quality is hard to control, which in turn makes its reliability hard to assess. Data provenance describes the origin of data and the processes by which it has evolved over time. It is important to data management, especially for scientific data and high-quality Web data. Tracking and managing XML data provenance is therefore a subject of considerable research value and practical promise, and a useful step toward improving the quality of Web data.

This work investigates the problem from three aspects: the provenance model, provenance calculation, and provenance storage. First, we analyze existing provenance models and their problems. Exploiting the tree structure of XML, we introduce a novel annotation structure and propose a more general provenance model. Second, based on this model, we define a set of generalized provenance calculus rules and related concepts, and present the properties of the provenance calculus. These rules and properties make it possible to express the provenance of query results, in particular under the Where- and How-provenance models. Finally, we investigate efficient provenance storage: exploiting the characteristics of provenance information, we propose storage reduction techniques for provenance. Experiments show that the proposed storage techniques scale better than existing storage techniques.
