
Research And Application On The Key Technology Of Data Pre-Processing Based On XML

Posted on:2011-04-30
Degree:Master
Type:Thesis
Country:China
Candidate:Z Liu
GTID:2178330338985379
Subject:Computer application technology
Abstract/Summary:
With the rapid development of information technology, cooperation between enterprises and departments urgently needs a high-performance data exchange scheme that shields the differences between heterogeneous systems and enables data sharing. Existing technologies for heterogeneous data exchange have many shortcomings, such as complex procedures, tight coupling, and poor generality. In addition, heterogeneous systems suffer from more data quality problems, so dirty data must be cleaned to ensure that the data is usable. Data transfer and data cleaning techniques can be used to address these two problems. This thesis targets the shortcomings of existing data exchange schemes and designs an XML-based data preprocessing solution. The solution supports bidirectional conversion between relational data and XML data and incorporates an XML data cleaning function. The research covers the following three aspects:
(1) Conversion of a relational database to XML documents. The thesis studies a method that uses the E-R model as an intermediate model for converting a relational database to XML. Based on the associations in the E-R model, it designs a set of rules that convert the entity integrity, referential integrity, and user-defined integrity constraints of the relational schema into XML Schema.
(2) A detection and cleaning framework for XML outliers. Using clustering analysis and the context information inherent in the XML data model, the thesis groups logically related nodes in the XML data into the same subspace. From these related subspaces it computes an interestingness measure for outliers, the XO-Measure, identifies outliers based on that measure, and designs a set of related algorithms.
(3) Rules for converting XML documents to a relational database. The thesis designs a formal algorithm that converts an XML Schema into a 9-tuple X_S, and then designs a set of rules, based on X_S, that convert the components of the XML Schema and the relationships between them into a relational schema.
Experiments show that this design not only achieves higher conversion efficiency but also better controls semantic loss during data exchange. In addition, the outlier cleaning method for XML achieves higher accuracy and recall.
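The abstract does not reproduce the thesis's conversion rules, so the following is only a minimal sketch of the general idea in aspect (1), under stated assumptions. It uses the standard XML Schema identity constraints, mapping a primary key (entity integrity) to `xs:key` and a foreign key (referential integrity) to `xs:keyref`. The table descriptions in `TABLES`, the function `build_schema`, and the root element name `database` are hypothetical and are not taken from the thesis. User-defined integrity is not covered here.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


def q(tag):
    """Return the namespace-qualified name of an XML Schema element."""
    return f"{{{XS}}}{tag}"


# Hypothetical relational schema: columns, primary key, and foreign keys
# (foreign-key column -> referenced table).
TABLES = {
    "department": {"columns": ["dept_id", "name"], "pk": "dept_id", "fks": {}},
    "employee": {
        "columns": ["emp_id", "name", "dept_id"],
        "pk": "emp_id",
        "fks": {"dept_id": "department"},
    },
}


def build_schema(tables, root_name="database"):
    """Build an xs:schema tree with one repeating element per table."""
    schema = ET.Element(q("schema"))
    root = ET.SubElement(schema, q("element"), name=root_name)
    seq = ET.SubElement(ET.SubElement(root, q("complexType")), q("sequence"))
    for tname, table in tables.items():
        row = ET.SubElement(seq, q("element"), name=tname,
                            minOccurs="0", maxOccurs="unbounded")
        cols = ET.SubElement(ET.SubElement(row, q("complexType")), q("sequence"))
        for col in table["columns"]:
            # All columns are typed as strings to keep the sketch short.
            ET.SubElement(cols, q("element"), name=col, type="xs:string")
    # Identity constraints follow the complexType inside the root element.
    for tname, table in tables.items():
        # Entity integrity: the primary key becomes an xs:key.
        key = ET.SubElement(root, q("key"), name=f"{tname}_pk")
        ET.SubElement(key, q("selector"), xpath=tname)
        ET.SubElement(key, q("field"), xpath=table["pk"])
    for tname, table in tables.items():
        for col, target in table["fks"].items():
            # Referential integrity: the foreign key becomes an xs:keyref.
            ref = ET.SubElement(root, q("keyref"), name=f"{tname}_{col}_fk",
                                refer=f"{target}_pk")
            ET.SubElement(ref, q("selector"), xpath=tname)
            ET.SubElement(ref, q("field"), xpath=col)
    return schema


if __name__ == "__main__":
    print(ET.tostring(build_schema(TABLES), encoding="unicode"))
```

Declaring the constraints on the root element keeps every `xs:key` in scope for the `xs:keyref` that refers to it, which is why the sketch nests all tables under a single root.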
Keywords/Search Tags:Heterogeneous Data, XML Document, XML Schema, Data Transfer, Data Cleaning, Outliers