
The Research And Application Of Data Preprocessing In XML Data Warehouse

Posted on: 2008-12-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Meng
Full Text: PDF
GTID: 2178360242472253
Subject: Computer application technology
Abstract/Summary:
With the development of information technology, data warehouses are used ever more widely. Their data sources include heterogeneous operational databases and other external data, and these sources contain noise, missing values, duplicate records, and inconsistent data, all of which affect data analysis. How to preprocess data to improve its quality is therefore very important to the accuracy of analysis results.

Focusing on data preprocessing and its key technologies, the thesis introduces the research background and current state of development of data warehousing and data mining as the application environment for data preprocessing, analyzes the advantages and disadvantages of existing data cleaning and data conversion algorithms, and proposes an improved XSLT transformation method to implement the conversion of relational data. For cleaning the converted XML data, and in particular for detecting similar duplicate XML data, it proposes a method that uses a tree edit distance algorithm with filtering based on length and on upper and lower bounds.

To ensure that the data warehouse does not lose data accidentally and to standardize metadata management, the thesis proposes the theory and method of data caching and a data preprocessing model for the metadata warehouse, applies the improved algorithms to the improved data preprocessing model, and designs a data preprocessing model that integrates a conversion component, a cleaning component, metadata extraction, and a data cache. With this model, "dirty data" can be processed into clean, unified, integrated data.

Finally, according to the requirements of decision-making analysis, the thesis applies the data preprocessing model to the XML data warehouse in Live Force Maneuver and improves the data quality in the data warehouse.
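
The abstract does not give the details of the duplicate-detection algorithm, so the following is only a minimal sketch of the general idea of checking cheap bounds before running a tree edit distance. It assumes a simple Selkow-style top-down edit distance (whole child subtrees are inserted or deleted, matched nodes may be relabeled, all at unit cost per node), which may differ from the algorithm used in the thesis. The function names, the node labeling (tag plus text), and the threshold are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def label(node):
    # Assumption: two nodes match when both tag and trimmed text are equal.
    return (node.tag, (node.text or "").strip())


def size(node):
    # Number of element nodes in the subtree rooted at `node`.
    return 1 + sum(size(child) for child in node)


def tree_distance(a, b):
    """Selkow-style top-down edit distance between two ordered trees."""
    root_cost = 0 if label(a) == label(b) else 1
    ca, cb = list(a), list(b)
    # dp[i][j]: cost of turning the first i children of a
    # into the first j children of b.
    dp = [[0] * (len(cb) + 1) for _ in range(len(ca) + 1)]
    for i in range(1, len(ca) + 1):
        dp[i][0] = dp[i - 1][0] + size(ca[i - 1])      # delete subtree
    for j in range(1, len(cb) + 1):
        dp[0][j] = dp[0][j - 1] + size(cb[j - 1])      # insert subtree
    for i in range(1, len(ca) + 1):
        for j in range(1, len(cb) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(ca[i - 1]),
                dp[i][j - 1] + size(cb[j - 1]),
                dp[i - 1][j - 1] + tree_distance(ca[i - 1], cb[j - 1]),
            )
    return root_cost + dp[-1][-1]


def is_similar_duplicate(a, b, threshold):
    """Decide whether two records are within `threshold` edits of each other."""
    na, nb = size(a), size(b)
    # Lower bound: each unit-cost insert or delete changes the size by one.
    if abs(na - nb) > threshold:
        return False
    # Upper bound: relabel the root, delete all of a's children,
    # then insert all of b's children.
    upper = (na - 1) + (nb - 1) + (0 if label(a) == label(b) else 1)
    if upper <= threshold:
        return True
    # Only pairs that pass both filters pay for the full computation.
    return tree_distance(a, b) <= threshold


# Hypothetical records, not taken from the thesis.
r1 = ET.fromstring("<rec><name>Alice</name><city>Beijing</city></rec>")
r2 = ET.fromstring("<rec><name>Alice</name><city>Bejing</city></rec>")
r3 = ET.fromstring("<rec><name>Bob</name><city>X</city><zip>1</zip><tel>2</tel></rec>")

print(is_similar_duplicate(r1, r2, threshold=1))
print(is_similar_duplicate(r1, r3, threshold=1))
```

In this sketch the size difference rejects clearly different records without running the dynamic program, which is the kind of saving a length-based filter is meant to provide.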
Keywords/Search Tags:Data Warehouse, Data Preparation, Data Caching, Storage Metadata, XML, Data Cleaning, Data Conversion