Research On Key Technology For Query Optimization On Dirty Database

Posted on: 2015-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: X Tang
Full Text: PDF
GTID: 2298330422990878
Subject: Computer Science and Technology
Abstract/Summary:
As the volume of data grows, dirty data is increasingly common in databases. It seriously degrades data quality, reducing both the value of the data and the efficiency of data management systems, and this brings new challenges to data management. Data cleaning and data restoration address the problem to a certain extent, but both have defects. Data cleaning cannot remove dirty data completely and may also wrongly "clean" good data. Data restoration relies on fixed rules, and for complicated natural language processing tasks such rules are hard to formulate, which leads to imperfect restoration results. Cleaning and restoration alone therefore cannot solve the dirty data problem completely. Against this background, and after a long period of research on data quality, entity-based dirty data management systems have emerged. Such a system uses entity recognition to convert the relational data in a database into entity data, and then performs work such as query operations on the entity data.
Because of the characteristics of this model, executing a query operation such as a selection or a join may produce many intermediate results that can be expected not to appear in the final result. Filtering out these useless intermediate results in time can improve query efficiency. In this respect the entity data model differs from the relational data model. By analyzing the characteristics of the entity data model, this thesis divides query optimization into three main parts: statistics acquisition, construction of the query cost estimation model, and the query plan selection algorithm. For each of these three parts, the entity data model is considered alongside the relational data model. The thesis puts forward a query optimization theory and a query cost estimation method applicable to the entity data model, and reports experiments that analyze the efficiency of the query plan selection algorithm and of the query cost estimation from multiple angles.
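The abstract does not give the thesis's actual statistics, cost model, or plan selection algorithm. As a rough illustration only, the three parts it names can be sketched with a generic textbook-style optimizer. Everything below is an assumption made for the example: the function names, the use of a match-score threshold as the filter, the single constant `join_selectivity`, and the cost measure (total size of intermediate results of a left-deep join).

```python
from itertools import permutations
from typing import Dict, List, Tuple


def collect_stats(entity_sets: Dict[str, List[float]], threshold: float) -> Dict[str, int]:
    """Statistics acquisition (hypothetical): for each entity set, count the
    entities whose match score reaches the threshold, i.e. those that survive
    early filtering."""
    return {
        name: sum(1 for score in scores if score >= threshold)
        for name, scores in entity_sets.items()
    }


def estimate_cost(order: Tuple[str, ...], cardinality: Dict[str, int],
                  join_selectivity: float) -> float:
    """Cost estimation (hypothetical model): sum of the estimated sizes of all
    intermediate results produced by joining the sets in the given order."""
    size = float(cardinality[order[0]])
    cost = 0.0
    for name in order[1:]:
        size = size * cardinality[name] * join_selectivity
        cost += size
    return cost


def select_plan(cardinality: Dict[str, int], join_selectivity: float) -> Tuple[str, ...]:
    """Plan selection: enumerate every join order and keep the cheapest one.
    Exhaustive search is only practical for a handful of entity sets."""
    return min(
        permutations(cardinality),
        key=lambda order: estimate_cost(order, cardinality, join_selectivity),
    )


# Made-up match scores for three made-up entity sets.
entity_sets = {
    "authors": [0.9, 0.8, 0.2, 0.95],
    "papers": [0.7, 0.1],
    "venues": [0.99, 0.6, 0.5, 0.4, 0.3, 0.85],
}
stats = collect_stats(entity_sets, threshold=0.5)
plan = select_plan(stats, join_selectivity=0.5)
plan_cost = estimate_cost(plan, stats, join_selectivity=0.5)
```

In this sketch, orders that join the smallest filtered sets first produce smaller intermediate results and so receive a lower estimated cost, which is the general intuition behind discarding useless intermediate results early.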
Keywords/Search Tags:dirty data, cost estimate, query optimization, data filter, threshold