
Query Interface Pattern Matching Considerations In Heterogeneous Web Database Integration

Posted on: 2012-10-16  Degree: Master  Type: Thesis
Country: China  Candidate: Q Deng  Full Text: PDF
GTID: 2248330371473629  Subject: Computer application technology
Abstract/Summary:
With the evolution of computer technology, and in particular the rapid growth of the Internet, more and more information is being shared. A great many Web databases, which contain large amounts of valuable information in various fields, have become a main information resource for people all over the world. However, the complexity of these data resources has led to extreme diversity among Web databases: the forms used to describe the same thing are considerably heterogeneous. How to integrate these heterogeneous databases into one relational database that offers users a uniform interface and makes the differing description forms transparent to them is therefore a significant research issue.

Traditional methods for integrating heterogeneous Web databases are based on the Mediator-Wrapper framework, with XQuery as the common query language, and all services assume that the global data are expressed in XML or as metadata. This thesis focuses on two features of heterogeneous Web databases: the amount of information is huge, and it is updated quickly. Building on classical decision tree algorithms, it studies how to solve query interface pattern matching in the integration of massive, noisy heterogeneous Web databases. The main achievements are as follows:

1. An overview and analysis of the basic theory of the existing Mediator-Wrapper integration framework and of the main methods for query interface pattern matching are presented.
2. The theory of traditional decision tree models and some optimization strategies are summarized systematically, and an in-depth analysis of these classical decision tree algorithms is given.
3. Because Web data always contain noise, a mixed decision tree learning model based on suspicious-instance impact analysis, named MDSII, is presented on the basis of existing decision tree algorithms. It chooses the splitting attribute with the information gain ratio function, analyzes the impact of suspicious instances on the global data, and determines the matching patterns. In doing so it addresses the Mediator-Wrapper framework's excessive dependence on XML/metadata representation and clearly improves resistance to noise.
4. Traditional decision trees have several drawbacks when solving classification problems in large-scale data integration, including a slow generation process, dependence on domain knowledge, and overfitting. To address these problems, an independent classification algorithm for data integration based on PDN trends, named PDNtrends, is proposed. It decides when to pre-prune during decision tree construction by observing the results computed from the data. This makes the classification independent of domain knowledge, reduces the size of the tree, and makes the rules more comprehensible; at the same time, classification accuracy is preserved and tree construction efficiency is improved.
5. Based on the research above, a prototype system is implemented, verifying the correctness and effectiveness of the heterogeneous Web database integration method presented in this thesis.
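Item 3 above selects the splitting attribute with an information gain rate (gain ratio) function. The abstract does not give the formula or the features MDSII uses, so the following is only a minimal sketch of the standard C4.5-style gain ratio, not the thesis's implementation. The feature names (`label_token`, `widget`), the target column (`match`), and the sample rows are hypothetical, chosen to resemble fields of a Web query interface being matched to global attributes.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Standard gain ratio: information gain divided by split information."""
    labels = [row[target] for row in rows]

    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])

    total = len(rows)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder

    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, attributes, target):
    """Pick the attribute with the highest gain ratio as the split attribute."""
    return max(attributes, key=lambda a: gain_ratio(rows, a, target))


# Hypothetical training instances: one row per query interface field.
rows = [
    {"label_token": "title", "widget": "text", "match": "Title"},
    {"label_token": "title", "widget": "text", "match": "Title"},
    {"label_token": "author", "widget": "text", "match": "Author"},
    {"label_token": "author", "widget": "select", "match": "Author"},
]

chosen = best_attribute(rows, ["label_token", "widget"], "match")
print(chosen)
```

Dividing the gain by the split information penalizes attributes that fragment the data into many small groups, which is the usual reason for preferring gain ratio over plain information gain. How MDSII then weighs suspicious (possibly noisy) instances is not described in enough detail here to sketch.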
Keywords/Search Tags: heterogeneous Web databases, machine learning, decision tree, XML, pattern matching