Research Of Deep Web Data Source Classification Based On Frequent Pattern And Semantic Processing

Posted on: 2011-02-09  Degree: Master  Type: Thesis
Country: China  Candidate: H Hua  Full Text: PDF
GTID: 2178360305476534  Subject: Computer application technology
Abstract/Summary:
As it keeps growing, the Web has become a huge library of information. Much of that information, however, is "hidden" in online databases, and users can reach it only by submitting queries through a query interface. This part of the Web is known as the Deep Web. The Deep Web is heterogeneous, large-scale and dynamic, which makes finding a suitable data source a great challenge. There is therefore an urgent need for a Deep Web information integration system, and Deep Web data source classification is the key step in such a system.

This thesis studies the classification of Deep Web data sources. The research covers the following:

(1) It introduces the background of the Deep Web and the state of research at home and abroad, and presents the framework, main content and significance of the thesis.

(2) It analyses information extraction from query interfaces based on visual characteristics, and proposes an algorithm for extracting form content and text.

(3) For the case where query interface resources are rich, it introduces ideas from data mining. The Apriori algorithm is used to find frequent patterns, and the Bayesian classification model is improved to exploit the associations between features, so that frequent patterns contribute more to domain discrimination.

(4) For the case where query interface resources are sparse, it extends the features. A feature vector that includes synonym sets is built with the external knowledge dictionary WordNet, which increases the domain discrimination of the features. An improved KNN classification algorithm is then used to build a data source classification model.

A data set is built from Deep Web data source query interfaces in six domains, selected from UIUC. The two proposed models are verified with 10-fold cross validation, which shows their improved classification accuracy and their practical value.
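The frequent-pattern step mentioned in item (3) can be illustrated with a minimal Apriori sketch in Python. This is not the thesis's implementation: the attribute labels, the `apriori` function name and the use of an absolute count for `min_support` are all assumptions made for illustration. Each query interface is treated as a set of attribute labels, and the function returns every label set that occurs in at least `min_support` interfaces.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count the support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent


# Hypothetical attribute labels extracted from five query interfaces.
interfaces = [
    {"title", "author", "isbn"},
    {"title", "author", "publisher"},
    {"title", "author", "isbn", "price"},
    {"make", "model", "price"},
    {"make", "model", "year"},
]

patterns = apriori(interfaces, min_support=2)
for itemset, support in sorted(patterns.items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

In this toy data, label sets such as {title, author} recur across book-like interfaces while {make, model} recurs across car-like ones, which is the kind of co-occurrence a classifier could use as a domain feature. How the thesis feeds such patterns into its improved Bayesian model is not specified in the abstract.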
Keywords/Search Tags: Deep Web, Sources Classification, Data Mining, Frequent Pattern, Semantic Processing