Research Of Deep Web Data Source Classification Based On Frequent Pattern And Semantic Processing

Posted on:2011-02-09

Degree:Master

Type:Thesis

Country:China

Candidate:H Hua

Full Text:PDF

GTID:2178360305476534

Subject:Computer application technology

Abstract/Summary:


As it keeps growing, the Web has become a huge library of information. Much of that information, however, is "hidden" in online databases, and users must submit queries through a query interface to reach it; this part of the Web is known as the Deep Web. The Deep Web is heterogeneous, large-scale and dynamic, which makes finding suitable data a great challenge, so a Deep Web information integration system is urgently needed. Classifying Deep Web data sources is the key step in such a system. This thesis studies the classification of Deep Web data sources and covers the following:

(1) It introduces the background of the Deep Web and the state of research at home and abroad, and sets out the framework, main content and significance of the thesis.

(2) It analyzes techniques for extracting information from query interfaces based on visual features, and proposes algorithms for extracting form content and text.

(3) For the case where query interface resources are rich, it applies data mining: the Apriori algorithm is used to find frequent patterns, and the Bayesian classification model is improved to exploit the links between features, increasing the contribution of frequent patterns to domain discrimination.

(4) For the case where query interface resources are sparse, it extends the features: a feature vector containing synonym sets is built with the external lexical dictionary WordNet to make the features more discriminative between domains, and an improved KNN classification algorithm is used to build a data source classification model.

Query interfaces of Deep Web data sources from six domains are selected from UIUC to build a data set. 10-fold cross-validation is then used to evaluate the two proposed models, demonstrating their improved classification accuracy and value.
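To make the frequent-pattern step in item (3) concrete, the following is a minimal sketch of textbook Apriori mining over query-interface attribute labels. It is not the thesis's implementation: the function name `apriori`, the sample `interfaces` data and the `min_support` threshold are all hypothetical, chosen only for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    transactions: iterable of sets of attribute labels (one per interface).
    min_support: minimum number of transactions that must contain an itemset.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single label.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count support for the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent


# Hypothetical attribute labels extracted from four query interfaces.
interfaces = [
    {"title", "author", "isbn"},
    {"title", "author", "price"},
    {"title", "author", "isbn", "publisher"},
    {"make", "model", "price"},
]

frequent = apriori(interfaces, min_support=2)
for itemset, support in sorted(frequent.items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

In this toy data the labels "title" and "author" co-occur in three interfaces, which is the kind of pattern that could then serve as a combined feature for a classifier. How the thesis weights such patterns in its improved Bayesian model is not stated in the abstract.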

Keywords/Search Tags:

Deep Web, Sources Classification, Data Mining, Frequent Pattern, Semantic Processing


Related items

1	Research On Semantic Frequent Pattern Mining Algorithm Based On Trajectory Data
2	The Research And Realization Of Mining Frequent Patterns On Business Data Streams
3	A Study On Algorithms Of Weighted Frequent Pattern Mining
4	Text Classification Method Based On The Longest Closed Frequent Sequential Patterns
5	Research Of Frequent Pattern Mining Technology And Its Application In Real-time Signal Processing
6	The Research On The Related Problems Of Association Rule Mining
7	Study On Frequent Pattern Mining Algorithms And Pruning Strategies
8	Research Of Mining Frequent Patterns And Classification On Data Streams
9	Research On Closed Pattern Based Data Mining Technologies
10	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application In Simulation System