Research On Deep Web Data Sources Classification Based On Semantic

Posted on:2013-12-23

Degree:Master

Type:Thesis

Country:China

Candidate:W P Liu

Full Text:PDF

GTID:2248330395455457

Subject:Computer software and theory

Abstract/Summary:


With the rapid development of Internet technology, a large number of Web databases have emerged. Together they form a huge information resource and provide people with a vast amount of information. According to the "depth" at which information is stored on the Web, the entire Web can be divided into two categories: the Surface Web and the Deep Web. Compared with the Surface Web, the Deep Web holds information of higher quality and in greater quantity, and that information has greater application value. However, Deep Web data is dynamic, hidden, distributed and heterogeneous, which makes integrating Deep Web data interfaces a great challenge. How to classify Deep Web data sources quickly and efficiently is therefore a key problem, with important practical significance and broad application prospects.

This thesis focuses on a series of key technologies for data source classification. It proposes a novel classification model based on a Semantic Tree, a density-based adaptive KNN algorithm and a weighted Naive Bayesian algorithm, all of which can effectively improve classification accuracy. The main contributions of this work are as follows:
1. Feature extraction from query interface pages is the basis of Deep Web data source classification. A new, effective feature extraction method for query interface pages is proposed based on the page-form model, and an information-gain-based feature selection method is then used to select features.
2. Because Deep Web data sources are heterogeneous, the same feature may be expressed by synonymous or polysemous words on different Deep Web interfaces, so it lacks a unique semantic interpretation. To address this limitation, a novel classification model based on a Semantic Tree is proposed.
3. To address the limitations of the canonical KNN and Naive Bayesian algorithms, a density-based adaptive KNN algorithm and a weighted Naive Bayesian algorithm are proposed, respectively.
4. Finally, experiments are performed on the real-world UIUC Web repository dataset. Comparative analysis of the experimental results shows that the Semantic Tree model and the improved classification algorithms proposed in this thesis are effective.
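Information-gain feature selection (contribution 1) and a weighted Naive Bayesian classifier (contribution 3) can be illustrated together with a small sketch. The abstract does not give the thesis's exact formulas, so this sketch rests on several assumptions. It assumes that each query interface has been reduced to a set of field-label terms. It uses the textbook definition of information gain over term presence, and it uses the IG scores themselves as per-feature weights in a Bernoulli naive Bayes, i.e. P(c) * prod_i P(x_i | c) ** w_i. The helpers `select_features` and `WeightedNaiveBayes` and the toy data are hypothetical, and the thesis's weighting scheme may differ.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG(C; t) = H(C) - P(t) H(C | t) - P(~t) H(C | ~t), where t is term presence."""
    n = len(docs)
    with_t = Counter(y for d, y in zip(docs, labels) if term in d)
    without_t = Counter(y for d, y in zip(docs, labels) if term not in d)
    n_t = sum(with_t.values())
    h_cond = 0.0
    if n_t:
        h_cond += n_t / n * entropy(list(with_t.values()))
    if n - n_t:
        h_cond += (n - n_t) / n * entropy(list(without_t.values()))
    return entropy(list(Counter(labels).values())) - h_cond


def select_features(docs, labels, k):
    """Keep the k terms with the highest IG (ties broken alphabetically)."""
    vocab = set().union(*docs)
    scores = {t: information_gain(docs, labels, t) for t in vocab}
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return {t: scores[t] for t in ranked[:k]}


class WeightedNaiveBayes:
    """Hypothetical Bernoulli naive Bayes whose per-term log-likelihoods are
    multiplied by a weight: log P(c) + sum_i w_i * log P(x_i | c).
    Using IG scores as the weights is an assumption of this sketch."""

    def __init__(self, weights, alpha=1.0):
        self.weights = weights  # term -> weight
        self.alpha = alpha      # Laplace smoothing constant

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        counts = Counter(labels)
        self.log_prior = {c: math.log(counts[c] / len(labels)) for c in self.classes}
        self.p_present = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            for t in self.weights:
                hits = sum(1 for d in class_docs if t in d)
                # Smoothed estimate of P(term present | class).
                self.p_present[c, t] = (hits + self.alpha) / (len(class_docs) + 2 * self.alpha)
        return self

    def log_score(self, doc, c):
        s = self.log_prior[c]
        for t, w in self.weights.items():
            p = self.p_present[c, t]
            s += w * math.log(p if t in doc else 1.0 - p)
        return s

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_score(doc, c))


# Hypothetical toy data: each query interface reduced to the set of its field labels.
docs = [
    {"title", "author", "isbn", "publisher"},
    {"title", "author", "keyword"},
    {"isbn", "publisher", "title"},
    {"make", "model", "price", "year"},
    {"make", "model", "zip"},
    {"price", "year", "mileage", "make"},
]
labels = ["books", "books", "books", "autos", "autos", "autos"]

weights = select_features(docs, labels, k=2)
model = WeightedNaiveBayes(weights).fit(docs, labels)
print(weights)
print(model.predict({"make", "price"}), model.predict({"title", "author"}))
```

In this toy set, "make" and "title" each separate the two domains perfectly (IG = 1 bit), so they survive selection. A term such as "model", which appears in only some auto interfaces, scores lower. Weighting the likelihoods by IG lets strongly discriminative terms dominate the decision, which is one common motivation for weighted naive Bayes.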

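The abstract does not say how the density-based adaptive KNN algorithm measures density or chooses its neighbourhood. The sketch below shows one plausible reading, offered purely as a hypothetical illustration and not as the thesis's algorithm. The local density around a query is estimated as the number of training points within a fixed `radius`. That count, clamped to `[k_min, k_max]`, is used as k, and the neighbours vote with inverse-distance weights. The function name, parameters and data are all assumptions.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def adaptive_knn_predict(train_x, train_y, query, radius, k_min=1, k_max=7):
    """Hypothetical density-adaptive KNN.

    Local density around `query` is estimated as the number of training points
    within `radius`; that count, clamped to [k_min, k_max], is used as k.
    Dense regions therefore vote with more neighbours, sparse regions with fewer.
    Votes are weighted by inverse distance. Returns (label, k_used).
    """
    ranked = sorted(zip((euclidean(x, query) for x in train_x), train_y))
    density = sum(1 for d, _ in ranked if d <= radius)
    k = max(k_min, min(k_max, density))
    votes = Counter()
    for d, label in ranked[:k]:
        votes[label] += 1.0 / (d + 1e-9)  # small epsilon avoids division by zero
    return votes.most_common(1)[0][0], k


# Hypothetical 2-D feature vectors for seven query interfaces.
train_x = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (6, 5), (9, 9)]
train_y = ["books"] * 4 + ["autos"] * 3
print(adaptive_knn_predict(train_x, train_y, (0.5, 0.5), radius=1.0))
print(adaptive_knn_predict(train_x, train_y, (8, 8), radius=1.0))
```

A fixed global k treats the dense "books" cluster and the sparse "autos" region the same way. Tying k to local density is one way to address that limitation of canonical KNN.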
Keywords/Search Tags:

Deep Web data source classification, Semantic tree, Improved K-NN algorithm, Weighted naïve Bayesian algorithm


Related items

1	Research On Weighted Naive Bayesian Classification Algorithm Based On Rough Set Theory
2	The Research Of Algorithm Of Bayesian Networks Used In Data Mining
3	Research On Deep Web’s Data Source Automatically Identify And Classification
4	Research Of The Classification Algorithm Based On Nonparametric Bayesian
5	Improving Based On Naive Bayesian Classifier Algorithm
6	Research On Algorithm For Relational Data Classification Based On Background Knowledge
7	Research Of Deep Web Data Source Classification Based On Frequent Pattern And Semantic Processing
8	Automatic Classification And Identification Of Deep Web Data Source Using Multi-classifier
9	A Novel Multi-grouped Graph Bayesian Classification Model
10	Research On SAR Image Classification Algorithm Based On Bayesian Sparse Representation Theory