
Research On Deep Web Sources Classification Based On Ontology

Posted on: 2012-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z Zhang
GTID: 2178330335977723
Subject: Computer application technology
Abstract/Summary:
The Internet can be divided by depth into the Surface Web and the Deep Web. With the rapid development of the Internet, a large amount of information is generated and accumulated in daily work and life. To make use of these resources, especially Deep Web resources, researchers have taken up the problem of Deep Web data integration. Classification of Deep Web data sources is an important part of that integration and needs further study.

This work studies techniques for classifying Deep Web data sources. Ontology is applied to the classification of Web data sources, and algorithms and models are proposed. The main work includes:

(1) Information extraction for the Deep Web query interface model. Based on the page-form model, features need to be extracted from the content text and hyperlinks, and feature extraction on the form is standardized at the same time.

(2) Ontology construction. Based on the model proposed in this work, ontologies for several domains are built using HowNet and WordNet, and a new weighting method is proposed.

(3) Ontology-based classification of Deep Web data sources. Ontology is introduced into classification based on query interface features. HIFI is improved and a new weighting is proposed, yielding an ontology-based classification algorithm for Deep Web data sources.

Experiments were run in Weka with Bayesian, KNN, SVM and C4.5 classifiers. With the ontology in place, the improved classification performs better than classification based on query interface features alone, and the new weighting and improved HIFI lead to better classification results.
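The general idea of classifying a query interface from its form features can be illustrated with a minimal Python sketch. It is not the method from this work: the `DOMAIN_TERMS` table and its weights are hypothetical stand-ins for domain ontologies, the names `FormFeatureParser`, `tokenize` and `classify` are invented for illustration, and the scoring is a plain weighted term count rather than the improved HIFI or the new weighting described above.

```python
import re
from html.parser import HTMLParser

# Hypothetical term weights per domain; a real system would derive these
# from domain ontologies (e.g. built with HowNet or WordNet).
DOMAIN_TERMS = {
    "books": {"title": 1.0, "author": 1.0, "isbn": 2.0, "publisher": 1.0},
    "jobs": {"job": 1.5, "salary": 1.5, "company": 1.0, "title": 0.5},
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class FormFeatureParser(HTMLParser):
    """Collects tokens from text and field names found inside <form> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # greater than 0 while inside a <form>
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.depth += 1
        if self.depth and tag in ("input", "select", "textarea"):
            for key, value in attrs:
                if key in ("name", "placeholder") and value:
                    self.tokens += tokenize(value)

    def handle_endtag(self, tag):
        if tag == "form" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Visible text (labels, option text) counts only inside a form.
        if self.depth:
            self.tokens += tokenize(data)


def classify(html):
    """Return (best_domain, scores) using a weighted count of domain terms."""
    parser = FormFeatureParser()
    parser.feed(html)
    scores = {
        domain: sum(weights.get(token, 0.0) for token in parser.tokens)
        for domain, weights in DOMAIN_TERMS.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    page = (
        "<p>Job title search</p>"
        '<form action="/search">'
        '<label>Title</label><input name="title">'
        '<label>Author</label><input name="author">'
        '<label>ISBN</label><input name="isbn">'
        "</form>"
    )
    print(classify(page))
```

In this sketch, text outside the form is ignored, so the paragraph mentioning "Job" does not affect the score. A fuller version would also use the surrounding content text and hyperlinks, as the page-form model suggests.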
Keywords/Search Tags:Deep Web, Ontology, Classification, Identify areas