Focusing Technology Of Deep Web Data Source Based On Domain Ontology

Posted on: 2012-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Zhang
GTID: 2218330338973127
Subject: Computer software and theory
Abstract/Summary:
The world is now in an era of explosive growth in information and knowledge, and the Internet, as a major carrier of information, is expanding rapidly in information capacity. According to how its information is distributed and where it is located, the Web can be divided into two parts: the Surface Web and the Deep Web. Compared with the Surface Web, the Deep Web contains more information, of higher quality, with a stronger topical focus and more regular structure. Research on integrating Deep Web information is attracting growing attention, and the technology for focusing on Deep Web data sources is its most important prerequisite. Building on previous work in Deep Web information integration and Deep Web data source discovery, this thesis examines the features used to discriminate Deep Web query interface pages and query interface forms. Existing approaches extract these features with low accuracy and neglect lexical semantic information, so query interface forms relevant to the subject domain are missed. To address these problems, a Deep Web data source focusing method based on domain ontology is proposed. Ontology is one of the key Semantic Web technologies: its well-organized concept structure, its support for logical reasoning, and its ability to express semantics through relations make it possible to understand the topical content of Web pages and query interfaces at the semantic level, which improves the accuracy of Deep Web classification.
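To illustrate how a concept hierarchy lets semantic relatedness be measured, here is a minimal sketch over a hypothetical fragment of a tourism concept tree. It uses the Wu-Palmer measure, a standard path-based similarity chosen here for illustration only; the concept names and the choice of measure are assumptions, not the thesis's own formula.

```python
# Hypothetical fragment of a tourism concept tree, stored as child -> parent.
PARENT = {
    "Tourism": None,
    "Transport": "Tourism",
    "Accommodation": "Tourism",
    "Flight": "Transport",
    "Train": "Transport",
    "Hotel": "Accommodation",
    "Hostel": "Accommodation",
}


def path_to_root(concept):
    """Return [concept, parent, ..., root]."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth of a concept, counting the root as depth 1."""
    return len(path_to_root(concept))


def lowest_common_ancestor(a, b):
    """First ancestor of b (walking upward) that is also an ancestor of a."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("concepts are not in the same tree")


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lca) / (depth(a) + depth(b))."""
    lca = lowest_common_ancestor(a, b)
    return 2 * depth(lca) / (depth(a) + depth(b))


print(round(wu_palmer("Flight", "Train"), 3))  # 0.667
print(round(wu_palmer("Flight", "Hotel"), 3))  # 0.333
```

Siblings under the same parent score higher than concepts that only meet at the root, which is the kind of structural signal a flat keyword match cannot provide.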
This thesis designs a Deep Web focused crawler framework based on domain ontology and a framework, also based on domain ontology, for recognizing and classifying Deep Web query interfaces. The two frameworks work together to extract the Deep Web query interface forms relevant to a domain topic. The main research work and contributions are as follows:
(1) The thesis introduces ontology fundamentals and analyzes the structure of domain ontologies. Following established methods for constructing domain ontologies, and taking into account the characteristics of Deep Web query interface forms in the tourism domain, it constructs a tourism domain ontology encoded in OWL 2.
(2) Drawing on results reported by researchers in China and abroad, the thesis shows the important role domain ontology plays in describing the characteristic information of domain knowledge. Combined with an analysis of the search mechanism of Web focused crawlers, this establishes that adding a domain ontology management module to a Deep Web crawler is feasible and that it helps guide the crawler toward topical pages. On this basis, a Deep Web focused crawler framework based on domain ontology is proposed.
(3) Based on ontology concepts and the hierarchy of semantic relations between them, a domain ontology management module is constructed. It provides a topic similarity calculation method whose total score draws on the topic concepts of the ontology concept tree and on the structure of semantic relations between context concepts. The Deep Web focused crawler uses this score as the basis for building feature vectors from page feature information, so that form pages relevant to the subject domain can be identified accurately.
(4) The topic-relevance calculation of the PageRank algorithm is improved, and a URL feature selection module that combines page topic relevance with URL topic relevance is provided.
(5) A recognition method for Deep Web query interfaces is presented. It extracts the structural features of the <form> elements in Web pages and uses them to build rule trees with the D-T algorithm, enabling effective recognition of Deep Web query interface forms.
(6) Based on an analysis of the similarity between query interface attributes and the ontology, and combining the domain ontology management module with the Deep Web query interface classification module, the thesis proposes a similarity calculation method whose total score combines the topic concepts and the topic scene attributes in the subtree of each topic concept. The query interface classification module uses this score as the basis for building feature vectors from query interface feature information, so that query interface forms can be assigned at the finer granularity of sub-topics within the domain.
(7) Using the domain ontology management module, topic classification is performed twice: once on pages for the focused crawler and once on the text features of query interfaces. The first stage covers the breadth of the domain's concepts: it collects pages relevant to the subject domain, filters out irrelevant pages, and tries not to lose any relevant page or any page pointed to by a relevant URL. The second stage covers depth: it collects interface attribute features with respect to the domain's topic concepts, so that query interfaces belonging to different sub-topics within the same domain are classified as clearly as possible.
Finally, the thesis evaluates and analyzes the performance of each module experimentally. The results show that the method effectively improves both the discrimination of Deep Web data sources and the accuracy of topic classification.
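Item (3) above scores a page's topical relevance from ontology concepts and builds a feature vector from page features. The sketch below shows one common way to do this, assuming a hypothetical weighted topic vector over tourism concepts and a hypothetical synonym table standing in for the ontology's lexical links; relevance is the cosine similarity between the page's concept vector and the topic vector. The weights, synonyms, and function names are illustrative, not the thesis's actual scoring formula.

```python
import math
import re
from collections import Counter

# Hypothetical topic vector: ontology concept -> weight. Here the weight simply
# decreases with distance from the root topic concept "tourism".
TOPIC_WEIGHTS = {
    "tourism": 1.0,
    "hotel": 0.8, "flight": 0.8, "train": 0.8,
    "room": 0.6, "departure": 0.6, "destination": 0.6,
}

# Hypothetical synonym table mapping surface words to ontology concepts.
SYNONYMS = {"airline": "flight", "lodging": "hotel", "trip": "tourism"}


def page_vector(text):
    """Map page words onto ontology concepts and count their occurrences."""
    words = re.findall(r"[a-z]+", text.lower())
    concepts = (SYNONYMS.get(w, w) for w in words)
    return Counter(c for c in concepts if c in TOPIC_WEIGHTS)


def topic_relevance(text):
    """Cosine similarity between the page's concept vector and the topic vector."""
    vec = page_vector(text)
    norm_page = math.sqrt(sum(v * v for v in vec.values()))
    if norm_page == 0:
        return 0.0  # no topic concepts found on the page
    dot = sum(count * TOPIC_WEIGHTS[c] for c, count in vec.items())
    norm_topic = math.sqrt(sum(w * w for w in TOPIC_WEIGHTS.values()))
    return dot / (norm_page * norm_topic)


print(round(topic_relevance("Cheap airline tickets and lodging"), 3))  # 0.566
print(topic_relevance("Stock market news"))  # 0.0
```

A crawler would typically compare this score with a threshold to decide whether a page is on-topic; the threshold itself would have to be tuned experimentally.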
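Item (4) above combines a PageRank-style score with page and URL topic relevance to rank URLs waiting in the crawl frontier. The thesis's exact modification is not given here, so this sketch swaps in a topic-biased PageRank (the random jump is weighted by each page's topic relevance, in the spirit of topic-sensitive PageRank) and blends the rank inherited from linking pages with a keyword-based URL relevance. The term set `TOPIC_TERMS`, the mixing weight `alpha`, and all function names are hypothetical.

```python
import re

# Hypothetical topic keywords used to judge a URL string.
TOPIC_TERMS = {"travel", "hotel", "flight", "tour", "booking"}


def topic_pagerank(links, relevance, damping=0.85, iterations=50):
    """Topic-biased PageRank over crawled pages.

    links: crawled page -> list of linked pages
    relevance: page -> topic relevance in [0, 1]
    """
    pages = list(links)
    total = sum(relevance.get(p, 0.0) for p in pages)
    if total > 0:
        jump = {p: relevance.get(p, 0.0) / total for p in pages}
    else:
        jump = {p: 1.0 / len(pages) for p in pages}  # no relevance info: uniform
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) * jump[p] for p in pages}
        for p in pages:
            targets = [t for t in links[p] if t in new_rank]
            if not targets:
                continue  # dangling page: its mass is dropped in this sketch
            share = damping * rank[p] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


def url_relevance(url):
    """Fraction of URL tokens that are topic keywords."""
    tokens = re.findall(r"[a-z]+", url.lower())
    if not tokens:
        return 0.0
    return sum(t in TOPIC_TERMS for t in tokens) / len(tokens)


def url_priority(url, parents, rank, links, alpha=0.5):
    """Blend rank inherited from linking pages with the URL's own relevance."""
    inherited = sum(rank[p] / len(links[p]) for p in parents)
    return alpha * inherited + (1 - alpha) * url_relevance(url)


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
relevance = {"a": 1.0, "b": 0.0, "c": 0.5}
rank = topic_pagerank(links, relevance)
print(url_priority("https://example.com/hotel/booking", ["a"], rank, links))
print(url_priority("https://example.com/news", ["b"], rank, links))
```

Unvisited URLs would then be popped from a priority queue in descending order of `url_priority`, so links found on relevant pages and with topical URL strings are fetched first.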
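Item (5) above recognizes query interfaces from structural features of `<form>` elements and a rule tree built with the D-T algorithm. As a minimal sketch, assuming D-T denotes a decision tree, the code below extracts a few structural counts per form with Python's standard `html.parser` and applies a small hand-written rule tree; the chosen features, thresholds, and names such as `FormFeatureParser` are hypothetical stand-ins for a tree learned from labelled forms.

```python
from html.parser import HTMLParser


class FormFeatureParser(HTMLParser):
    """Collect simple structural feature counts for every <form> in a page."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self.current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.current = {"text": 0, "select": 0, "password": 0,
                            "textarea": 0, "submit": 0}
            self.forms.append(self.current)
        elif self.current is None:
            return  # control outside any form: ignore
        elif tag == "input":
            kind = (attrs.get("type") or "text").lower()  # HTML default is text
            if kind in ("text", "search", "date", "number"):
                self.current["text"] += 1
            elif kind == "password":
                self.current["password"] += 1
            elif kind in ("submit", "image"):
                self.current["submit"] += 1
        elif tag == "select":
            self.current["select"] += 1
        elif tag == "textarea":
            self.current["textarea"] += 1
        elif tag == "button":
            if (attrs.get("type") or "submit").lower() == "submit":
                self.current["submit"] += 1

    def handle_endtag(self, tag):
        if tag == "form":
            self.current = None


def is_query_interface(f):
    """Hand-written rule tree with hypothetical thresholds."""
    if f["password"] > 0:
        return False  # login or registration form
    if f["textarea"] > 0:
        return False  # comment or feedback form
    if f["submit"] == 0:
        return False  # nothing to submit a query with
    return f["text"] + f["select"] >= 2  # multi-attribute query form


page = """
<form action="/login"><input name="user"><input type="password" name="pw">
<input type="submit"></form>
<form action="/search"><input type="text" name="from"><input type="text" name="to">
<select name="class"><option>Economy</option></select><button>Search</button></form>
"""
parser = FormFeatureParser()
parser.feed(page)
print([is_query_interface(f) for f in parser.forms])  # [False, True]
```

In practice a learned tree (for example scikit-learn's `DecisionTreeClassifier` fitted on the same feature counts) would replace the hand-written rules; writing them by hand keeps the sketch dependency-free.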
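Items (6) and (7) above assign an already recognized query interface to a sub-topic by comparing its attributes with the subtree of each topic concept. The sketch below assumes a hypothetical tourism ontology fragment in which each sub-topic concept lists the scene attributes expected on its forms; the score mixes a concept-name match with attribute overlap, and both `concept_weight` and the cutoff `min_score` are illustrative values, not ones taken from the thesis.

```python
# Hypothetical sub-topic subtrees of a tourism ontology: each sub-topic concept
# lists the scene attributes expected on its query forms.
SUBTOPICS = {
    "Flight": {"from", "to", "departure", "return", "passengers", "cabin"},
    "Hotel": {"city", "check-in", "check-out", "rooms", "guests"},
    "Train": {"from", "to", "departure", "seat", "train number"},
}


def normalize(label):
    """Lower-case a form label and drop a trailing colon."""
    return label.strip().lower().rstrip(":").strip()


def subtopic_scores(labels, concept_weight=0.3):
    """Score each sub-topic: concept-name match plus attribute overlap."""
    names = {normalize(label) for label in labels}
    scores = {}
    for concept, attrs in SUBTOPICS.items():
        concept_hit = 1.0 if any(concept.lower() in n for n in names) else 0.0
        overlap = len(names & attrs) / len(attrs)
        scores[concept] = concept_weight * concept_hit + (1 - concept_weight) * overlap
    return scores


def classify_interface(labels, min_score=0.2):
    """Return the best-scoring sub-topic, or None if no sub-topic fits well."""
    scores = subtopic_scores(labels)
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else None


print(classify_interface(["From:", "To:", "Departure", "Passengers", "Cabin"]))  # Flight
print(classify_interface(["Name", "Email"]))  # None
```

Because "from", "to", and "departure" are shared by the Flight and Train subtrees, the attributes unique to each subtree are what separate sub-topics within the same domain, which mirrors the depth-oriented second classification stage described in item (7).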
Keywords/Search Tags: Deep Web, domain ontology, theme description, theme similarity, data source discovery