Deep Web Sources Classification And Query Interface Schema Extraction Based On Ontology

Posted on: 2011-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: F Luo
Full Text: PDF
GTID: 2178330338476297
Subject: Computer software and theory
Abstract/Summary:
The Internet can be divided by depth into the Surface Web and the Deep Web. Unlike the Surface Web, which provides link-based navigation, the Deep Web can only be accessed by submitting queries through forms. In contrast to the Surface Web, whose data are mostly unstructured, most Deep Web data are structured, which is why the Deep Web has attracted particular attention from researchers. Deep Web source classification and Deep Web query interface extraction are the key technologies for acquiring Deep Web information. This paper uses Ontology technology to address several problems in Deep Web information acquisition and to overcome limitations of traditional methods.

First, we analyse the characteristics of Deep Web information. Drawing on the Hudong cyclopedia and the CWB Chinese lexicon, we use the Protege Ontology editor to build five domain Ontologies, covering the book, music, movie, digital products and real estate domains. These Ontologies support the research on Deep Web source classification and Deep Web query interface schema extraction.

Second, this paper describes an approach that classifies Chinese Deep Web sources by domain, based on the text of their query interfaces. The approach uses the Vector Space Model, and we construct structure features based on the Ontology to improve classification accuracy.

Finally, we study query interface schema extraction based on heuristic rules and propose a new method based on Ontology. This method enables the computer to understand the semantics of query interfaces: with the help of a domain Ontology, the query interface schema is converted into an Ontology model. We extract schema information from 200 query interfaces across the five domains. Experimental results show that the precision and recall of this method are higher than those of traditional methods based on heuristic rules.
Keywords/Search Tags:Deep Web, Ontology, Query Interface, Sources Classification, Vector Space Model, Structure Feature, Schema Extraction
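
As a rough illustration of the Vector Space Model classification step mentioned in the abstract, the sketch below represents a query interface's text and each domain as term-frequency vectors and picks the domain with the highest cosine similarity. It is not the thesis's implementation: the `DOMAIN_TERMS` vocabularies are hypothetical stand-ins for concepts taken from the domain Ontologies, the whitespace tokenizer assumes English labels (Chinese text would need word segmentation), and the Ontology-based structure features are not modelled.

```python
import math
from collections import Counter

# Hypothetical domain vocabularies; in the thesis these would come
# from the domain Ontologies, which are not reproduced here.
DOMAIN_TERMS = {
    "book": ["title", "author", "publisher", "isbn"],
    "music": ["song", "artist", "album", "genre"],
    "movie": ["title", "director", "actor", "genre"],
}


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(interface_text):
    """Return (best_domain, scores) for the text of one query interface."""
    # Assumption: labels are whitespace-separated English words.
    query = Counter(interface_text.lower().split())
    scores = {
        domain: cosine(query, Counter(terms))
        for domain, terms in DOMAIN_TERMS.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    domain, scores = classify("Author ISBN Publisher Search")
    print(domain, scores)
```

Calling `classify` on the visible labels of a form returns the best-matching domain together with the per-domain scores, so near ties (for example, "title" and "genre" appear in more than one vocabulary) can be inspected rather than silently resolved.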