
The Research And Implementation Of Deep Web Query Results Extraction

Posted on: 2016-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: H M Zhang
GTID: 2308330479989189
Subject: Computer technology
Abstract/Summary:
Because of its huge scale and high-quality information, the Deep Web has become a research hot spot. Deep Web data integration, one direction of Deep Web research, is important for improving query efficiency. It mainly comprises three modules: the query interface integration module, the query processing module, and the query results processing module. This thesis focuses on the methods and techniques used in the query results processing module.

The thesis first introduces the background and development of Deep Web data integration, then describes the information integration framework and existing Deep Web information extraction techniques, focusing on the study and implementation of methods for extracting and integrating query results. The main contents are:

(1) Query submission across different Deep Web sources. For each source, a mapping of attributes between the global interface and the local interface is constructed. When a query request is issued on the global interface, it is submitted to each local interface through this mapping.

(2) Query result extraction. The query results page is transformed into a DOM tree and the tree is cleaned. Subtree similarity is then computed, and the number of subtrees under each parent node that contain potentially useful data areas is compared, in order to locate the result data areas that users need.

(3) Integration of query results from multiple Deep Web sources. A domain synonym table is constructed; then, using matching rules derived from the visual features of the results pages, the semantics of each record's data items are annotated. After annotation, duplicate data records are removed, and the results are output in a unified pattern.

On the basis of these studies, experiments are designed to validate the methods, and the experimental results show that the methods are feasible and perform well. The query system is extensible: for different domains, it lets users add or delete Deep Web sources and then extracts and integrates their query results.
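
The extraction step in (2) can be illustrated with a small sketch. It is not the thesis's implementation: the similarity measure (Jaccard overlap of tag paths), the 0.8 threshold, the set of tags removed during cleaning, and all names such as `locate_data_region` are assumptions chosen for illustration, using only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

NOISE_TAGS = {"script", "style"}  # assumed cleaning rule: drop these subtrees
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # elements with no end tag


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a minimal element-only tree; text and attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def clean(node):
    """Remove noise subtrees in place."""
    node.children = [c for c in node.children if c.tag not in NOISE_TAGS]
    for child in node.children:
        clean(child)


def tag_paths(node, prefix=""):
    """Set of tag paths that describes the structure of a subtree."""
    path = prefix + "/" + node.tag
    paths = {path}
    for child in node.children:
        paths |= tag_paths(child, path)
    return paths


def similarity(a, b):
    """Jaccard similarity of two subtrees' tag-path sets (an assumed measure)."""
    pa, pb = tag_paths(a), tag_paths(b)
    return len(pa & pb) / len(pa | pb)


def find_data_region(root, threshold=0.8, min_records=2):
    """Return the parent with the most structurally similar non-leaf children."""
    best, best_count = None, 0

    def visit(node):
        nonlocal best, best_count
        kids = node.children
        similar = set()
        for i in range(len(kids) - 1):
            # Leaf siblings (e.g. bare links) are skipped: records are composite.
            if not kids[i].children or not kids[i + 1].children:
                continue
            if similarity(kids[i], kids[i + 1]) >= threshold:
                similar.update((i, i + 1))
        if len(similar) >= min_records and len(similar) > best_count:
            best, best_count = node, len(similar)
        for kid in kids:
            visit(kid)

    visit(root)
    return best


def locate_data_region(html):
    """Parse, clean, and locate the candidate result data area of a page."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    clean(builder.root)
    return find_data_region(builder.root)
```

Calling `locate_data_region(page_html)` returns the node whose children look most like repeated records, or `None` if no parent has at least two similar composite children. A real results page would need a stronger cleaning pass and additional cues to separate result lists from other repeated structures such as navigation menus.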
Keywords/Search Tags: Deep Web, Query Results, Extraction, Integration, DOM