Research On The Key Technologies About Preprocessing Of Deep Web Integrated Query System

Posted on:2013-06-16

Degree:Master

Type:Thesis

Country:China

Candidate:C L Zhang

Full Text:PDF

GTID:2248330371972581

Subject:Management Science and Engineering

Abstract/Summary:

With the development of information technology, people increasingly turn to the network to obtain resources. The resources that traditional search engines can retrieve are called the Surface Web, and they account for only a small fraction of all web resources. The resources hidden in Web databases, which can be obtained only by submitting a query form that generates dynamic pages, are known as the Deep Web. The Deep Web contains a large amount of specialized information, so how to access these resources efficiently has become a key issue in current research.

A Deep Web Integrated Query System is a global query system that integrates different query interfaces in the same domain. Users can obtain resources from different Web databases by submitting a query form through this global interface. Preprocessing is the first stage of system integration and consists of three main steps: Web interface discovery, query interface schema extraction, and query interface integration. Its result has a great impact on the subsequent stages of query processing and result processing. Finding efficient methods for every step of the preprocessing stage is therefore the starting point of this thesis. The main research work is as follows:

(1) The thesis analyzes the characteristics of Deep Web query forms, and studies and compares the advantages and disadvantages of current Web interface discovery techniques. It proposes a seed URL selection strategy for focused crawling based on multiple classifiers, improves form classification, and uses a decision-tree-based algorithm to identify query forms that are not Deep Web interfaces.

(2) The thesis studies the schema features of query interfaces and, based on the structural features of HTML pages, proposes a schema extraction method built on the DOM tree and a DWI object model. First, a web parser parses the interface page into a DOM tree. The DOM tree is then traversed to find each attribute element and its corresponding label. Finally, the DWI object model is used to express the schema information of the query interface.

(3) The thesis proposes a schema matching method based on a semantic model, according to the characteristics of the attribute elements of query interfaces. The method defines attribute similarity formulas for both simple matching and complex matching, which yields more effective matching results.

To test and verify the efficiency of these techniques for the preprocessing stage, the thesis designs specific experiments, whose results show that the methods are feasible.
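The extraction step in (2) can be illustrated with a small sketch. It is not the thesis's implementation: it swaps the DOM tree traversal for a single pass with Python's event-based `html.parser`, and the output dictionaries are only a hypothetical stand-in for the DWI object model, whose definition is not given in this abstract. The names `FormSchemaParser`, `extract_schema`, and `SAMPLE_FORM` are assumptions made for illustration. A field is paired with its `<label for=...>` text when one exists, and otherwise with the nearest preceding text.

```python
from html.parser import HTMLParser


class FormSchemaParser(HTMLParser):
    """Collects query fields and candidate label text in document order."""

    FIELD_TAGS = {"input", "select", "textarea"}
    # Controls that carry no user-visible query attribute.
    SKIP_TYPES = {"hidden", "submit", "button", "reset", "image"}

    def __init__(self):
        super().__init__()
        self.labels = {}        # value of <label for="..."> -> label text
        self.fields = []        # one dict per query field
        self._in_label = False
        self._label_for = None
        self._label_text = []
        self._last_text = ""    # nearest preceding text, used as a fallback

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "label":
            self._in_label = True
            self._label_for = a.get("for")
            self._label_text = []
        elif tag in self.FIELD_TAGS:
            ftype = (a.get("type") or "text").lower() if tag == "input" else tag
            if ftype in self.SKIP_TYPES:
                return
            self.fields.append({
                "id": a.get("id") or "",
                "name": a.get("name") or "",
                "type": ftype,
                "nearby": self._last_text,
            })

    def handle_endtag(self, tag):
        if tag == "label" and self._in_label:
            if self._label_for:
                self.labels[self._label_for] = " ".join(self._label_text)
            self._in_label = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_label:
            self._label_text.append(text)
        self._last_text = text


def extract_schema(html):
    """Return a list of {label, name, type} dicts, one per query field."""
    parser = FormSchemaParser()
    parser.feed(html)
    schema = []
    for field in parser.fields:
        # Prefer an explicit <label for=id>; fall back to nearby text.
        label = parser.labels.get(field["id"]) if field["id"] else None
        label = (label or field["nearby"]).rstrip(":").strip()
        schema.append({"label": label, "name": field["name"], "type": field["type"]})
    return schema


SAMPLE_FORM = """
<form action="/search">
  <label for="t">Title</label> <input type="text" id="t" name="title">
  Author: <input name="author">
  <label for="f">Format</label>
  <select id="f" name="format"><option>Hardcover</option></select>
  <input type="hidden" name="sid" value="1">
  <input type="submit" value="Search">
</form>
"""
```

Calling `extract_schema(SAMPLE_FORM)` yields one entry per visible query field and skips the hidden and submit controls. The nearest-preceding-text fallback is a deliberately crude heuristic; real query interfaces often place labels in table cells or after the field, which is the kind of layout a tree-based traversal is better suited to handle.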

Keywords/Search Tags:

Deep Web, Web Interface Discovery, Schema Extraction, Schema Matching

Related items

1	Research On Method Of Deep Web Schema Matching Based On Query Interface
2	Research On Key Technologies Of Deep Web Data Integration
3	Research And Application On Technology Of Deep Web Schema Acquisition
4	Research On Deep Web Query Interface Discovery And Pattern Extraction
5	Research On The Deep Web Interface Schema Matching Based On The Machine Learning
6	Research On Technology Of Deep Web Schema Matching
7	Research Into Query Interface Schema Extraction Of Deep Web
8	A Dissertation Submitted To Graduate School Of Southwest University
9	Research On Technology Of Schema Matching Between Global Schema And Local Schema
10	Research On Schema Extraction From Deep Web Query Interface