Research Of Deep Web Crawler Supporting Ajax

Posted on:2011-02-22

Degree:Master

Type:Thesis

Country:China

Candidate:R F Guo

Full Text:PDF

GTID:2178360305476560

Subject:Computer software and theory

Abstract/Summary:

The Deep Web contains a large amount of valuable information, and that amount is growing rapidly. With the development of Web 2.0, more and more Deep Web sites apply Ajax to improve the user experience. Ajax interacts with the server asynchronously and changes page content dynamically without a page refresh, and these features pose enormous challenges for search engines that crawl such pages. Because traditional search engines cannot handle Ajax, they have difficulty crawling Deep Web data, which reduces their information coverage. In addition, as Ajax is researched and applied more extensively, Ajax-based web information extraction is becoming increasingly important. How to obtain information from Deep Web sites that use Ajax is therefore the starting point of this thesis. The main work and results are as follows:

(1) The scale and structure of Deep Web resources, both domestic and international, were investigated. The study shows that Deep Web sites using Ajax contain a wealth of information resources, yet very little research addresses them.

(2) Based on the structure of a Deep Web crawler, the difficulties facing a Deep Web crawler that supports Ajax are analyzed, such as identifying Ajax query interfaces, submitting Ajax forms, and crawling Ajax pages. On this basis, a framework for a Deep Web crawler that supports Ajax is established.

(3) Ajax query interfaces are divided into three types according to their characteristics, and a recognition method and processing model are given for each type. With these, submission of Ajax forms is completed.

(4) A data-region recognition model for result pages is established, based on the similarity of DOM trees and subtrees. Automatic discovery of the page navigation mode is achieved on the basis of the data region. The Ajax page navigation mode is discussed further, and the query results of Deep Web sites are finally obtained.

Extensive experiments are carried out to verify the theories and methods proposed in this dissertation. The final part raises some problems that need further research and looks ahead to the direction and prospects of the field.
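
The abstract describes data-region recognition through the similarity of DOM trees and subtrees but gives no algorithm. The sketch below is one plausible reading of that idea, not the thesis's actual model: it looks for the node whose adjacent child subtrees have similar tag sequences. The names `Node`, `TreeBuilder`, `parse`, and `find_data_region`, the 0.8 threshold, and the minimum of three records are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Elements that never have a closing tag (a partial list, for illustration).
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a bare tag tree; text and attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def tag_sequence(node):
    """Pre-order list of tag names in the subtree rooted at node."""
    seq = [node.tag]
    for child in node.children:
        seq.extend(tag_sequence(child))
    return seq


def similarity(a, b):
    """Structural similarity of two subtrees, in the range 0.0 to 1.0."""
    return SequenceMatcher(None, tag_sequence(a), tag_sequence(b)).ratio()


def find_data_region(root, threshold=0.8, min_records=3):
    """Return (node, run_length) for the node with the longest run of
    adjacent, structurally similar children, or (None, 0) if none qualifies."""
    best, best_run = None, 0
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        kids = node.children
        if len(kids) < min_records:
            continue
        run = longest = 1
        for prev, cur in zip(kids, kids[1:]):
            run = run + 1 if similarity(prev, cur) >= threshold else 1
            longest = max(longest, run)
        if longest >= min_records and longest > best_run:
            best, best_run = node, longest
    return best, best_run
```

Comparing flattened tag sequences is a cheap stand-in for a true tree-edit distance. It works on static HTML only, so a crawler of the kind the abstract describes would first need the DOM as rendered after the Ajax calls complete.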

Keywords/Search Tags:

Deep Web, crawl, Ajax, Query interface

Related items

1	Research On The Key Technology Of Ajax Depth Information Acquisition And Clustering
2	Research On Subject-Based Incremental Parallel Crawling
3	Research On Key Technologies Of Deep Web Data Crawling
4	Research On Issues For Uncertainty Of Query In Deep Web
5	Researches On Deep Web Query Interface Determining Technology
6	Research On The Domain-oriented Deep Web Query Interface Discovery
7	Research On Key Technology Of Deep Web Information Integration
8	Research Of Query Interface Integration Mechanism In Dwiis System
9	Research Of Query Interface Integration Mechanism In DWIIS System
10	Study And Design Of A Deep Web Query Interface Modeling System Based On FCA