Research Of Deep Web Crawler Supporting Ajax

Posted on: 2011-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: R F Guo
Full Text: PDF
GTID: 2178360305476560
Subject: Computer software and theory
Abstract/Summary:
The Deep Web contains a large amount of valuable information, and that amount is growing rapidly. With the development of Web 2.0, more and more Deep Web sites apply Ajax to improve the user experience. Ajax, however, interacts with the server asynchronously and changes page content dynamically without a page refresh. These features pose enormous challenges for search engines when they crawl such pages. Because traditional search engines cannot handle Ajax, they have difficulty crawling Deep Web data, which reduces their coverage of the available information. In addition, as Ajax is researched and applied more widely, extracting information from Ajax-based web pages is becoming increasingly important. How to obtain information from Deep Web sites that use Ajax is therefore the starting point of this thesis. The main work and results are as follows:

(1) The scale and structure of Deep Web resources, both domestic and international, were investigated. The investigation shows that Deep Web sites using Ajax contain a wealth of information resources, yet very little research addresses them.

(2) Based on the structure of a Deep Web crawler, the difficulties facing a Deep Web crawler that supports Ajax are analyzed: identifying Ajax query interfaces, submitting Ajax forms, and crawling Ajax pages. On this basis, a framework for a Deep Web crawler that supports Ajax is established.

(3) Ajax query interfaces are divided into three types according to their characteristics, and a recognition method and processing model are given for each type. The submission of Ajax forms is then completed.

(4) A data-region recognition model for result pages is established, based on the similarity of DOM trees and subtrees. Automatic discovery of the page navigation mode is achieved on the basis of the data region. The Ajax page navigation mode is then discussed further, and the query results of the Deep Web sites are finally obtained.

Extensive experiments were carried out to verify the theories and methods proposed in this thesis. The final part raises some problems that need further research and looks ahead to the direction and prospects of the field.
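The abstract does not spell out how the data-region model of item (4) works, so the following is only a minimal sketch of the general idea of locating a data region through sub-tree similarity, not the method of the thesis. It assumes that a data region is a DOM node whose adjacent child sub-trees have near-identical tag sequences. The names `parse_html`, `tag_sequence`, and `find_data_region`, the similarity threshold of 0.8, and the minimum of three records are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class Node:
    """A bare-bones DOM node: tag name, attributes, parent, children."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simplified element tree; text nodes are ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def parse_html(html):
    builder = TreeBuilder()
    builder.feed(html)
    return builder.root


def tag_sequence(node):
    """Pre-order list of tag names in the sub-tree rooted at node."""
    seq = [node.tag]
    for child in node.children:
        seq.extend(tag_sequence(child))
    return seq


def similarity(a, b):
    """Structural similarity of two sub-trees, in the range 0.0 to 1.0."""
    return SequenceMatcher(None, tag_sequence(a), tag_sequence(b)).ratio()


def find_data_region(root, threshold=0.8, min_records=3):
    """Return the node with the most similar adjacent child pairs, or None."""
    best, best_pairs = None, 0
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        kids = node.children
        if len(kids) < min_records:
            continue
        pairs = sum(
            1 for a, b in zip(kids, kids[1:]) if similarity(a, b) >= threshold
        )
        if pairs + 1 >= min_records and pairs > best_pairs:
            best, best_pairs = node, pairs
    return best


PAGE = """
<html><body>
<div id="nav"><a>Home</a><a>About</a></div>
<ul id="results">
  <li><a>Title 1</a><span>Price</span></li>
  <li><a>Title 2</a><span>Price</span></li>
  <li><a>Title 3</a><span>Price</span></li>
</ul>
</body></html>
"""

region = find_data_region(parse_html(PAGE))
```

In this toy page the navigation `div` has only two children and is skipped, while the `ul` holds three structurally identical `li` records, so it is the node selected as the data region. Counting similar adjacent pairs is a simplification; a real crawler would also need to handle records that span several sibling nodes and content that Ajax loads after the initial page.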
Keywords/Search Tags: Deep Web, crawl, Ajax, query interface