
Design And Implementation Of An Ajax-supported Deep Web Crawler
Shanghai Jiao Tong University

Posted on: 2011-08-31
Degree: Master
Type: Thesis
Country: China
Candidate: C Q Zhang
GTID: 2178360308952652
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, network resources are becoming more and more abundant, and extracting information from the network has become essential. Collecting Deep Web information deserves particular attention. The Deep Web refers to information that can be reached over the Internet but that general search engines cannot index, or do not index, because of technical limitations.
Ajax emerged to support more dynamic and more responsive Web applications by letting the browser and the server work asynchronously and in parallel. Ajax is now widely used. It clearly improves the responsiveness and interactivity of Web applications, but it also increases the amount of Deep Web information.
This thesis designs and implements an Ajax-Supported Deep Web Crawler. The crawler identifies clickable elements in the DOM, executes the client-side JavaScript, and builds a state flow diagram that represents each state of the Ajax application and the navigation paths between states. From the state flow diagram it generates a multi-page version of the original Ajax application, together with a sitemap, so that the resulting static pages can be indexed by general search engines. The crawler aims to expose the important parts of Ajax websites to search engines and so improve the coverage and accuracy of search. Finally, the Ajax-Supported Deep Web Crawler is compared experimentally with another crawler, JSpider. The results show that the Ajax-Supported Deep Web Crawler captures more information; the two ratios reported are 1.12 and 2.93.
Keywords/Search Tags: Ajax, Deep Web, State flow diagram, Ajax-Supported Deep Web Crawler
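The crawling idea summarized in the abstract (find clickable elements, fire them, record the resulting states and transitions, then emit a sitemap for the static pages) can be illustrated with a minimal sketch. This is not the thesis's implementation. The toy application `TOY_APP`, the helpers `find_clickables` and `fire_click`, the breadth-first strategy, and the `http://example.com/` base URL are all assumptions made for illustration; a real crawler would drive a browser and compare DOM snapshots rather than look up a dictionary.

```python
from collections import deque
from xml.sax.saxutils import escape

# Hypothetical stand-in for an Ajax application. Each key is a state name
# (standing in for a DOM snapshot); each value maps a clickable element to
# the state that firing its click handler would produce.
TOY_APP = {
    "home": {"#news-tab": "news", "#about-tab": "about"},
    "news": {"#more": "news-page-2", "#home-link": "home"},
    "news-page-2": {"#home-link": "home"},
    "about": {"#home-link": "home"},
}


def find_clickables(state):
    """Return the clickable elements of a state (assumed helper)."""
    return sorted(TOY_APP[state])


def fire_click(state, element):
    """Return the state reached by clicking an element (assumed helper)."""
    return TOY_APP[state][element]


def crawl(start):
    """Explore the application breadth-first.

    Returns (states, edges): states in discovery order, and edges as
    (source_state, clicked_element, target_state) triples. Together they
    form a simple state flow graph.
    """
    states = [start]
    edges = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for element in find_clickables(current):
            target = fire_click(current, element)
            edges.append((current, element, target))
            if target not in states:  # new state: record it and explore later
                states.append(target)
                queue.append(target)
    return states, edges


def build_sitemap(states, base_url):
    """Emit a sitemap with one static page URL per discovered state."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for state in states:
        loc = escape(base_url + state + ".html")
        lines.append(f"  <url><loc>{loc}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines)
```

Calling `crawl("home")` and passing the returned states to `build_sitemap(states, "http://example.com/")` yields one `<url>` entry per discovered state. The one-page-per-state naming scheme is chosen here only to keep the sketch short.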