Research And Implement Of Distributed Crawler System Supporting AJAX

Posted on: 2014-07-17
Degree: Master
Type: Thesis
Country: China
Candidate: B Wu
GTID: 2268330422963218
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of Internet technology, numerous Internet products have emerged. Among them, AJAX is increasingly favored by software developers. However, AJAX is unfriendly to traditional web crawlers. AJAX pages load part of their content dynamically after the initial page load, so the pages fetched by a traditional crawler are often incomplete. Research on crawler systems that support AJAX therefore has great practical significance.

This paper first surveys the current state of research on AJAX crawlers and analyzes the advantages and disadvantages of existing crawler schemes. It then proposes a solution that requests web pages by invoking a web browser. The paper also proposes a web page property classifier. Its purpose is to improve crawling efficiency and to coordinate resource allocation between the AJAX crawler and the static-page crawler. The classifier's results are fed back and corrected using the text-extraction results of the web page processing module. Finally, a heartbeat-message monitoring module is designed to monitor and analyze the health of the distributed system and keep it operating normally.

The AJAX-supporting distributed crawler system studied and implemented in this paper can crawl both dynamic and static web pages and allocate crawling tasks efficiently. It thus provides a new solution for crawling AJAX web pages. System test results indicate that the proposed system delivers its intended functions and that the proposed scheme performs well.
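The feedback loop between the page property classifier and the text-extraction results can be illustrated with a small Python sketch. The abstract does not describe the classifier's real features, thresholds, or interfaces. Everything below is therefore hypothetical: `PageClassifier`, `extract_text`, the `AJAX_MARKERS` list, `MIN_TEXT_CHARS`, `GAIN_RATIO`, and the per-host labelling. The regex-based `extract_text` is only a crude stand-in for a real text-extraction module.

The sketch works in two steps:

1. It makes an initial guess from the raw HTML.
2. It corrects that guess with the text extracted by whichever crawler was used:
   * It switches to "ajax" when a static fetch of a script-bearing page yields almost no text.
   * It switches to "static" when browser rendering adds little text beyond the raw HTML.

```python
import re
from urllib.parse import urlparse

# Hypothetical parameters: the abstract does not give the classifier's real features or thresholds.
MIN_TEXT_CHARS = 200   # a static fetch with less text than this may have missed AJAX-loaded content
GAIN_RATIO = 1.1       # rendering must add >10% text over the raw HTML to justify the browser crawler
AJAX_MARKERS = ("XMLHttpRequest", "fetch(", "$.ajax", "$.get(")

SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def extract_text(html: str) -> str:
    """Crude stand-in for a text-extraction module: drop scripts and tags, collapse whitespace."""
    text = TAG_RE.sub(" ", SCRIPT_RE.sub(" ", html))
    return " ".join(text.split())


class PageClassifier:
    """Routes pages to the 'static' or 'ajax' crawler and corrects labels from extraction feedback."""

    def __init__(self) -> None:
        self.host_label: dict[str, str] = {}  # corrected label learned for each host
        self.corrections = 0                   # how many predictions feedback has overturned

    def predict(self, url: str, raw_html: str) -> str:
        """Initial guess from the raw HTML; a learned per-host label takes precedence."""
        host = urlparse(url).netloc
        if host in self.host_label:
            return self.host_label[host]
        return "ajax" if any(m in raw_html for m in AJAX_MARKERS) else "static"

    def feedback(self, url: str, predicted: str, raw_html: str, extracted: str) -> str:
        """Check the prediction against the text extracted by the crawler that was used.

        `extracted` is the text from the static fetch when predicted == "static",
        or from the browser-rendered page when predicted == "ajax".
        """
        if predicted == "static":
            # Too little text from a page with scripts: content was probably loaded by AJAX.
            too_short = len(extracted) < MIN_TEXT_CHARS
            actual = "ajax" if too_short and SCRIPT_RE.search(raw_html) else "static"
        else:
            # Rendering added little beyond the raw HTML, so the browser was not needed.
            static_len = len(extract_text(raw_html))
            actual = "ajax" if len(extracted) > static_len * GAIN_RATIO else "static"
        if actual != predicted:
            self.corrections += 1
        self.host_label[urlparse(url).netloc] = actual
        return actual


if __name__ == "__main__":
    clf = PageClassifier()
    raw = "<html><body><div id='list'></div><script>load()</script></body></html>"
    label = clf.predict("http://example.com/news/1", raw)        # no AJAX marker: "static"
    label = clf.feedback("http://example.com/news/1", label, raw, extract_text(raw))
    print(label, clf.corrections)                                 # ajax 1
    print(clf.predict("http://example.com/news/2", raw))          # ajax (learned for the host)
```

Storing the corrected label per host is an assumed design choice. Pages on the same site often share a template, so a single correction can route later pages from that site to the cheaper static crawler or to the browser-based one. Routing only to the browser-based crawler would be more thorough but far more expensive.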
Keywords/Search Tags: Distributed Crawler, AJAX, Dynamic Loading, Search Engine