
Design And Implementation Of The Dynamic Crawler System Based On State Transition

Posted on: 2015-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: H W Jiang
Full Text: PDF
GTID: 2298330422977153
Subject: Software engineering
Abstract/Summary:
A web crawler, also known as a web robot, automatically collects Web information according to certain rules, and it is an important part of search engine technology. With the rise of Web 2.0, Ajax has become widely used in web development. Unlike a traditional web page, an Ajax page sends requests to the server asynchronously and updates itself according to the server's response. Ajax greatly reduces the load on the server and improves the user experience. At the same time, because of the way Ajax updates the HTML page, it presents a challenge to traditional crawler technology.

This paper analyzes the principles of the traditional web crawler, and then designs and implements a web crawler system that can crawl dynamic web data, addressing the problems a dynamic web crawler needs to solve. The main work is as follows.

First, building on previous models of dynamic web crawlers and drawing on graph theory, this paper proposes a dynamic web crawler model based on state transition, which uses state transitions to simulate the changes in page structure caused by web events. The model is then refined to perform better in a real network environment, including web page de-noising, de-duplication of new states, and grabbing of new states.

Second, based on this model, this paper designs and implements a crawler system for dynamic web data in two ways: invoking a browser kernel, and building a native JavaScript parsing environment.

Finally, through experiments on crawling real websites, this paper compares the advantages and disadvantages of the two methods against a traditional crawler and verifies the feasibility and effectiveness of the system.
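The state-transition idea described in the abstract can be illustrated with a minimal sketch. It is not the system from the thesis: the page is simulated in memory instead of being driven through a browser kernel or a JavaScript environment, and every name in it (`render`, `fire_event`, `denoise`, `state_id`, `EVENTS`, the `ad` span treated as noise) is a hypothetical chosen for illustration. It assumes that a state is a de-noised DOM snapshot, that two states are duplicates when their de-noised content hashes to the same value, and that new states are explored breadth-first.

```python
import hashlib
import itertools
import re
from collections import deque

# Hypothetical events the crawler tries on every state.
EVENTS = ("click:#next", "click:#prev")

_ad_counter = itertools.count(1)


def render(page_no):
    """Simulated page: real content plus an ad that changes on every render."""
    return (f"<div id='list'>page {page_no}</div>"
            f"<span class='ad'>ad-{next(_ad_counter)}</span>")


def fire_event(dom, event):
    """Simulated Ajax update: returns the DOM after the event (pages 1 to 3)."""
    page_no = int(re.search(r"page (\d+)", dom).group(1))
    if event == "click:#next" and page_no < 3:
        page_no += 1
    elif event == "click:#prev" and page_no > 1:
        page_no -= 1
    return render(page_no)


def denoise(dom):
    """De-noising (assumption): drop the ad span before comparing states."""
    return re.sub(r"<span class='ad'>.*?</span>", "", dom)


def state_id(dom):
    """De-duplication key (assumption): hash of the de-noised DOM."""
    return hashlib.sha1(denoise(dom).encode("utf-8")).hexdigest()[:8]


def crawl(initial_dom):
    """Breadth-first exploration of the state graph reachable via EVENTS."""
    states = {state_id(initial_dom): denoise(initial_dom)}
    transitions = set()  # edges of the graph: (source id, event, target id)
    queue = deque([initial_dom])
    while queue:
        dom = queue.popleft()
        src = state_id(dom)
        for event in EVENTS:
            new_dom = fire_event(dom, event)
            dst = state_id(new_dom)
            transitions.add((src, event, dst))
            if dst not in states:  # only unseen states are grabbed and queued
                states[dst] = denoise(new_dom)
                queue.append(new_dom)
    return states, transitions
```

Calling `crawl(render(1))` returns the de-noised states and the set of `(source, event, target)` edges. Without the `denoise` step, the changing ad would make every snapshot look like a new state and the crawl would not terminate, which is one plausible reason de-noising and de-duplication appear together in the refinements listed above.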
Keywords: Dynamic Web Page, Web Crawler, State Transition, Ajax