Design And Development Of Distributed Crawler Based On Scrapy Framework

Posted on: 2018-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Fan
Full Text: PDF
GTID: 2348330563952219
Subject: Computer technology
Abstract/Summary:
With the popularity of Weibo, many major public events first break there and quickly climb the hot search list, drawing widespread public discussion. Collecting and analyzing public information on Weibo, and using it to channel and guide public opinion, therefore has great social significance. Based on the Scrapy framework, this thesis studies, designs, and develops a distributed web crawler system that incrementally crawls dynamic data from Sina Weibo, currently a leading social platform. The specific research contents are as follows:

(1) A dynamic data crawling method is studied and implemented by combining PhantomJS with the Scrapy framework, which addresses Scrapy's difficulty in crawling dynamic data. In addition, the Bloom filter principle is applied within the Scrapy framework to reduce the memory occupied by the crawler.

(2) The asynchronous loading mechanism of Weibo data is analyzed, and a new Weibo crawling strategy is implemented to solve the problem that Weibo data cannot be obtained completely. Furthermore, based on how Weibo data is updated, a Weibo update strategy is designed so that Sina Weibo data is refreshed quickly and efficiently, further improving system performance.

(3) The system is designed and implemented. Based on the Scrapy framework, the architecture of the distributed crawler system is designed and its main functions are developed, including Weibo login, Weibo page parsing, the data pipeline, and scheduling. The result is a distributed crawler system in which multiple crawler nodes crawl in parallel.

(4) Performance, functional, and run-time memory usage tests are carried out on the distributed crawler system. The results show that the system collects Weibo data stably and efficiently and also reduces memory usage at run time.
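
The abstract does not describe how the Bloom filter is wired into Scrapy, so the following is only a minimal, framework-independent sketch of the general idea: a fixed-size bit array replaces an ever-growing set of seen URLs, at the cost of a small false-positive rate. The class name `BloomFilter`, the helper `is_new_url`, the sizing parameters, and the example URLs are all illustrative assumptions, not the thesis's implementation.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter for approximate set membership (illustrative)."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.001):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        m = -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
        self.num_bits = max(8, math.ceil(m))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "probably seen"; False means "definitely not seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def is_new_url(seen: BloomFilter, url: str) -> bool:
    """Return True the first time a URL is offered, and record it as seen."""
    if url in seen:
        return False
    seen.add(url)
    return True


if __name__ == "__main__":
    seen = BloomFilter(expected_items=1000)
    for url in ["https://example.com/a", "https://example.com/b", "https://example.com/a"]:
        print(url, "crawl" if is_new_url(seen, url) else "skip")
```

Because a Bloom filter can report a URL as seen when it is not, a crawler using this approach may occasionally skip a page it has never fetched. That is the trade-off accepted in exchange for the lower memory use.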
Keywords/Search Tags: Scrapy, crawler, Weibo, dynamic data crawling