The Design And Implementation Of Anti-Crawler System At Dianping

Posted on: 2016-08-08; Degree: Master; Type: Thesis
Country: China; Candidate: C N Chen; Full Text: PDF
GTID: 2518304598956769; Subject: Master of Engineering
Abstract/Summary:
As Internet applications have developed, crawler technology has become more and more popular. Malicious or poorly designed crawlers cause many problems, such as wasted server resources and leakage of private data, so anti-crawler technology has become an important task for website managers. This paper first introduces the background of the anti-crawler system, the characteristics of crawlers, and current research on crawler detection technology. The detection technology finally selected is based on feature analysis. The message bus architecture model, the Spring MVC framework, the Storm framework, and Bloom filter technology are reviewed. The paper then explains the working principle and overall view of the anti-crawler system and presents the overall planning and the requirements analysis, which determines the system boundary, analyzes the system's functional and non-functional requirements, and illustrates the main processing flow. On this basis, the overall structure of the system, module partition, module design, module interaction, deployment design, system packages, and database design are given. Finally, the paper analyzes the design and implementation details of the message bus, interception processing, admin management, and real-time calculation modules in turn. The anti-crawler system can catch crawlers accurately and in real time, and it separates well-intentioned crawlers from malicious ones according to their characteristics. In addition, it applies penalties to accelerate the identification of crawlers. The system also provides a web page where an administrator can set the blacklist and whitelist, configure the rules, and view statistics on interception records. At present, the anti-crawler system is in use at the company and blocks a large number of crawler requests, on the order of tens of millions per day. It saves the company a large amount of server resources and protects the security of corporate data.
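The abstract names Bloom filters and blacklist/whitelist management but does not say how they are combined. As an illustration only, the following minimal Python sketch assumes the Bloom filter holds blacklisted client IPs and that the whitelist takes precedence; the names `BloomFilter` and `should_intercept`, the filter size, and the hash count are all hypothetical and are not taken from the thesis.

```python
import hashlib


class BloomFilter:
    """Space-efficient set: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        # size_bits and num_hashes are illustrative defaults, not thesis values.
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True only if every derived bit is set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def should_intercept(client_ip, whitelist, blacklist):
    """Assumed policy: whitelist wins, otherwise block blacklisted clients."""
    if client_ip in whitelist:
        return False
    return client_ip in blacklist
```

Because a Bloom filter can report false positives, a production system would probably confirm a hit against an exact store before blocking a request; the sketch omits that step.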
Keywords/Search Tags: Crawler, Interception, MessageBus, Spring, Spring MVC, Storm