
Design And Implementation Of Web Anti-crawling System

Posted on: 2016-09-13
Degree: Master
Type: Thesis
Country: China
Candidate: H D Tang
Full Text: PDF
GTID: 2308330479491519
Subject: Software engineering
Abstract/Summary:
The company studied in this thesis operates the world's most popular Chinese travel platform. Its Air Ticket Searching and Trade Platform is one of its most important basic systems: the platform's search scope covers over 180,000 flight routes and 4,000 online travel agencies, and its ticket transactions in 2014 exceeded 80 million. As the business grows steadily, the platform and its related business systems face increasing pressure from external web spiders. The large volume of crawler traffic causes three serious problems. First, data security: critical data is at risk of being acquired by competitors through abnormal crawl access. Second, system performance: the large number of requests depletes server resources and seriously degrades the user experience. Third, duplicated effort: different business systems have each spent considerable effort building their own anti-crawling mechanisms of mixed quality, which wastes resources.

Based on in-depth research into crawling technology, this thesis designs and implements a web anti-crawling system (ACS). ACS provides a uniform, high-quality anti-crawling service for the platform and related business systems. It implements an HTTP header strategy, a JavaScript encryption strategy, an IP blacklist strategy, and an access frequency control strategy. Drawing on a detailed understanding of the ticket trading platform's business, ACS also implements a behavior pattern strategy tied to the business logic, which raises the cost of crawling. In addition, thanks to its interface design, ACS is scalable and loosely coupled. ACS thus provides a solution to the problems described above.

Functional and performance testing of the whole anti-crawling system confirmed that the five strategies described in the thesis are implemented correctly and that the initial functional requirements are met. ACS is loosely coupled with the other business systems, which can access it easily. ACS also showed stable and strong performance in the performance tests. ACS has been put into practical use.
Keywords/Search Tags: Anti-crawling, web spiders, Ticket Searching and Trade Platform, data security
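
The abstract names its strategies without giving implementation details. As a rough illustration only, the sketch below shows how three of them (IP blacklist, HTTP header check, access frequency control) might be combined in a single request filter. It is not the ACS implementation: the class name `AccessFilter`, the rule "reject requests with no User-Agent", the sliding-window algorithm, and all thresholds are assumptions made for this example.

```python
from collections import defaultdict, deque


class AccessFilter:
    """Toy request filter; every rule here is an illustrative assumption."""

    def __init__(self, max_requests, window_seconds, blacklist=()):
        self.max_requests = max_requests      # allowed requests per window
        self.window_seconds = window_seconds  # sliding-window length
        self.blacklist = set(blacklist)       # known bad client IPs
        # ip -> timestamps of that client's recently allowed requests
        self._history = defaultdict(deque)

    def check(self, ip, headers, now):
        """Return "allow", or a string naming the rule that rejected the request."""
        # IP blacklist strategy: reject listed clients outright.
        if ip in self.blacklist:
            return "blocked:blacklist"

        # HTTP header strategy (hypothetical rule): require a User-Agent.
        if not headers.get("User-Agent"):
            return "blocked:header"

        # Access frequency control: drop timestamps that left the window,
        # then compare the remaining count against the limit.
        history = self._history[ip]
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_requests:
            return "blocked:frequency"

        # Only allowed requests are recorded in the window.
        history.append(now)
        return "allow"
```

In this sketch the caller passes the timestamp `now` explicitly (for example from `time.monotonic()`), which keeps the filter deterministic and easy to test. The JavaScript encryption and behavior pattern strategies depend on details of the platform that the abstract does not give, so they are not sketched here.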