
Research And Application Of WEB Anti-crawling Mechanism

Posted on: 2018-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2348330518994470
Subject: Software engineering
Abstract/Summary:
With the development of Web technology and the growing diversity of application modes, people's lifestyles have changed greatly. An increasing number of people prefer to attend courses or shop online, and some even work from home instead of commuting. With the arrival of Web 2.0, the World Wide Web has become the carrier of massive amounts of information; at the same time, the number of Internet crawlers has grown steadily, and they can harm websites. It is therefore necessary to block crawlers and establish an anti-crawling mechanism, which plays an important role in protecting access security, website content, and users' privacy. It is also of great significance to data mining based on users' access.

The major work of this thesis is to describe the basic principles of web crawlers and to analyze existing anti-crawler mechanisms. Finally, we design a real-time anti-crawling mechanism that uses RPC to separate anti-crawling detection from the web service, so that the original web service and the anti-crawling service can each make full use of the advantages of its own environment.

Experiments show that this mechanism achieves good results in blocking and recognizing crawlers. It has distinct advantages over other mechanisms in accuracy rate, coverage ratio, and composite indicator.
Keywords/Search Tags: Crawler, Anti-crawling, Sliding window, Classifier
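
The abstract and keywords name a sliding window as part of real-time detection but give no implementation details. The following is only a minimal sketch of one common way such a window can be used: counting each client's requests within a recent time span and flagging clients that exceed a threshold. The class name `SlidingWindowDetector`, the parameters `window_seconds` and `max_requests`, and the threshold rule itself are all assumptions made for illustration, not the thesis's method. The RPC separation and the classifier mentioned in the abstract are omitted.

```python
from collections import defaultdict, deque


class SlidingWindowDetector:
    """Hypothetical per-client request counter over a sliding time window."""

    def __init__(self, window_seconds=10.0, max_requests=20):
        # Both values are illustrative assumptions, not figures from the thesis.
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        # Maps a client identifier (e.g. an IP address) to its recent timestamps.
        self._hits = defaultdict(deque)

    def record(self, client_id, timestamp):
        """Record one request and return True if the client looks like a crawler.

        A client is flagged when it has made more than `max_requests`
        requests during the last `window_seconds` seconds.
        """
        hits = self._hits[client_id]
        hits.append(timestamp)

        # Discard timestamps that have slid out of the window.
        cutoff = timestamp - self.window_seconds
        while hits and hits[0] <= cutoff:
            hits.popleft()

        return len(hits) > self.max_requests


if __name__ == "__main__":
    detector = SlidingWindowDetector(window_seconds=10.0, max_requests=3)
    for t in [0.0, 1.0, 2.0, 3.0]:
        flagged = detector.record("203.0.113.7", t)
        print(t, flagged)
```

In a design like the one the abstract describes, a check of this kind would presumably run in the separate detection service and be called by the web service over RPC, with its output possibly serving as one input to the classifier; that wiring is left out here.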