Design and Application of an Anti-Crawler System Based on Anomaly Detection Technology

Posted on: 2017-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: H W Ren
Full Text: PDF
GTID: 2308330485460757
Subject: Computer technology
Abstract/Summary:
In recent years, big data technology has developed rapidly around the world, and many countries have raised big data to the level of national strategy. Big data is the oil of the twenty-first century. Openness and sharing increase the value of data, and data must be used for its value to be maximized. As internet applications grow more popular, however, open and shared data faces many security problems, and malicious crawlers are among the most important. Many spider tools can be downloaded freely from the internet. An attacker can use these tools to traverse every page starting from a single web page or data interface and so obtain a large amount of data. Data leakage on this scale causes serious security problems for internet companies.

The original intention of sharing data is to let normal users use it, generate new data, and create new value by analyzing that new data. Malicious crawler access imposes substantial costs on the information service provider, including machine and network resources. At the same time, large amounts of data escape the effective control of internet companies. Data that is used illegally produces many new problems and deviates from the original intention of sharing.

In this thesis, we design and deploy data collection points, and configure defense rules and detection algorithms based on the characteristics of internet data services. Built on a big data analysis system, the design uses Flume for data collection and Kafka for data integration, and deploys agents for fast and efficient collection. It computes multi-dimensional statistics and detects abnormal behavior in sequence data based on those statistics. Anomaly detection technology is used to locate the points on the site where data is leaking, and access frequency is used to identify the crawler and its source IP.
We build an anti-crawler system that hooks into the processing flow of the business system and receives the crawler source IPs found by the real-time analysis system. It intercepts and inspects each access and blocks access from crawlers. Using real-time big data computing technology, the system collects data and statistics on the operation of the website and outputs real-time reports that give an accurate picture of the site's operating status. The operating data is used to optimize the site and improve system availability, and to predict the normal growth of site traffic so that servers and bandwidth can be expanded accordingly. Abnormal traffic volumes are identified quickly and all access from malicious IPs is blocked, which reduces resource consumption, avoids the risk of data leakage, and keeps the site operating well.
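The abstract does not specify the detection algorithm, so the following is only a minimal sketch of the frequency-based idea it describes: count requests per source IP in a time window, flag IPs whose count is a statistical outlier, and hand the flagged IPs to an interceptor that blocks them. The modified z-score (median and MAD) is a stand-in chosen for illustration, not the thesis's method. All names here (`count_requests_per_ip`, `find_anomalous_ips`, `AccessInterceptor`) and the default thresholds are hypothetical, and an in-memory list of `(timestamp, ip)` pairs stands in for events that the described system would receive through Flume and Kafka.

```python
from collections import Counter
from statistics import median


def count_requests_per_ip(events, window_start, window_end):
    """Count requests per source IP inside [window_start, window_end).

    events is an iterable of (timestamp_seconds, ip) pairs (assumed format).
    """
    return Counter(ip for ts, ip in events if window_start <= ts < window_end)


def find_anomalous_ips(counts, threshold=3.5, min_requests=20):
    """Return the IPs whose request count is a robust outlier.

    Uses the modified z-score 0.6745 * (n - median) / MAD.
    threshold and min_requests are illustrative defaults, not tuned values.
    """
    if not counts:
        return set()
    values = list(counts.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    # Assumption: if every typical count is identical, MAD is 0,
    # so fall back to a scale of 1 request to avoid dividing by zero.
    scale = mad if mad > 0 else 1.0
    flagged = set()
    for ip, n in counts.items():
        if n < min_requests:
            continue  # too little traffic to call it a crawler
        if 0.6745 * (n - med) / scale > threshold:
            flagged.add(ip)
    return flagged


class AccessInterceptor:
    """Sits in front of the business logic and rejects blocked source IPs."""

    def __init__(self):
        self._blocked = set()

    def block(self, ips):
        """Add IPs reported by the analysis step to the blocklist."""
        self._blocked.update(ips)

    def allow(self, ip):
        """Return True if a request from this IP may proceed."""
        return ip not in self._blocked
```

In use, the analysis side would call `find_anomalous_ips(count_requests_per_ip(events, start, end))` once per window and pass the result to `AccessInterceptor.block`, while the request path calls `allow(ip)` before serving each access. A production system would also need expiry of blocked entries and handling of shared or proxied IPs, neither of which is covered here.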
Keywords/Search Tags: Big data analysis, Anti-crawler, Anomaly detection