
Research And Implementation Of Web Crawler Detection Method Based On Behavioral Features

Posted on: 2022-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Qu
Full Text: PDF
GTID: 2518306341450614
Subject: Computer Science and Technology
Abstract/Summary:
With the growth of web crawler traffic and the increasingly complex behavior of malicious web crawlers, effective detection and classification of web crawlers has become necessary for ensuring network security. Most existing research on web crawler detection targets websites with complex structures. Institutions such as colleges and government agencies, however, need to provide hundreds of simple-structured websites with limited functions, and existing detection methods are not suitable for these sites. On such sites, crawler navigation patterns resemble those of humans, because most of a site's structure is covered during a single visit. Furthermore, because sessions on these sites are so short, too little content is visited to distinguish humans from crawlers. To solve this problem, a web crawler detection method based on behavioral features is proposed. The method takes the view that the website services an institution provides can be divided into multiple functional types, so user behavior can be analyzed through the continuity and volatility of visits across multiple types of sites, rather than by analyzing a single simple-structured site. A web crawler detection system is then designed and implemented based on the proposed method. The work of this thesis consists of two parts. First, a web crawler detection method based on multiple types of websites is developed. Features that capture behavior across multiple types of websites are extracted, such as multi-web risk degree, multi-web request times, and flow volatility. An improved rule-based hierarchical detection method based on K-means is then designed to divide the data set into a normal user set and a web crawler set, and the web crawler set is further classified with the CART classification algorithm. Second, a web crawler detection system based on behavioral features is designed and implemented using the proposed method. It consists of four parts: a data layer, a feature extraction layer, a crawler detection layer, and a display layer. The system discovers and classifies web crawlers from access logs uploaded by users, and it can evaluate and optimize the detection results using uploaded labeled data. Results demonstrate that the proposed method is effective, achieving an accuracy of 97.35%.
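The abstract does not define its features or its rules, so the following is only a rough sketch of the general idea of describing each client by its behavior across several site types and then splitting clients into two groups with K-means. The feature definitions (distinct site types visited, total requests, and the spread of per-type request counts), the function names, and the sample log are all assumptions made for illustration. The thesis's rule-based hierarchy and its CART classification stage are not reproduced here.

```python
from collections import defaultdict
from statistics import pstdev


def extract_features(log):
    """Build one feature vector per client from (client_id, site_type) pairs.

    Hypothetical stand-ins for the thesis's features:
    (distinct site types visited, total requests, spread of per-type counts).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for client, site_type in log:
        counts[client][site_type] += 1
    features = {}
    for client, per_type in counts.items():
        values = list(per_type.values())
        features[client] = (float(len(values)), float(sum(values)), pstdev(values))
    return features


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iterations=20):
    """Plain K-means with k=2 and a deterministic start (smallest and largest point)."""
    centroids = [min(points), max(points)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to the nearer centroid.
        labels = [
            0 if squared_distance(p, centroids[0]) <= squared_distance(p, centroids[1]) else 1
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for k in (0, 1):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Made-up access log: two light users and one client sweeping every site type.
log = [("u1", "news")] * 3 + [("u1", "library")]
log += [("u2", "library"), ("u2", "news")]
log += [("bot", t) for t in ("news", "library", "admissions", "portal") for _ in range(10)]

features = extract_features(log)
clients = list(features)
labels, centroids = two_means([features[c] for c in clients])
for client, label in zip(clients, labels):
    print(client, features[client], "cluster", label)
```

The features are left unscaled to keep the sketch short, so the request count dominates the distance. A real pipeline would likely normalize each feature before clustering.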
Keywords/Search Tags: web crawler detection, URL, k-means, CART