Research On Web Robot Detection Based On Behavior Pattern

Posted on: 2018-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: X K Ju
GTID: 2348330512466967
Subject: Communication and Information System
Abstract/Summary:
A Web robot (Web crawler) is a program that automatically accesses all kinds of Internet resources. Since coming into formal use in 1993, Web robots have brought convenience to both ordinary users and Internet professionals, giving people the ability to perform targeted searches over the ever-growing body of Internet data. As Internet technology has continued to develop and has become fully integrated into every aspect of society, the amount of data on the Internet keeps increasing, and crawler technology is constantly updated to meet people's different needs. Broadly speaking, crawlers can be divided into general, focused, incremental, deep, topic, and distributed robots. In practice, large-scale crawler systems tend to combine several of these technologies, so their architecture and behavior have become increasingly complex.

Although Web robots are widely used to retrieve network information and resources, they also bring hidden dangers and negative effects. First, a Web robot frequently tries to obtain all kinds of resources on a Web site, which degrades the performance of the Web server and creates a risk of information disclosure. Second, crawler visits are recorded in the Web site's logs, which makes mining those logs more difficult and less accurate. In addition, some robots designed for malicious purposes (such as probing Web sites for vulnerabilities or stealing site information) cause privacy leaks, resource abuse, and other problems. To address these problems, Internet practitioners have developed a number of Web robot detection techniques that allow site developers to determine whether a client is an ordinary user or a robot program.

To further improve Web robot detection and make up for the shortcomings of existing methods, this thesis describes the behavior of Web robots with session vectors and implements a detection algorithm based on Web robot behavior characteristics. The main contents are as follows. By analyzing the design principles and behavior patterns of Web robots, the thesis introduces the principle and analysis method of behavior vectors and their application in various fields, and describes the advantages and disadvantages of other detection algorithms in detail. It designs a detection algorithm based on the support vector machine (SVM), analyzes the validity of the algorithm, and tests it experimentally.

The innovations of the thesis are as follows. Based on the behavior characteristics of Web crawlers, the Web logs are clustered, and feature vectors that characterize each Web access session are extracted and improved; a method for calculating feature-vector weights, with an improved weight formula, is proposed. On the basis of the SVM-based crawler detection algorithm, a behavior-pattern-based crawler detection system is designed and implemented, and its system architecture and module design are described in detail.
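The idea of describing each Web access session as a feature vector can be illustrated with a short Python sketch. It is a minimal sketch under assumed conventions, not the thesis's implementation: log lines are assumed to be already parsed into records, a session is assumed to be the requests from one IP address and user-agent pair with no gap longer than 30 minutes, and the six features (`session_vector`, `LogRecord`, and the other names are hypothetical) are illustrative choices only. The thesis's own feature set and its improved weight formula are not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed inactivity gap (seconds) that closes a session; a hypothetical choice.
SESSION_TIMEOUT = 30 * 60
IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg")


@dataclass
class LogRecord:
    """One already-parsed access-log entry (hypothetical field set)."""
    ip: str
    user_agent: str
    timestamp: float  # seconds since the epoch
    method: str
    path: str
    status: int


def sessionize(records):
    """Group records by (ip, user_agent) and split a client's requests at long gaps."""
    by_client = defaultdict(list)
    for r in records:
        by_client[(r.ip, r.user_agent)].append(r)
    sessions = []
    for recs in by_client.values():
        recs.sort(key=lambda r: r.timestamp)
        current = [recs[0]]
        for prev, r in zip(recs, recs[1:]):
            if r.timestamp - prev.timestamp > SESSION_TIMEOUT:
                sessions.append(current)  # gap too long: close the current session
                current = []
            current.append(r)
        sessions.append(current)
    return sessions


def session_vector(session):
    """Turn one session into a fixed-length feature vector (hypothetical features)."""
    n = len(session)
    return [
        float(n),                                                       # total requests
        float(session[-1].timestamp - session[0].timestamp),            # duration (s)
        sum(r.method == "HEAD" for r in session) / n,                   # share of HEAD requests
        sum(r.path.lower().endswith(IMAGE_EXTS) for r in session) / n,  # share of image requests
        sum(400 <= r.status < 500 for r in session) / n,                # share of 4xx responses
        1.0 if any(r.path == "/robots.txt" for r in session) else 0.0,  # fetched robots.txt
    ]
```

Each labeled session vector then becomes one training example for a classifier; in the thesis's design, feature weights would be applied before classification.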
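The SVM-based detection step can likewise be sketched. The abstract does not say which SVM formulation or solver the thesis uses, so this sketch swaps in a plain linear SVM trained with the Pegasos stochastic sub-gradient method, written in pure Python. The helper names (`fit_minmax`, `scale`, `train_linear_svm`, `predict`), the min-max scaling, and the default `lam` and `epochs` values are illustrative assumptions. Labels are +1 for robot sessions and -1 for human sessions.

```python
import random


def fit_minmax(X):
    """Per-feature minimum and maximum, learned from the training sessions."""
    cols = list(zip(*X))
    return [min(c) for c in cols], [max(c) for c in cols]


def scale(x, lo, hi):
    """Map each feature into [0, 1] using training bounds; a constant feature maps to 0."""
    return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(x, lo, hi)]


def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Pegasos: minimize lam/2 * ||w||^2 + mean hinge loss by stochastic sub-gradient steps.

    A constant 1.0 feature is appended so the bias is learned as an ordinary weight.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                     # Pegasos step size
            x, yi = data[i], y[i]
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # step for the L2 regularizer
            if margin < 1.0:                          # hinge loss is active for this sample
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (robot) or -1 (human) for one scaled feature vector."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1
```

The scaling bounds are computed from the training sessions only and reused at prediction time, so features measured in different units (request counts, seconds, ratios) do not dominate the margin. A library SVM, for example one with a nonlinear kernel, could replace `train_linear_svm` without changing the surrounding steps.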
Keywords/Search Tags: Web Robot detection, behavior pattern, support vector machine