Research And Implementation Of Crawler Detection Based On Hidden Markov Model

Posted on: 2019-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: M B Xie
GTID: 2428330566487286
Subject: Engineering
Abstract/Summary:
Crawlers let people collect and save data on demand. However, the large number of web crawlers now in operation causes obvious problems. For example, crawlers consume a large amount of Internet bandwidth, which slows access for normal users or even makes a website unavailable to them. The purpose of this work is to estimate a crawler range from a crawler data set, so that ordinary users can be distinguished from crawlers. The main process is as follows: (1) Based on the characteristics of crawler visits and on site types, we built 48 crawler data sets covering a wide range of areas. (2) We extract time-series sequences from the crawler data sets and train the corresponding hidden Markov model. From the frequency distribution of the average log-likelihood of the observation sequences, we estimate the range in which crawlers fall. (3) In the detection phase, we calculate the average log-likelihood of each sequence under examination and compare it with the crawler interval to detect crawlers. The contribution of this research is an analysis of the characteristics of existing crawlers and a data set covering a wide range of crawlers. A more accurate crawler model can be trained on this data set, which should help improve the crawler recognition rate. The data set can also support other researchers' crawler detection work and so benefit the development of the field.
Keywords/Search Tags:crawler detection, dataset, Hidden Markov Model, crawler range
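
Steps (2) and (3) of the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a discrete-observation hidden Markov model whose parameters have already been trained (training itself, for example with Baum-Welch, is omitted), it assumes the crawler range is taken as a pair of quantiles of the training scores, and all function names and the `lower_q`/`upper_q` parameters are hypothetical.

```python
import math


def avg_log_likelihood(obs, start, trans, emit):
    """Average log-likelihood of a discrete observation sequence.

    Uses the scaled forward algorithm: the log-likelihood is the sum of
    the logs of the per-step scaling factors, divided by the length.
    """
    n_states = len(start)
    # Forward variables for the first observation.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then emit obs[t].
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states))
                * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            # The model assigns zero probability to this sequence.
            return float("-inf")
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik / len(obs)


def crawler_interval(scores, lower_q=0.05, upper_q=0.95):
    """Assumed range estimate: quantiles of the crawler training scores."""
    ordered = sorted(scores)
    last = len(ordered) - 1
    return ordered[int(lower_q * last)], ordered[int(upper_q * last)]


def is_crawler(obs, model, interval):
    """Flag a sequence whose score falls inside the crawler interval."""
    score = avg_log_likelihood(obs, *model)
    return interval[0] <= score <= interval[1]
```

In use, `model` would be a `(start, trans, emit)` tuple obtained from training on the crawler sequences, `crawler_interval` would be applied to the scores of those same training sequences, and `is_crawler` would then be called on each sequence to be examined. Dividing by the sequence length makes scores comparable across sequences of different lengths, which is presumably why the abstract speaks of an average log-likelihood.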