Research and Implementation of Web Crawler Detection Based on SVM

Posted on: 2011-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: T Song
Full Text: PDF
GTID: 2198330338982012
Subject: Software engineering
Abstract/Summary:
With the rapid development of search engine technology, web crawlers, which often collect information from the internet unnoticed, have become a problem that cannot be ignored. Detecting and identifying these crawlers in order to protect information security and user privacy has become an important topic in network security. This thesis proposes a crawler detection algorithm based on the support vector machine (SVM) and designs a crawler inspection system, WEB-CIS, which is then tested and analyzed. The main research content and contributions are as follows:
(1) Feature vectors that represent web access sessions are extracted by cluster analysis of web logs, guided by the behavioral characteristics of web crawlers, and a method for calculating the LFCIS weight of the feature vector is proposed.
(2) The support vector machine classification algorithm is analyzed, an SVM-based web crawler detection algorithm is proposed, and an SVM classifier with an RBF kernel is designed to classify crawlers using web access session features. Test results show that this method is superior to other crawler detection algorithms.
(3) The WEB-CIS crawler detection system is designed and implemented on the basis of the support vector machine classification algorithm. The system architecture and module design are described in detail, including the Access Cluster module, the Classifier Training module, and the Testing module.
(4) The evaluation criteria for web crawler detection systems are analyzed and used to test the WEB-CIS system. The crawler detection ability of WEB-CIS is compared with that of several other web crawler detection systems on a benchmark data set. Experimental results show that WEB-CIS performs better than the other crawler detection systems.
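
To make the idea of session features and an RBF kernel concrete, the following is a minimal sketch using only the Python standard library. It is not the WEB-CIS implementation: the abstract does not list the thesis's features or define the LFCIS weight, so the record layout, the 30-minute session timeout, the five features, and the function names `sessionize`, `session_features`, and `rbf_kernel` are all illustrative assumptions. The only formula taken as given is the standard RBF kernel, K(x, z) = exp(-gamma * ||x - z||^2).

```python
import math
from collections import defaultdict

# Assumed idle gap (seconds) after which a new session starts.
SESSION_TIMEOUT = 30 * 60


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group (ip, user_agent, timestamp, method, path) records into sessions.

    A session is a run of requests from the same (ip, user_agent) pair in
    which consecutive requests are at most `timeout` seconds apart.
    """
    by_client = defaultdict(list)
    for rec in records:
        by_client[(rec[0], rec[1])].append(rec)

    sessions = []
    for recs in by_client.values():
        recs.sort(key=lambda r: r[2])
        current = [recs[0]]
        for rec in recs[1:]:
            if rec[2] - current[-1][2] > timeout:
                sessions.append(current)
                current = []
            current.append(rec)
        sessions.append(current)
    return sessions


def session_features(session):
    """Map one session to a hypothetical five-dimensional feature vector."""
    n = len(session)
    head_ratio = sum(1 for r in session if r[3] == "HEAD") / n
    robots = 1.0 if any(r[4] == "/robots.txt" for r in session) else 0.0
    image_ratio = sum(
        1 for r in session if r[4].lower().endswith((".png", ".jpg", ".gif"))
    ) / n
    duration = session[-1][2] - session[0][2]
    mean_gap = duration / (n - 1) if n > 1 else 0.0
    return [float(n), head_ratio, robots, image_ratio, mean_gap]


def rbf_kernel(x, z, gamma=0.1):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Made-up log records: one crawler-like client and one browser-like client.
records = [
    ("10.0.0.1", "bot", 0, "GET", "/robots.txt"),
    ("10.0.0.1", "bot", 1, "GET", "/a.html"),
    ("10.0.0.1", "bot", 2, "HEAD", "/b.html"),
    ("10.0.0.1", "bot", 3, "GET", "/c.html"),
    ("10.0.0.2", "browser", 0, "GET", "/index.html"),
    ("10.0.0.2", "browser", 40, "GET", "/logo.png"),
    ("10.0.0.2", "browser", 4000, "GET", "/index.html"),
]

sessions = sessionize(records)
vectors = [session_features(s) for s in sessions]
similarity = rbf_kernel(vectors[0], vectors[1], gamma=0.001)
```

In an SVM with this kernel, a trained classifier would score a new session by a weighted sum of such kernel values against its support vectors. In practice the features would also be scaled to comparable ranges first, since the raw request count and mean gap otherwise dominate the distance.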
Keywords/Search Tags: Crawler Detection, Support Vector Machine, Classifier, Network Security