The Research Of Malicious Web Pages Detection Based On Multiple Features

Posted on: 2014-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: T Yue
GTID: 2268330425983781
Subject: Computer Science and Technology
Abstract/Summary:
Web technologies have become increasingly widespread with the growth of the Internet. Web users are threatened by various types of malicious webpages, particularly phishing, spam and malware pages, each of which has its own characteristics. Because users generally find it difficult to recognize malicious webpages, current research on malicious webpage detection and type recognition still needs improvement. Feature extraction is the key step in detecting malicious webpages. This thesis investigates and analyzes feature extraction methods for malicious webpages, proposes a new webpage feature extraction method, and implements a system for detecting malicious webpages. The principal contributions are as follows:

(1) Existing webpage feature extraction methods are discussed and analyzed. To address their shortcomings, a feature extraction method for malicious webpage detection based on webpage source code and URL properties is proposed. The method uses static analysis to extract features from the page code and its script content, analyzes the URL to extract lexical (vocabulary) features and related host property features, and represents all of these features as numerical feature vectors. Comparative experiments between the proposed method and methods from the existing literature are conducted on specific datasets, and the methods are evaluated in terms of the detection accuracy of the system.

(2) A malicious webpage detection system based on the proposed feature extraction method is designed and implemented. The webpage collection module gathers the webpage datasets. The feature extraction module applies the proposed method to the collected dataset and builds a webpage feature library.
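The feature extraction step described above (static analysis of page code and scripts, plus lexical and host features of the URL, combined into a numerical vector) can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the thesis's actual feature set: the suspicious-token vocabulary, the regular expressions and the function names `url_features`, `page_features` and `feature_vector` are all hypothetical. Host properties that would need external lookups (for example registration or DNS data) are left out; the sketch uses only host features derivable from the URL string itself.

```python
import re
from urllib.parse import urlparse

# HYPOTHETICAL vocabulary of tokens common in phishing URLs (not taken from the thesis).
SUSPICIOUS_TOKENS = {"login", "signin", "verify", "account", "secure", "update", "bank"}

# Illustrative JavaScript calls often used for obfuscation or content injection.
SCRIPT_PATTERNS = {
    "eval": re.compile(r"\beval\s*\("),
    "unescape": re.compile(r"\bunescape\s*\("),
    "document_write": re.compile(r"document\.write\s*\("),
    "from_char_code": re.compile(r"String\.fromCharCode\s*\("),
}

IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")
HIDDEN_SIZE = re.compile(r"(width|height)\s*=\s*[\"']?0\b")


def url_features(url: str) -> list[float]:
    """Lexical features of the URL string plus host features derivable from it."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    tokens = [t for t in re.split(r"[/.?=&\-_:]+", url.lower()) if t]
    return [
        float(len(url)),                                       # URL length
        float(len(host)),                                      # host name length
        float(host.count(".")),                                # dots in host
        float(host.count("-")),                                # hyphens in host
        1.0 if IPV4_HOST.match(host) else 0.0,                 # raw IPv4 address as host
        1.0 if "@" in url else 0.0,                            # '@' present in URL
        float(len([p for p in parsed.path.split("/") if p])),  # path depth
        float(sum(t in SUSPICIOUS_TOKENS for t in tokens)),    # suspicious words
        1.0 if parsed.port is not None else 0.0,               # explicit port
    ]


def page_features(html: str) -> list[float]:
    """Static (no execution) features of the page source and its inline scripts."""
    lower = html.lower()
    scripts = re.findall(r"<script\b[^>]*>(.*?)</script>", html, flags=re.I | re.S)
    script_text = "\n".join(scripts)
    iframes = re.findall(r"<iframe\b[^>]*>", lower)
    features = [
        float(len(re.findall(r"<script\b", lower))),                # script tags
        float(len(re.findall(r"<script\b[^>]*\bsrc\s*=", lower))),  # external scripts
        float(len(iframes)),                                        # iframes
        float(sum(1 for f in iframes if HIDDEN_SIZE.search(f))),    # zero-size iframes
    ]
    # Occurrences of each suspicious call inside inline script code.
    features += [float(len(p.findall(script_text))) for p in SCRIPT_PATTERNS.values()]
    # Longest whitespace-free run: long runs often indicate packed or encoded code.
    features.append(float(max((len(t) for t in script_text.split()), default=0)))
    return features


def feature_vector(url: str, html: str) -> list[float]:
    """Concatenate URL and page features into one numerical vector."""
    return url_features(url) + page_features(html)
```

Each URL/page pair yields an 18-element vector here; in a full system such vectors would be stored in the feature library and passed to the classifiers.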
The data storage module writes the webpage-related data to disk. The detection and classification module uses the k-nearest neighbors (KNN) algorithm and a support vector machine (SVM) for detection, and uses a KD-tree to speed up KNN and reduce its time overhead. Experiments are conducted to analyze the performance of the system and the time overhead of detection.
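The KD-tree optimization of KNN mentioned above can be illustrated with a small from-scratch sketch. It assumes feature vectors are equal-length lists of floats with integer class labels (for example 0 = benign, 1 = malicious), and it uses squared Euclidean distance with majority voting; the abstract states neither the distance metric nor the value of k, so both are assumptions, as are the names `build_kdtree`, `knn_search` and `knn_classify`. The SVM detector is not shown.

```python
import heapq
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    point: list[float]
    label: int
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_kdtree(points, labels, depth=0):
    """Build a KD-tree, splitting on the median point along a cycling axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    order = sorted(range(len(points)), key=lambda i: points[i][axis])
    mid = len(order) // 2
    lo, hi = order[:mid], order[mid + 1:]
    return Node(
        point=points[order[mid]],
        label=labels[order[mid]],
        axis=axis,
        left=build_kdtree([points[i] for i in lo], [labels[i] for i in lo], depth + 1),
        right=build_kdtree([points[i] for i in hi], [labels[i] for i in hi], depth + 1),
    )


def sq_dist(a, b):
    """Squared Euclidean distance (assumed metric; the thesis does not specify one)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_search(root, query, k):
    """Return the k nearest (squared distance, label) pairs, nearest first."""
    heap = []  # max-heap of the best k so far, stored as (-distance, counter, label)
    counter = 0

    def visit(node):
        nonlocal counter
        if node is None:
            return
        d = sq_dist(node.point, query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, counter, node.label))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, counter, node.label))
        counter += 1
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Cross the splitting plane only if it is closer than the current k-th best.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return sorted((-nd, label) for nd, _, label in heap)


def knn_classify(root, query, k=3):
    """Majority vote among the k nearest neighbors."""
    votes = Counter(label for _, label in knn_search(root, query, k))
    return votes.most_common(1)[0][0]
```

The saving over a linear scan comes from the pruning test, which skips a whole subtree whenever its splitting plane lies farther away than the current k-th nearest neighbor. In practice a library implementation such as scikit-learn's `KNeighborsClassifier(algorithm="kd_tree")` would usually be preferred.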
Keywords: Malicious Web Pages, Feature Extraction, URL Properties, Detection