Research On Key Issues Of Malware Detection Based On Data Mining

Posted on: 2019-01-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Ni
GTID: 1368330575478840
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer and Internet technology, the Internet has come to closely affect our work and daily life. Meanwhile, Internet security is facing severe challenges. The explosive development and propagation of malware techniques has made the traditional client-based malware detection model unable to meet the growing needs of Internet security. Faced with massive numbers of unknown file samples, traditional signature matching cannot detect new malware efficiently. This motivates anti-malware vendors and researchers to develop novel methods capable of protecting users against new threats. In recent years, many efforts have applied data mining techniques to analyzing and classifying file samples. They mainly rely on static analysis, dynamic analysis and other methods to extract file content features, and then apply classification or clustering algorithms to these features to analyze and detect unknown malware samples. This thesis investigates deep learning algorithms, graph-based semi-supervised learning, social network analysis and adversarial learning theory to design novel malware detection methods, aiming at efficiently discovering and detecting malware samples. The main work of this thesis includes:

(1) By analyzing sequences of Windows API calls and the convolutional neural network (CNN) algorithm, a CNN model based on Windows API calls is proposed to detect malicious software. The n-gram method is employed to analyze the API call sequences of file samples. The features of file samples, generated by a one-hot model, are used as input to the CNN. The input layer takes the features of a file sample; convolution kernels of three different scales are designed to extract local characteristics at different granularities in the convolution layer; the most important features are sampled by max pooling; finally, the fully connected layer employs a Softmax regression function for malicious software detection. The experimental results show that the CNN model improves malware detection performance compared with the baselines.

(2) We study how to use file relations for malware detection. The associations between file samples are used to compute file similarities, from which a file-to-file relation graph is constructed by the kNN method. The Label Propagation algorithm is applied to classify file samples based on the constructed file relation graph. We use a large real-world dataset consisting of file co-occurrence records from users' clients. Comprehensive experiments are performed to compare the proposed method with other existing malware detection approaches. The experimental results demonstrate that the proposed method outperforms other malware detection methods using data mining techniques.

(3) We present FindMal, a framework for effective malicious software detection based on the associations and social network of file samples. It comprises a File Sample Collector, Graph Construction, a Graph-based Feature Extractor and Label Propagation. Inspired by ideas from Twitter spammer detection, we investigate three graph-based features of the file-to-file social graph for evaluating the importance of each vertex. Based on the importance of file samples, the top k important files are sampled for label queries to security experts in order to improve the classifier. The Label Propagation algorithm is applied to classify file samples based on the constructed file-to-file relation graph. On top of that, a batch-mode active learning method is applied to improve the detection rate of the classifier. In instance selection, representative instances are sampled by the maximum batch network gain, which takes both representativeness and diversity into consideration. A large real-world dataset consisting of file co-occurrence records collected from users' devices is used to evaluate the performance of this framework. Comprehensive experiments are performed to compare the proposed method with other existing malware detection approaches. The experimental results demonstrate that FindMal outperforms other malware detection methods using data mining techniques.

(4) We investigate the safety and robustness of malware detection methods in an adversarial environment. The attack modes and attack scenarios of adversarial learning are discussed. We present a case study in one of the attack scenarios to evaluate the safety and performance of the decision tree algorithm in malware detection. Then, the safety and robustness of a multiple classifier system under adversarial attack is investigated. Based on an analysis of the random forest algorithm, we propose an extended-feature-space random forest algorithm to counter attacks in an adversarial environment, and evaluate its performance and robustness.
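
To make the graph-based step in contribution (2) concrete, the following is a minimal sketch of label propagation over a kNN file-relation graph. It is not the thesis's implementation. It assumes each file is described by a non-zero binary vector of the clients it was seen on, that similarity is cosine similarity, and that propagation is a simple clamped iteration. The function names, the toy data and the values of k and n_iter are all hypothetical.

```python
import numpy as np


def knn_graph(features, k):
    """Symmetric kNN adjacency matrix weighted by cosine similarity.

    Assumes every row of `features` is non-zero and k < number of files.
    """
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a file is never its own neighbour
    n = sim.shape[0]
    adj = np.zeros((n, n))
    for i in range(n):
        nearest = np.argsort(sim[i])[-k:]  # indices of the k most similar files
        adj[i, nearest] = sim[i, nearest]
    return np.maximum(adj, adj.T)  # keep an edge if either endpoint chose it


def label_propagation(adj, labels, n_iter=100):
    """Spread labels (1 malicious, 0 benign, -1 unknown) over the graph."""
    known = labels >= 0
    scores = np.where(known, labels, 0.5).astype(float)  # unknowns start neutral
    degree = adj.sum(axis=1, keepdims=True)
    transition = adj / np.where(degree == 0, 1.0, degree)  # row-normalise
    isolated = np.flatnonzero(degree == 0)
    transition[isolated, isolated] = 1.0  # isolated files keep their own score
    for _ in range(n_iter):
        scores = transition @ scores
        scores[known] = labels[known]  # clamp the files with known labels
    return scores


# Toy co-occurrence data (hypothetical): rows are files, columns are clients.
features = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
], dtype=float)
labels = np.array([1, -1, -1, 0, -1, -1])  # one labelled file per group

scores = label_propagation(knn_graph(features, k=2), labels)
predicted_malicious = scores > 0.5
```

In this toy setting the unlabelled files inherit the label of the labelled file they co-occur with. A real deployment would need a sparse graph representation and a similarity measure suited to the actual co-occurrence records.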
Keywords/Search Tags: Data Mining, Malware Detection, File Sample, Neural Network, Label Propagation, Social Network Analysis, Adversarial Learning