
Android Malware Detection Based On Semantic Attributes

Posted on: 2020-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: K R Ren
Full Text: PDF
GTID: 2428330599454708
Subject: Software engineering
Abstract/Summary:
Android malware has become a serious threat to daily digital life. As malware grows in number and variety, traditional analysis methods become inefficient or even ineffective, so efficient and accurate detection methods are urgently needed to protect Android users. In recent years, many researchers have proposed machine learning-based methods. However, these methods usually rely on lightweight syntactic attributes, which are too simple to characterize Android applications and are therefore insufficient for detecting Android malware. Semantic attributes, by contrast, represent the behavior of an application well. In this thesis we therefore use semantic attributes, namely control flow graphs and data flow graphs, to characterize Android applications. Our work consists of the following two parts.

First, we propose an Android malware detection approach based on control flow graphs and data flow graphs. We extract the control flow graphs and data flow graphs of Android applications through static program analysis, considering both intra-procedural and inter-procedural analysis. We then encode the flow graphs into matrices and propose two modes for combining the matrices: the horizontal mode and the vertical mode. Finally, we build the Android malware detection model on a convolutional neural network. To verify the validity of our approach, we carried out a series of experiments on nearly 100,000 Android applications from the Marvin, Drebin, VirusShare, and ContagioDump datasets. The experimental results show that our approach is effective at malware detection and that the horizontal combination mode achieves the best F1-score, reaching 98.722%. We also compare our approach with existing approaches and tools, including DODroid, CSBD, Drebin, and the detection tools available on the VirusTotal website. The experimental results show that the F1-score of our approach is higher than that of the other approaches.

Second, we propose an Android malware family classification approach and a family behavior mining approach. For malware family classification, we again use control flow graphs and data flow graphs as features to characterize Android malware. The classification approach is similar to the detection approach, but it additionally considers the permission information required by the invoke instructions. To evaluate the classification approach, we apply it to a dataset of 10,675 Android malware samples from 20 malware families. The experimental results show that our approach classifies malware with an accuracy of over 92% and performs better than Drebin and most tools in VirusTotal. For family behavior mining, we extract sequences of operation instructions from the graphs of malware and filter the sequences using the weights of the family classification models. We then apply the text analysis algorithm TF-IDF to the sequences of the malware samples in the same family, yielding feature vectors that represent the malware families, and we select the top-x features from each vector as the representative behaviors of the family. To evaluate the validity of the feature vectors, we use them to classify malware samples based on Euclidean distance. The experimental results show that the top-x sequences of the feature vectors can represent the behaviors of malware families. Accuracy increases rapidly when the top 0.01% to 0.02% of control flow graph sequences and the top 0.05% to 0.07% of data flow graph sequences are taken, after which the growth rate slows.
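The family behavior mining step can be illustrated with a minimal sketch. It is not the thesis's implementation: it treats each family as one TF-IDF "document", uses a smoothed IDF variant chosen here for illustration, takes x as a feature count rather than a percentage, and skips the filtering by classification-model weights. The function names and the instruction sequences in the usage note are hypothetical.

```python
import math
from collections import Counter


def family_tfidf(family_sequences):
    """Build one TF-IDF vector per family (assumption: family = document).

    family_sequences maps a family name to a list of samples; each sample
    is a list of instruction-sequence strings.
    Returns (family_vectors, idf).
    """
    tf = {}
    for family, samples in family_sequences.items():
        counts = Counter(seq for sample in samples for seq in sample)
        total = sum(counts.values())
        tf[family] = {seq: c / total for seq, c in counts.items()}

    n_families = len(tf)
    # Document frequency: in how many families does each sequence occur?
    df = Counter(seq for vec in tf.values() for seq in vec)
    # Smoothed IDF (an illustrative choice, not taken from the thesis).
    idf = {seq: math.log((1 + n_families) / (1 + d)) + 1.0
           for seq, d in df.items()}

    vectors = {family: {seq: w * idf[seq] for seq, w in vec.items()}
               for family, vec in tf.items()}
    return vectors, idf


def top_x(vector, x):
    """Keep the x highest-weighted sequences (ties broken alphabetically)."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:x])


def sample_vector(sample, idf):
    """TF-IDF vector of one sample, ignoring sequences never seen in training."""
    counts = Counter(seq for seq in sample if seq in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {seq: (c / total) * idf[seq] for seq, c in counts.items()}


def euclidean(a, b):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def classify(sample, family_vectors, idf):
    """Assign the sample to the family whose vector is nearest."""
    vec = sample_vector(sample, idf)
    return min(sorted(family_vectors),
               key=lambda family: euclidean(vec, family_vectors[family]))
```

A possible use, with made-up sequences: call `vectors, idf = family_tfidf({"FamilyA": [["sendTextMessage>abortBroadcast"]], "FamilyB": [["exec>su"]]})`, truncate each vector with `top_x(vectors[name], x)`, and pass the truncated vectors to `classify`. Repeating the classification for increasing x is one way to reproduce the kind of accuracy-versus-x curve the abstract describes.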
Keywords/Search Tags: Android Malware, Malware Detection, Malware Family Classification, Control Flow Graph, Data Flow Graph