| Machine learning has been widely used for malicious application detection in recent years. Static detection methods based on machine learning judge whether a program is malicious through statistical analysis of program features and a model that maps those features to behavior. Such methods are simple to apply and efficient to run. However, current machine-learning-based static detection methods still have the following problems: (1) Detection accuracy depends on the quality of the selected features. If the selected features cannot fully describe the behavior of the application, the detection method will have a high false positive rate; (2) Detection accuracy depends on the size of the sample set. If the number of samples is small, the constructed model cannot reflect all the features of malicious applications, which degrades classification; (3) Feature distribution homogeneity, that is, the distribution of malicious program features is similar across different sample sets. In this case, even increasing the number of training samples cannot improve classification. To address these problems, this paper proposes a malicious application detection and family classification method based on generated adversarial samples. The main research contents are as follows: (1) A feature extraction method for Android applications based on graph transformation. To extract features from applications effectively, this paper proposes a feature extraction method based on graph transformation. By parsing the application, its function call graph and control flow graph are obtained; the two graphs are then reduced and combined to construct the interprocedural control flow graph of the callback functions, which is traversed to obtain the sensitive API sequence used as the application feature. (2) A malicious application detection method based on generated adversarial samples. To enlarge the sample set and anticipate new malicious samples, and thereby improve the classifier, a genetic algorithm is used to evolve malicious samples into adversarial samples; the adversarial samples are added to the original training set, and the classifier is retrained to improve its detection rate. The experimental results show that the method can significantly improve the detection rate of classifiers trained with algorithms such as decision trees. (3) A malicious family prediction method based on graph similarity. Family classification can exploit the features shared by malware in the same family to improve detection. In this paper, an adjacency matrix is constructed for each malicious application and each malicious family, and a weight is calculated for each directed edge in the family. The weighted family adjacency matrix serves as the family feature, and the family of a malicious application is determined by calculating the similarity coefficient between graphs. Experiments show that this method predicts well for most malicious families in the Drebin dataset. |
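
The family prediction step in (3) can be illustrated with a minimal sketch. The abstract does not give the edge-weight formula or the similarity coefficient, so both are assumptions here: an edge's weight is taken to be the fraction of family samples that contain it, and similarity is a weighted Jaccard coefficient between the application's adjacency matrix and the family's weighted matrix. The API names, family names, and helper functions are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]          # directed edge between two sensitive APIs
Matrix = List[List[float]]


def adjacency_matrix(edges: Sequence[Edge], apis: Sequence[str]) -> Matrix:
    """Binary adjacency matrix of one application over a fixed API ordering."""
    index = {api: i for i, api in enumerate(apis)}
    matrix = [[0.0] * len(apis) for _ in apis]
    for src, dst in edges:
        if src in index and dst in index:
            matrix[index[src]][index[dst]] = 1.0
    return matrix


def family_weight_matrix(samples: Sequence[Sequence[Edge]],
                         apis: Sequence[str]) -> Matrix:
    """Assumed weighting: share of family samples that contain each edge."""
    if not samples:
        raise ValueError("a family needs at least one sample")
    total = [[0.0] * len(apis) for _ in apis]
    for edges in samples:
        sample_matrix = adjacency_matrix(edges, apis)
        for i, row in enumerate(sample_matrix):
            for j, value in enumerate(row):
                total[i][j] += value
    return [[value / len(samples) for value in row] for row in total]


def similarity(app: Matrix, family: Matrix) -> float:
    """Assumed coefficient: weighted Jaccard, sum(min) / sum(max) over all cells."""
    pairs = [(a, f) for row_a, row_f in zip(app, family)
             for a, f in zip(row_a, row_f)]
    denominator = sum(max(a, f) for a, f in pairs)
    if denominator == 0:
        return 0.0
    return sum(min(a, f) for a, f in pairs) / denominator


def predict_family(app_edges: Sequence[Edge],
                   families: Dict[str, Sequence[Sequence[Edge]]],
                   apis: Sequence[str]) -> Tuple[str, Dict[str, float]]:
    """Return the most similar family and the score for every family."""
    app = adjacency_matrix(app_edges, apis)
    scores = {name: similarity(app, family_weight_matrix(samples, apis))
              for name, samples in families.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy data: two families with two samples each.
apis = ["getDeviceId", "sendTextMessage", "openConnection", "getLastKnownLocation"]
families = {
    "sms_family": [
        [("getDeviceId", "sendTextMessage")],
        [("getDeviceId", "sendTextMessage"), ("sendTextMessage", "openConnection")],
    ],
    "spy_family": [
        [("getLastKnownLocation", "openConnection")],
        [("getLastKnownLocation", "openConnection"), ("getDeviceId", "openConnection")],
    ],
}
app_edges = [("getDeviceId", "sendTextMessage")]

best, scores = predict_family(app_edges, families, apis)
print(best, scores)
```

In this toy setting the application shares its only edge with every sample of `sms_family` and no edge with `spy_family`, so it is assigned to `sms_family`. Averaging the member matrices makes edges that recur across a family count more than edges seen in a single sample, which is one plausible reading of the per-edge family weight described above.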