Multi Feature Detection For Android Malware Based On Machine Learning

Posted on: 2018-06-13  Degree: Master  Type: Thesis
Country: China  Candidate: F Wu  Full Text: PDF
GTID: 2428330596954619  Subject: Mathematics
Abstract/Summary:
Accurate and efficient Android malware detection is both an urgent need for users' own security and a prerequisite for the development of Android. Traditional static analysis has low accuracy and cannot handle malicious programs whose code has been obfuscated. Dynamic analysis is unsuitable for detecting large numbers of malware samples because its detection process is complex and time-consuming. Current machine-learning-based detection still has several shortcomings: redundant features are handled insufficiently, only a single algorithm is used, and some features do not reflect malicious behavior well. To overcome these shortcomings, this paper proposes a multi-feature detection scheme based on machine learning.

First, Baksmali.jar and the strace tool are used to extract function calls and system calls, which prepare data for the classifier. Function calls are directly related to the functionality of the application and can be viewed in the classes.dex file; system calls, as underlying information of Linux, act as a bridge connecting applications and system resources. An application needs to interact with these two features when carrying out each function. Therefore, these two features better reflect the behavior of program execution and overcome the impact of code obfuscation. Second, this paper uses an improved chi-square statistics method named CHI-IDF to select features. It prevents important features with a small sample size from being ignored and keeps features of little significance with a large sample size from being overvalued. This method not only eliminates data that has little influence on the classification but also balances the weight of each feature. Third, to overcome the poor accuracy and generalization ability of a single algorithm, classifiers are constructed using the Naive Bayes, KNN and SVM algorithms. For KNN classification, this paper maps attribute values to the [0,1] interval to reduce the influence of large attribute values on the distance. When it is hard for the classifier to classify some samples, Naive Bayes can give an optimal "guess" and its corresponding probability estimate. The SVM algorithm is regarded as the best "off-the-shelf" classifier, which can be used without modification. These three algorithms complement each other when combined.

Finally, 10-fold cross-validation is used on the WEKA platform to conduct a stepwise experiment on 1000 malicious programs and 1000 normal programs. The number of samples in the successive experiments was 400, 800, 1200, 1600 and 2000. Studying the classification results across these graded sample sizes makes it possible to analyze the classification trend of each classifier, which helps predict the results when these classifiers are applied to large numbers of samples. To further verify the feasibility of the detection scheme, this paper uses the Androguard tool to carry out a comparison experiment on the same sample set. The experimental results show that the proposed scheme performs better in terms of time efficiency and accuracy.
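
The abstract notes that attribute values are mapped to the [0,1] interval before KNN classification so that large-valued attributes do not dominate the distance. The Python sketch below illustrates that idea with ordinary min-max scaling and a plain Euclidean nearest-neighbor vote. It is not the thesis's implementation (the experiments were run on WEKA); the function names, the two-column call-count features, and the sample values are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def fit_min_max(rows):
    """Return per-column minima and maxima of the training rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def scale(row, lows, highs):
    """Map one row to [0, 1] per column; constant columns become 0.0."""
    return [
        (v - lo) / (hi - lo) if hi > lo else 0.0
        for v, lo, hi in zip(row, lows, highs)
    ]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=1):
    """Majority vote among the k training rows nearest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical features: [count of a frequent call, count of a rare call].
raw_train = [[900, 1], [1000, 2], [950, 9], [1100, 10]]
labels = ["benign", "benign", "malware", "malware"]
raw_query = [1000, 8]

lows, highs = fit_min_max(raw_train)
scaled_train = [scale(r, lows, highs) for r in raw_train]
scaled_query = scale(raw_query, lows, highs)

print(knn_predict(raw_train, labels, raw_query, k=1))
print(knn_predict(scaled_train, labels, scaled_query, k=1))
```

In this toy data the first column has a much larger range than the second, so the unscaled nearest neighbor is decided almost entirely by the first column and comes out as a benign sample, while after scaling the nearest neighbor is a malware sample. A query value outside the training range would map outside [0,1] in this sketch.
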
Keywords/Search Tags: Function call, System call, Android malicious programs, Machine learning, Chi-square statistics