
Imbalanced Learning And Its Application On Android Malware Detection

Posted on: 2020-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y Pang
GTID: 2428330578967292
Subject: Computer Science and Technology
Abstract/Summary:
With the popularity of mobile applications and the explosive growth in the number of users, the security of mobile intelligent terminals faces enormous challenges. Machine learning, an important method in the field of artificial intelligence, has in recent years been widely used to detect mobile malware by analyzing network behavior. However, the imbalanced class distribution and the continuous arrival of network traffic make it difficult to train a machine learning model. First, in the real world, network traffic with benign behavior far outnumbers traffic with malicious behavior. Traditional classification algorithms are designed on the assumption of a balanced class distribution, so on such data sets they fail to achieve the expected classification performance. Second, network traffic is generated constantly and the scale of the data keeps growing, which makes it hard to build machine learning models.

To address these key problems in Android malware detection based on network traffic, this study starts with general-purpose imbalanced learning methods, moves on to an imbalanced learning method specific to Android malware detection, and ends with an imbalanced learning method for the big data environment. Proceeding from simple to complex, the thesis carries out the following research work:

(1) An oversampling algorithm based on adaptive weighting and the Gaussian probability density function is proposed for imbalanced learning. By analyzing counting and location factors, the algorithm assigns a different weight to each minority instance. Minority instances are then synthesized according to these weights and the Gaussian probability density function. The algorithm is validated against 7 existing data resampling methods on 37 public imbalanced data sets and is successfully applied to Android malware detection.

(2) An imbalanced ensemble learning model is proposed that is driven by a data diversity measure and evolved by a population-based incremental learning algorithm. The model uses an instance diversity metric designed in this thesis as the fitness function of the population-based incremental learning algorithm to generate a cluster of training subsets with maximum diversity. The model is tested on 44 imbalanced data sets, and the results show that it has significant advantages. In addition, the model has been successfully applied to the detection of Android malware.

(3) To solve the class imbalance problem in Android malware detection based on semantic information, a signature-assisted random oversampling model is proposed. The model can synthesize malicious instances with all content features by using signatures extracted from HTTP protocol streams. Compared with 11 data resampling methods on two network traffic data sets with different imbalance ratios, the model shows clear advantages.

(4) An imbalanced learning method based on the MapReduce distributed framework is improved to handle the large scale and imbalanced distribution of network traffic data sets. The model uses the broadcast mechanism of the Spark distributed platform to keep all the information of the minority class on each computing node, and adaptively finds the best oversampling ratio. With this model, an Android malware detection model can be built effectively on large-scale imbalanced network traffic data sets.

In summary, several effective solutions are proposed in this thesis for the problems of imbalanced distribution and large scale in network traffic, and the advantages of the proposed methods are verified by a large number of experiments. This research is significant for both theoretical work and practical applications on learning from large-scale imbalanced data sets in the field of mobile malware detection.
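To illustrate the general idea behind item (1), the following is a minimal sketch of weighted Gaussian oversampling. It is not the algorithm from the thesis: the abstract does not define the counting and location factors, so the weighting rule used here (one plus the fraction of majority instances among the k nearest neighbours), the isotropic Gaussian with a fixed `sigma`, and the function names `knn_majority_fraction` and `gaussian_oversample` are all assumptions made for illustration.

```python
import math
import random


def knn_majority_fraction(index, minority, majority, k):
    """Fraction of the k nearest neighbours of minority[index] that are majority points.

    Assumed stand-in for a weighting factor; the thesis's own factors are not
    specified in the abstract.
    """
    point = minority[index]
    # Label 0 = minority neighbour, 1 = majority neighbour.
    labelled = [(math.dist(point, q), 0) for i, q in enumerate(minority) if i != index]
    labelled += [(math.dist(point, q), 1) for q in majority]
    labelled.sort(key=lambda pair: pair[0])
    nearest = labelled[:k]
    if not nearest:
        return 0.0
    return sum(label for _, label in nearest) / len(nearest)


def gaussian_oversample(minority, majority, n_new, k=3, sigma=0.1, seed=0):
    """Draw n_new synthetic minority points (hypothetical scheme).

    Each synthetic point is sampled from an isotropic Gaussian centred on a
    minority instance chosen with probability proportional to its weight.
    """
    rng = random.Random(seed)
    # Assumption: instances closer to the majority class get larger weights;
    # the +1.0 keeps "safe" instances selectable.
    weights = [
        1.0 + knn_majority_fraction(i, minority, majority, k)
        for i in range(len(minority))
    ]
    synthetic = []
    for _ in range(n_new):
        base = rng.choices(minority, weights=weights, k=1)[0]
        synthetic.append(tuple(rng.gauss(x, sigma) for x in base))
    return synthetic


if __name__ == "__main__":
    minority = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0)]
    majority = [(1.1, 1.0), (0.9, 1.1), (1.2, 0.9), (3.0, 3.0), (3.1, 2.9)]
    for p in gaussian_oversample(minority, majority, n_new=5):
        print(p)
```

In this sketch, a minority instance surrounded by majority instances is selected more often as a Gaussian centre, so more synthetic points appear near the class boundary. Whether the thesis weights borderline or interior instances more heavily is not stated in the abstract.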
Keywords/Search Tags: mobile malware detection, class-imbalanced classification, resampling, big data, MapReduce