
Research And Application Of Multi-step Outlier Detection On Android

Posted on: 2020-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: L L Wang
GTID: 2428330611950031
Subject: Software engineering
Abstract/Summary:
New types of malware and malware variants keep emerging. On the one hand, traditional detection models rely too heavily on known samples, which limits their ability to detect new malicious behaviors. On the other hand, in practical detection tasks malicious data is difficult and costly to obtain. Given that malware data is incomplete, hard to obtain, and expensive to collect, outlier detection is currently an effective approach. Outlier detection for Android software refers to the process of predicting the behavior of online applications. Although software anomaly detection is an indispensable method for assessing the security of unknown software, it still faces several problems: outlier detection is difficult to run effectively on high-dimensional data, the requirements for real-time performance and accuracy are high, and the detection samples are imbalanced. Because data flow information is the main basis and resource for judging software privacy leaks, this thesis focuses on the following issues:

(1) Feature extraction for outlier detection. Features for outlier detection are difficult to analyze. The difficulty shows up in detecting outliers in high-dimensional data, in the real-time performance and accuracy of detection, and in the correlation between abnormal applications. To this end, the thesis first defines the software outlier detection problem, then applies feature transformation and parameter setting to the feature extraction model. Using machine learning features with good generalization ability, it transforms the feature representation of each sample from the original space into a new feature space to improve classification accuracy, yielding an automatic and efficient software feature extraction method.

(2) Sample balance in outlier detection. Sample imbalance usually arises because, in the early stage of a new malware family or variant, labeled samples are scarce and the classes are imbalanced, which leaves the training set imbalanced and short of labeled samples. The thesis uses semi-supervised learning so that the large number of unlabeled software samples can assist the existing labeled samples in detecting anomalous points, without excessive human involvement.

(3) Multi-step outlier detection model. Building on the strengths of feature transformation and the automatic grouping performed by clustering algorithms, and drawing on machine learning techniques that handle large-scale data well, the thesis models the problem with existing resources, designs outlier detection schemes, and proposes a multi-step outlier detection model. The model takes information about an application's data execution paths as its entry point, selects features according to the importance of the information, provides a clustering basis and anomaly metrics for anomalous samples, and forms a set of data flow information representing software anomalies as the classification criterion. Experimental results show that the model effectively reduces the number of data flow features needed for outlier detection on the data set, improves the accuracy of identifying anomalous software, and helps ensure the security of the software in use.

(4) Multi-step outlier detection prototype system. Using the Android Studio development tools and the FLOWDROID data flow analysis tool, the thesis designs and implements each functional module of the prototype system, and uses samples of different kinds to verify the effectiveness of the proposed multi-step outlier detection method.

The multi-step outlier detection method is evaluated through comparative experiments. The experiments use two data sets, the Google Play Store and the VirusShare database, and compare MUDFLOW with MULTIFLOW, the anomaly detection model based on principal component analysis (PCA). Compared with the published MUDFLOW detection model, the detection rate rises from 78.16% to 91.37%, and the true positive rate rises from 79.59% to 92.76%. Compared with traditional anomaly detection methods, the proposed multi-step method performs better, which supports the efficiency and correctness of the multi-step outlier detection method.
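
The multi-step idea described in the abstract (reduce the data flow features, cluster the samples, then score anomalies) can be illustrated with a minimal sketch. This is not the thesis's implementation: variance-based column selection stands in for the PCA feature transformation, a distance-to-centroid threshold stands in for the ν-SVM classifier, and the feature vectors, function names, and threshold rule are all hypothetical. It assumes each application is represented by counts of data flows per source-to-sink category, and that the model is fitted on benign samples only.

```python
import math


def select_features(rows, keep):
    """Return indices of the `keep` highest-variance columns.

    Variance is used here as a simple stand-in for feature importance.
    """
    n = len(rows)
    variances = []
    for col in zip(*rows):
        mean = sum(col) / n
        variances.append(sum((v - mean) ** 2 for v in col) / n)
    ranked = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:keep])


def project(row, columns):
    """Keep only the selected columns of one feature vector."""
    return [row[j] for j in columns]


def nearest_distance(point, centroids):
    """Anomaly score: Euclidean distance to the closest cluster centroid."""
    return min(math.dist(point, c) for c in centroids)


def kmeans(points, k, iterations=20):
    """Plain K-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: nearest_distance(p, centroids))
        centroids.append(list(farthest))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # leave an empty cluster's centroid unchanged
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def fit(benign_rows, keep, k):
    """Step 1: select features. Step 2: cluster. Step 3: derive a threshold."""
    columns = select_features(benign_rows, keep)
    points = [project(r, columns) for r in benign_rows]
    centroids = kmeans(points, k)
    # Hypothetical rule: the largest score seen among benign training samples.
    threshold = max(nearest_distance(p, centroids) for p in points)
    return columns, centroids, threshold


def is_outlier(row, model):
    """Flag a sample whose score exceeds the benign-derived threshold."""
    columns, centroids, threshold = model
    return nearest_distance(project(row, columns), centroids) > threshold


# Hypothetical data flow counts for six benign apps (four flow categories).
benign = [
    [1, 0, 5, 0],
    [1, 0, 6, 0],
    [0, 0, 5, 1],
    [9, 0, 1, 0],
    [8, 0, 0, 0],
    [9, 0, 1, 1],
]
model = fit(benign, keep=2, k=2)
print(is_outlier([1, 0, 5, 0], model))  # a sample that resembles the benign apps
print(is_outlier([5, 7, 9, 3], model))  # a sample with unusual flow counts
```

Fitting on benign samples only reflects the abstract's point that malicious data is scarce and costly to obtain. A real system would replace each stand-in step with the corresponding technique named in the keywords.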
Keywords/Search Tags: Outlier detection, Malware, PCA, K-means, ν-SVM, Prototype system