Research On Android Malware Classification Method Based On Traffic Fingerprint

Posted on:2022-07-20

Degree:Master

Type:Thesis

Country:China

Candidate:J Deng

GTID:2518306524489554

Subject:Master of Engineering

Abstract/Summary:

With the development of mobile terminals, smartphones have attracted a very large number of users because of their powerful functions. The Android system is popular with developers because it is open source and free, and it holds a large market share. This openness also opens the door to Android malware, so an effective malware detection method is needed. Few studies use traffic analysis to detect malicious code. The more common Android malware identification and classification methods are based on static program analysis, which identifies and categorizes software by analyzing features such as API calls and permissions. This type of analysis requires reverse engineering and decompilation of the software and is easily bypassed by malware that obfuscates its code. This paper instead uses traffic analysis, a dynamic analysis method, to capture the traffic generated while Android software runs, and applies machine learning and deep learning to identify and classify Android malware. The method offers high recognition and classification accuracy, flexibility, broad applicability, and resistance to static code-level obfuscation. The main work of this paper includes the following points:

1. A machine learning algorithm is chosen to build an effective traffic fingerprint detection model that is also suitable for encrypted traffic. Two scenarios are simulated: distinguishing benign traffic from malicious traffic, and distinguishing the types of malicious traffic. To better separate Scareware and Adware, the two traffic types most easily confused with each other, an additional confusion classifier layer is added to further classify the malware. The framework mainly comprises acquisition of application communication traffic, segmentation of traffic files into sessions or flows, preprocessing, feature engineering, and classification based on machine learning algorithms. To address the confusion problem, the confusion classifier is combined with the main classifier to form a multilevel classifier that improves classification accuracy.

2. In constructing the deep learning detection framework, a method of removing third-party traffic is designed and introduced to improve the operating efficiency and detection accuracy of the model. The original traffic data is then segmented by session and converted into grayscale images that represent the characteristics of the raw flow data, with a two-dimensional matrix as the data structure of each grayscale image. Because a CNN can learn the spatial structure in a two-dimensional matrix well, a CNN is used as the neural network model to automatically extract spatial features from the traffic grayscale images. In addition, since a session is essentially a sequence of data packets ordered by the time of the traffic interaction, a two-layer LSTM (a type of RNN) is used to automatically extract the temporal characteristics of the traffic. Finally, the two trained deep learning models are used to identify and classify the malware to be detected.

3. The experimental data used in this paper is the CICAndMal2017 data set, collected in a real environment. The machine learning model is evaluated in the two scenarios. The experimental results show that binary classification of malicious versus benign traffic reaches an accuracy of 98.8%, and multi-class classification of specific malicious traffic types reaches 95.2%. Under the CNN and two-layer LSTM deep learning models, removing third-party traffic greatly improves malware classification and identification on the test set: CNN accuracy rises from 88.2% to 96.8%, and the improvement for the LSTM is more pronounced, from 89.2% to 98.3%. However, deep learning models usually require a large amount of training data to achieve good results. In practice, many small-sample problems arise, and in those cases machine learning is more suitable than deep learning.
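
The conversion of a traffic session into a grayscale image stored as a two-dimensional matrix can be illustrated with a minimal Python sketch. The abstract does not give the image size or the padding rule, so the function name `session_to_grayscale`, the default side length of 28, and the truncate-or-zero-pad policy are assumptions made for illustration only and are not taken from the thesis.

```python
from typing import List


def session_to_grayscale(payload: bytes, side: int = 28) -> List[List[int]]:
    """Map raw session bytes to a side x side matrix of gray values (0..255).

    Hypothetical helper: the fixed size and the zero padding are assumptions,
    not details confirmed by the abstract.
    """
    if side <= 0:
        raise ValueError("side must be positive")
    size = side * side
    # Truncate long sessions and zero-pad short ones to a fixed length.
    fixed = payload[:size].ljust(size, b"\x00")
    # Each byte becomes one gray pixel; each row is a consecutive run of bytes.
    return [list(fixed[r * side:(r + 1) * side]) for r in range(side)]


if __name__ == "__main__":
    # A made-up 13-byte session rendered as a 4 x 4 image for readability.
    image = session_to_grayscale(b"\x16\x03\x01" + b"\xff" * 10, side=4)
    for row in image:
        print(row)
```

A matrix of this shape could then be fed to a CNN as a single-channel image, while the same byte sequence, kept in packet order, would suit a sequence model such as an LSTM.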

Keywords/Search Tags:

Traffic analysis, Android malware, Machine learning, Deep learning

Related items

1	Using Deep Neural Networks For Android Malware Detection
2	Android Malware Family Classification Research And System Implementation Based On Network Traffic
3	Research On Android Malware Detection Algorithm And Application Based On Deep Learning
4	Detection And Classification Of Android Malware Based On Key Traffic Images
5	Research Of Android Malware Detection Method Based On Machine Learning
6	Research And Application Of Android Malware Detection Based On Deep Learning
7	Research On Malware Identification Of Android Based On Network Traffic
8	Research On Online Detection Of Malware Based On Network Traffic Behavior Analysis
9	Research And Design Of Android Malware Detection And Analysis System Based On Machine Learning
10	Research And Implementation Of Android Malware Detection System