
Research On Malware Identification Of Android Based On Network Traffic

Posted on: 2019-11-29 | Degree: Master | Type: Thesis
Country: China | Candidate: S S Wang | Full Text: PDF
GTID: 2428330545969218 | Subject: Software engineering
Abstract/Summary:
In recent years, the rapid spread of the mobile Internet and the emergence of a wide variety of apps have brought great convenience to people's lives. However, this has also fostered the continuous development of malware. The scale and variety of malware are constantly increasing, which poses a serious danger to users' privacy and property. Detection methods for Android malware can be roughly divided into static code analysis, dynamic behavior analysis, and traffic behavior analysis. Static code analysis detects malware by identifying malicious code segments. This kind of method is simple and effective, but it fails when faced with packed or obfuscated apps. Dynamic behavior analysis monitors the events and behaviors performed by apps. Because of its high complexity, it is difficult to apply and deploy on a large scale. Malware identification methods based on network traffic capture and analyze users' network traffic data. This kind of method is easy to implement and neither relies on users nor consumes their resources, so it is relatively promising. However, malware identification based on network traffic also faces some problems, the most prominent being the difficulty of feature selection and low recognition accuracy.

This paper analyzes a large amount of network traffic data and summarizes the characteristics of multiple groups of network traffic. First, it identifies six statistical characteristics of TCP flows and preprocesses each of them to build a TCP flow-statistics feature set. This feature set does not involve traffic content, so it protects user privacy and remains effective on encrypted network traffic. Second, four fields are selected from the HTTP request header and preprocessed individually to build an HTTP request field feature set. Third, all HTTP header information is segmented into N-Gram sequences, from which features are selected automatically to build an HTTP header N-Gram feature set. Finally, for URL strings, a URL feature set is built using string segmentation and word vector training. These four feature sets cover multiple layers of network traffic, including TCP traffic, HTTP traffic, and URL strings.

This paper combines the feature sets with different machine learning and deep learning algorithms to develop several malware detection models. The TCP flow-statistics feature set and the HTTP request field feature set are each combined with the decision tree algorithm to create two complementary detection models. An effective malware detection model is created using the HTTP header N-Gram feature set and the support vector machine algorithm. Another effective malware detection model is built on the URL feature set with multi-view neural networks. Combining the URL string set with the floating centroid method yields an effective malware detection and clustering model. For each detection model, multiple groups of evaluation and comparison experiments are designed to verify its effectiveness. This paper also applies the models in a real environment to detect unknown apps. The experimental results confirm the validity of the feature sets and models.

In addition, to facilitate the management of apps and network traffic, this paper designs and implements a management platform for Android apps and network traffic. By calling the detection interface of VirusTotal, apps can be uploaded on a large scale and scanned by multiple anti-virus scanners. For network traffic data generated by Android apps, the system integrates several common traffic-processing operations, such as TCP flow extraction, HTTP traffic extraction, DNS traffic extraction, and URL string extraction. The system also integrates a variety of machine learning and deep learning algorithms. Users only need to select an algorithm to have the data modeled automatically. Users can also select several algorithms at once, and the comparison results of the different algorithms are presented visually.
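
The HTTP header N-Gram step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual segmentation rule, value of N, or feature selection procedure, so everything below is an assumption made for illustration: the function names (tokenize_header, ngrams, header_ngram_counts) are hypothetical, the header is split on any non-alphanumeric character, N is set to 2, and the sample request is invented.

```python
import re
from collections import Counter


def tokenize_header(header):
    """Split raw HTTP header text into lowercase alphanumeric tokens.

    Assumption: splitting on every non-alphanumeric character. The thesis
    may segment header fields differently.
    """
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", header) if t]


def ngrams(tokens, n):
    """Return the contiguous n-token windows of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def header_ngram_counts(header, n=2):
    """Count N-Grams in one HTTP header; the counts can serve as raw features."""
    return Counter(ngrams(tokenize_header(header), n))


# Invented sample request, for illustration only.
sample = (
    "GET /ad/track?id=42 HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "User-Agent: Dalvik/2.1.0"
)
counts = header_ngram_counts(sample, n=2)
print(counts.most_common(5))
```

Counts like these would normally be turned into a fixed-length vector over a selected vocabulary before being passed to a classifier such as a support vector machine; that selection step is not shown here because the abstract does not describe it.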
Keywords/Search Tags: Android malware, feature set, machine learning