Research On Imbalanced Mobile Malicious Application Traffic Identification Method

Posted on:2019-11-04

Degree:Master

Type:Thesis

Country:China

Candidate:L L Wang

GTID:2428330545969226

Subject:Software engineering

Abstract/Summary:

With the development of mobile networks, the number of intelligent terminals such as smartphones and tablets has grown rapidly. Smartphones bring great convenience to people's lives, but they also introduce security problems. In recent years, malicious applications have become increasingly rampant, causing serious harm to users and society and posing new challenges for network security and management. A large number of malicious applications carry out their malicious behavior over the network, so analyzing the malicious traffic generated by mobile applications has become a hot topic in the security field. Traffic recognition based on machine learning has recently matured, which makes it possible to identify malicious traffic accurately by combining machine learning with network techniques. However, several key issues remain unresolved before machine learning can be applied to effective malicious traffic identification:

(1) Difficulty of malicious traffic feature extraction. As technology develops, the recognition rate achieved with traditional features can no longer meet practical needs.
(2) Packet sampling. In a high-speed network environment, collecting and processing complete flows becomes increasingly difficult as network speeds rise. Packet sampling technology offers a new approach to traffic identification and reduces the computational burden.
(3) Classification of imbalanced traffic. On the Internet, normal traffic far outweighs malicious traffic. Standard classification algorithms applied directly tend to identify normal traffic accurately while handling malicious traffic poorly, so classifier performance falls short of expectations.

To address these issues, this thesis carries out research in the following areas. First, to address the extraction and evaluation of malicious traffic features, we extract features at the packet level and at the content level, and construct an effective malicious traffic identification model based on machine learning algorithms. Second, we apply packet sampling to early malicious traffic identification and, in combination with classification algorithms, verify the effectiveness of packet sampling for traffic identification. Finally, we propose three data-level solutions to the imbalanced classification problem:

(1) A sample regeneration method based on a Generative Adversarial Network, which learns the underlying distribution of the real data through adversarial training and generates minority-class samples. Its effectiveness is verified in combination with machine learning algorithms.
(2) A non-linear weighted differential resampling method. It constructs a function that reflects the different influence that safe samples and boundary samples of the minority class have on classification, and uses this function to calculate the weight and sampling rate of each minority-class sample. Combining this method with SMOTE for oversampling is shown to be effective.
(3) An improved SMOTE algorithm based on differential evolution. The method searches for the optimal combination of sampling rates and then applies SMOTE with those rates to complete the oversampling. Experiments show that this algorithm is effective for the imbalanced classification problem.
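The SMOTE-based oversampling mentioned in solutions (2) and (3) can be illustrated with a minimal sketch. This is not the thesis's implementation: it shows only the standard SMOTE interpolation step, with a per-sample count of synthetic points standing in for the per-sample sampling rates. The function names (`nearest_neighbors`, `smote_oversample`), the hand-picked `counts`, and the two-dimensional toy points (imagined here as two flow features) are all assumptions made for illustration. In the thesis, the sampling rates would come from the weighting function or from the differential evolution search, neither of which is specified in the abstract.

```python
import math
import random


def nearest_neighbors(points, index, k):
    """Return indices of the (up to) k points closest to points[index]."""
    others = [i for i in range(len(points)) if i != index]
    others.sort(key=lambda i: math.dist(points[index], points[i]))
    return others[:k]


def smote_oversample(minority, counts, k=3, seed=0):
    """Create counts[i] synthetic samples around minority[i].

    Each synthetic point lies on the line segment between a minority
    sample and one of its k nearest minority neighbours, which is the
    standard SMOTE interpolation step. The per-sample counts are a
    hypothetical stand-in for per-sample sampling rates.
    """
    if len(minority) != len(counts):
        raise ValueError("need exactly one count per minority sample")
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for i, n_new in enumerate(counts):
        neighbors = nearest_neighbors(minority, i, k)
        for _ in range(n_new):
            j = rng.choice(neighbors)
            gap = rng.random()  # interpolation factor in [0, 1)
            point = tuple(
                a + gap * (b - a) for a, b in zip(minority[i], minority[j])
            )
            synthetic.append(point)
    return synthetic


if __name__ == "__main__":
    # Toy minority class: four samples with two hypothetical flow features.
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    # Hand-picked counts: the first sample is treated as a boundary sample
    # and oversampled more heavily; the last one is not oversampled at all.
    counts = [3, 1, 1, 0]
    for point in smote_oversample(minority, counts, k=2):
        print(point)
```

Passing the counts in explicitly keeps the interpolation step separate from whatever procedure chooses the sampling rates, so a weighting function or a search procedure could supply them without changing the oversampling code.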

Keywords/Search Tags:

Malicious traffic, Statistical characteristics, Imbalanced classification, Machine learning

Related items

1	Machine Learning In Network Traffic Classification
2	Research On Key Technologies Of Machine Learning Based Traffic Identification
3	Traffic Analysis For Internet Application Identification
4	A Cellphone Malicious Behaviors Research Based On Mobile Base Station Data
5	A Novel System For P2P Traffic Identification Based On Packet Inspection And Machine Learning
6	Research On Multi-objective Restricted Boltzmann Machine Model For Malicious Code Detection
7	Imbalanced Classification Methods Based On Extreme Learning Machine And The Application
8	Statistical Traffic Classification Method And Application With Mislabelled Training Samples
9	An Identification System Based On Statistical Characteristics Of The Transport Layer Session Behavior To Identify Malicious Traffic
10	Research And Implementation Of Semi-supervised Machine Learning Algorithms For Classifying The Imbalanced Protocol Flows