| The number of self-propagating malware attacks using encrypted HTTP traffic has increased dramatically in recent years. Traditional detection methods designed for unencrypted traffic are difficult to apply to encrypted traffic. Encryption protects user privacy, but it also brings security risks: the encrypted payload is large and cannot be directly observed, so malicious traffic may be hidden inside encrypted traffic, leading to a series of security problems. Research on malicious encrypted traffic classification is therefore very important. Currently, malicious encrypted traffic classification methods are divided into two types: traditional machine learning based on feature engineering, and deep learning based on automatic feature extraction. How to improve the true positive rate while reducing the false positive rate is an urgent problem in this field. This paper identifies two remaining problems in current work: (1) because encrypted flow information is limited and labels are inaccurate, classification methods based on flow features tend to produce a high false positive rate; (2) a single model generalizes poorly, that is, when the network features of the traffic change, the variance of a single model's classification performance is large. This paper implements a general framework for classifying malicious encrypted traffic without decrypting it, comprising the following work: (1) A feature engineering method for malicious encrypted traffic is proposed, based on the flow level and the host level. At the flow level, statistical features and sequence features are constructed. At the host level, statistical features, certificate features, sequence features and packet length distribution features are constructed. By mining the basic information of the raw traffic, more effective features can be obtained. Experimental results show that combining the two types of features effectively improves the classification score. (2) A method of integrating multiple models is proposed, rather than fitting all features into a single model. This method reduces the variance of the prediction results, making them more stable, and reduces the time cost through parallel training and prediction. Experiments show that the models complement each other, which verifies the effectiveness and robustness of the proposed method. (3) A system for classifying malicious encrypted traffic is designed and implemented. The system accepts uploaded traffic files in pcap format and reports the classification results for encrypted traffic in real time. |
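
The idea of integrating multiple models, rather than fitting all features into one, can be illustrated with a minimal sketch. The abstract does not state which combination rule is used, so the averaging of per-model probabilities below (soft voting) is an assumption, as are the function names `soft_vote` and `classify`, the 0.5 threshold, and the example probabilities, which are made up purely for illustration.

```python
from statistics import mean


def soft_vote(per_model_probs):
    """Average the malicious-class probability across models, per sample.

    per_model_probs: one list per model, each holding that model's
    predicted probability of "malicious" for the same ordered samples.
    Averaging is an assumed combination rule, not the paper's stated one.
    """
    if not per_model_probs:
        raise ValueError("at least one model's predictions are required")
    n_samples = len(per_model_probs[0])
    if any(len(probs) != n_samples for probs in per_model_probs):
        raise ValueError("all models must score the same number of samples")
    # zip(*...) regroups the scores so each tuple holds one sample's
    # probabilities from every model.
    return [mean(sample_scores) for sample_scores in zip(*per_model_probs)]


def classify(per_model_probs, threshold=0.5):
    """Label a sample 1 (malicious) if the averaged probability reaches threshold."""
    return [int(p >= threshold) for p in soft_vote(per_model_probs)]


# Hypothetical outputs of three models, e.g. one per feature group
# (flow statistics, certificate features, packet-length distribution).
example_probs = [
    [0.9, 0.2, 0.6],
    [0.7, 0.1, 0.3],
    [0.8, 0.4, 0.5],
]
labels = classify(example_probs)  # [1, 0, 0]
```

Because each model only needs its own feature group, the models could be trained and queried in parallel, and a sample on which one model is uncertain (the third sample here) is decided by the consensus of all of them rather than by a single score.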