| In recent years, with the continuous development of the Internet, more and more people have started using VPN tools to protect their network security and privacy. However, criminals also use VPN tools to conceal their activities and to commit online crimes such as spreading viruses and phishing. Moreover, to evade detection, criminals use TLS encryption to disguise VPN traffic as HTTPS traffic through tunneling and protocol emulation, which obscures the statistical and behavioral characteristics of the traffic and makes it difficult to detect with traditional methods. Detecting VPN traffic inside HTTPS tunnels is therefore of great significance for protecting network security and optimizing network management. This paper focuses on VPN traffic encrypted with the TLS protocol and proposes a VPN traffic detection method based on traffic ratio statistical features. The method builds on the traffic detection framework proposed in this paper. First, multiple feature communication fragments are extracted from continuous traffic. Next, multi-dimensional ratio statistical feature sequences are extracted from these fragments to characterize the traffic behavior patterns. Finally, the ratio feature matrices composed of multiple fragments are fed into a deep learning model, a CNN, for training and detection. Within this framework, this paper proposes two flow representation methods, one for VPN traffic identification and one for VPN traffic classification: (1) For VPN traffic identification, a 3D ratio statistical feature sequence, TCRF, is proposed. This sequence uses the ratio of statistics computed over a statistical window to those computed over the entire feature fragment, which reduces the influence of network jitter and of the differing traffic feature patterns of the service types carried inside the VPN tunnel, and emphasizes the behavior pattern of the VPN tunnel itself. The experimental results show that the accuracy and F1-score of TCRF exceed 90%, with a comparatively low FPR and an AUC above 0.95. Compared with the FlowPic and end-to-end 1D-CNN methods, TCRF improves accuracy by about 10% and F1-score by about 15%. (2) For VPN traffic classification, a 3D ratio statistical feature sequence, TCRF-cla, is proposed. This sequence uses statistical ratios within the statistical window to amplify the differences between the traffic feature patterns of different service types inside the VPN tunnel. In addition, the relative distance of the traffic byte distribution is used to reduce interference caused by network factors. The experimental results show that the accuracy and F1-score of TCRF-cla exceed 95%. Compared with the FlowPic, end-to-end 1D-CNN, and Time-related methods, TCRF-cla improves accuracy by about 12% and F1-score by about 20%. By selecting multiple feature fragments, the proposed method balances the real-time performance and accuracy of VPN traffic identification. The ratio feature sequences reflect the overall behavior pattern of the traffic, and their different dimensions enhance the characteristics of the VPN's own tunnel behavior and of the traffic service types inside the VPN tunnel. In addition, using ratio features avoids the error introduced by normalizing conventional statistical features when applying deep learning models. Finally, based on the above research, this paper builds a system for detecting VPN traffic inside HTTPS tunnels. The system is implemented with the Flask framework and includes three modules: traffic acquisition, traffic preprocessing, and traffic detection. The system has been tested and verified. It provides online VPN traffic detection for suspicious PCAP files and visually displays the identification and classification results through a web page. |
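
The window-to-fragment ratio idea described above can be illustrated with a minimal sketch. The abstract does not specify which three statistics make up the 3D sequence, so the dimensions used here (byte share, upstream packet share, duration share) are assumptions chosen purely for illustration, as are the names `Packet` and `ratio_feature_sequence` and the use of non-overlapping windows. This is not the paper's implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical packet record: (timestamp in seconds, size in bytes,
# direction: +1 upstream, -1 downstream).
Packet = Tuple[float, int, int]


def _safe_ratio(part: float, whole: float) -> float:
    """Return part / whole, or 0.0 when the denominator is zero."""
    return part / whole if whole else 0.0


def ratio_feature_sequence(packets: Sequence[Packet], window: int) -> List[List[float]]:
    """Split one fragment into non-overlapping windows of `window` packets and
    return, per window, three ratios relative to the whole fragment.

    The three dimensions are assumed for illustration only:
    byte share, upstream-packet share, and duration share.
    """
    if window <= 0:
        raise ValueError("window must be positive")

    # Statistics over the entire fragment (the denominators).
    total_bytes = sum(size for _, size, _ in packets)
    total_up = sum(1 for _, _, direction in packets if direction > 0)
    total_time = packets[-1][0] - packets[0][0] if packets else 0.0

    rows: List[List[float]] = []
    for start in range(0, len(packets), window):
        chunk = packets[start:start + window]
        # Statistics over the current window (the numerators).
        win_bytes = sum(size for _, size, _ in chunk)
        win_up = sum(1 for _, _, direction in chunk if direction > 0)
        win_time = chunk[-1][0] - chunk[0][0]
        rows.append([
            _safe_ratio(win_bytes, total_bytes),
            _safe_ratio(win_up, total_up),
            _safe_ratio(win_time, total_time),
        ])
    return rows


if __name__ == "__main__":
    fragment = [(0.0, 100, 1), (1.0, 300, -1), (2.0, 100, 1), (4.0, 500, -1)]
    for row in ratio_feature_sequence(fragment, window=2):
        print(row)
```

Because every value is already a ratio between 0 and 1, sequences from several fragments can be stacked into a matrix for a CNN without a separate normalization step, which is the property the abstract attributes to ratio features.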