VPNs give legitimate users a quick and easy means of remote access, but they also give criminals a convenient channel for illegal activity. As the COVID-19 pandemic rapidly increased the demand for studying and working from home, VPN use has become more and more frequent, which makes the need for VPN supervision increasingly urgent. V2Ray is a widely used tool for self-built VPNs today, so classifying the business type of its traffic is an important part of VPN supervision. Owing to a lack of data, most existing research focuses on SSL OpenVPN, and research on V2Ray business classification is scarce. In addition, the special traffic characteristics introduced by V2Ray's private VMess protocol mean that business-type classification methods designed for other VPNs perform unsatisfactorily on V2Ray traffic. Therefore, aiming at more refined business classification, this paper defines V2Ray traffic that uses the VMess protocol as V2Ray encrypted traffic and, building on existing research on V2Ray encrypted traffic identification, proposes a business-type classification method for V2Ray traffic. The research covers the following four aspects:

(1) To address the absence of a usable V2Ray encrypted traffic dataset, and the difficulty of obtaining one caused by V2Ray's special communication characteristics, a method for constructing a V2Ray encrypted traffic dataset with business labels is proposed. By modifying the V2Ray source code, the method extracts the required information from the running V2Ray process and, at the constructed traffic capturer, maps each flow to its business type using application and domain information. This method greatly reduces the difficulty of labeling V2Ray traffic and provides a high-quality dataset as the data foundation for the subsequent research.

(2) To address the highly random byte distribution and the chaotic global sequential features of V2Ray traffic, a method that combines a 1D-convolution operator with a transformer operator to classify the business type of V2Ray traffic is proposed. The method uses the 1D-convolution operator to extract local N-gram features and the transformer operator to extract global features and pool the feature vectors, and it classifies the business type from the packet length sequence of the given V2Ray traffic. Experiments show that the average accuracy of this method for classifying V2Ray traffic business types reaches 97.84%, which is better than state-of-the-art methods for general encrypted traffic classification with similar classification purposes.

(3) To meet the more refined business classification requirements of V2Ray supervision, a GNN-based method that classifies the request type of V2Ray web-browsing traffic is proposed. The method designs six approaches for converting an ordinary V2Ray traffic packet sequence into graph data, and it designs a GNN model based on the residual-gated graph convolution operator and the Top-K pooling operator to classify the request type of given V2Ray traffic flows. Experiments show that the highest average accuracy of this method for classifying the request types of V2Ray web-browsing traffic is 98.62%, which is better than current state-of-the-art methods for general encrypted traffic classification with similar classification purposes.

(4) Based on the research in (2) and (3), a prototype system for V2Ray traffic business classification is designed and implemented. The system integrates the two classification methods and provides a model training module and a traffic analysis module. During model training, users can use offline data sources to train models for the two classification tasks above. During traffic analysis, users can choose offline or online data sources as the input to analyze. For a given V2Ray traffic flow, the system outputs the business type to which the flow belongs; if the flow is V2Ray web-browsing traffic, the system also outputs its request type. According to the results of the designed test cases, the system implements all required functions and is highly robust.
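
The labeling step in aspect (1) can be pictured as a lookup from a flow's application and domain information to a business label. The following is a minimal sketch under stated assumptions, not the method implemented in this work: the rule tables, the label names, the `FlowInfo` record, and the `label_flow` function are all hypothetical, and the abstract does not specify which fields the modified V2Ray process actually exports or in what order the two kinds of information are consulted.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule tables: the abstract does not list the real business types,
# applications, or domains, so every entry here is an illustrative assumption.
APP_RULES = {
    "vlc": "video",
    "thunderbird": "email",
}
DOMAIN_SUFFIX_RULES = {
    "video.example.com": "video",
    "mail.example.org": "email",
    "example.com": "web-browsing",
}


@dataclass(frozen=True)
class FlowInfo:
    """Information assumed to be exported by the modified V2Ray process for one flow."""
    src_port: int
    application: Optional[str]  # name of the client application, if known
    domain: Optional[str]       # destination domain requested through the proxy


def label_flow(flow: FlowInfo) -> Optional[str]:
    """Return a business label for the flow, or None if no rule matches."""
    # Application information is checked first (an assumption of this sketch).
    if flow.application:
        label = APP_RULES.get(flow.application.lower())
        if label is not None:
            return label
    # Otherwise fall back to the longest matching domain suffix.
    if flow.domain:
        domain = flow.domain.lower().rstrip(".")
        matches = [suffix for suffix in DOMAIN_SUFFIX_RULES
                   if domain == suffix or domain.endswith("." + suffix)]
        if matches:
            return DOMAIN_SUFFIX_RULES[max(matches, key=len)]
    return None
```

In this sketch a flow that matches no rule is left unlabeled (`None`) rather than guessed, which keeps mislabeled samples out of the dataset at the cost of discarding some traffic.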