As attention to privacy in network communication grows, traffic encryption is becoming a common way to protect the privacy and security of communications. Encryption, however, makes network traffic identification and traffic anomaly detection much more difficult. SSH (Secure Shell) is a widely used application-layer security protocol with a well-developed interactive authentication mechanism and an asymmetric encryption mechanism. While SSH provides users with secure communication services, some users also conceal illegal behavior inside SSH tunnels. For example, the forwarding function of SSH is used to reach restricted overseas websites, which causes considerable trouble for network managers. Studying SSH traffic identification is therefore important. First, this paper analyzes the interaction process of the SSH protocol in depth through experiments and solves the problem of extracting feature vectors for the different applications carried in an SSH tunnel. Packet-length, network-flow, and scale-type features of these applications are then extracted. During feature collection, abnormal conditions such as network fluctuation cause packet loss and retransmission. These introduce noise samples into the collected traffic and blur the class boundaries in the training set. To address this, the paper proposes an SSH feature-vector denoising method based on isolation forest, which effectively improves recognition accuracy. Next, given the continuous and non-linear characteristics of SSH-encrypted traffic, the XGBoost ensemble learning model is applied to classify it. XGBoost handles continuous, non-linear statistical features of SSH-encrypted traffic, such as packet length and inter-packet time interval, well. It also incorporates several safeguards against overfitting, such as a regularization term, controls on tree depth and leaf-node weights, and feature sampling. The number of boosting iterations and the parameters of the base classifier are tuned to better identify the application running inside the SSH tunnel. Compared with traditional machine learning models, this method greatly improves the accuracy of SSH-encrypted traffic identification. Five common applications carried over SSH tunnels (HTTP, FTP, SMTP, SCP, and Login) are tested. Accuracy and recall are above 90% for all of them, and the recall for HTTP is 95.81%.
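To make the kind of statistical feature vector mentioned above concrete, the following is a minimal sketch that summarizes one flow by its packet lengths and inter-packet time intervals. The function name `flow_features`, the `(timestamp, length, direction)` tuple layout, and the exact set of statistics (including the upstream byte ratio) are assumptions chosen for illustration. They are not the feature set defined in the paper.

```python
from statistics import mean, pstdev


def flow_features(packets):
    """Summarize one flow as a dictionary of statistical features.

    `packets` is a list of (timestamp_seconds, length_bytes, direction)
    tuples, where direction is +1 for client-to-server and -1 for
    server-to-client. This layout is hypothetical.
    """
    if not packets:
        raise ValueError("flow has no packets")

    # Order by capture time so that time intervals are meaningful.
    packets = sorted(packets, key=lambda p: p[0])
    times = [p[0] for p in packets]
    lengths = [p[1] for p in packets]

    # Inter-packet time intervals; a single-packet flow gets one zero gap.
    gaps = [b - a for a, b in zip(times, times[1:])] or [0.0]

    total_bytes = sum(lengths)
    up_bytes = sum(length for _, length, d in packets if d > 0)

    return {
        "pkt_count": len(packets),
        "len_mean": mean(lengths),
        "len_std": pstdev(lengths),
        "len_min": min(lengths),
        "len_max": max(lengths),
        "gap_mean": mean(gaps),
        "gap_std": pstdev(gaps),
        # Share of bytes sent by the client (an illustrative ratio feature).
        "up_byte_ratio": up_bytes / total_bytes if total_bytes else 0.0,
    }


# Made-up flow: a small request followed by two large responses and an ack.
example_flow = [(0.00, 48, 1), (0.02, 1448, -1), (0.05, 1448, -1), (0.09, 64, 1)]
features = flow_features(example_flow)
print(features)
```

Only packet sizes, timing, and direction are used here, since payload contents are not readable once the traffic is encrypted.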