
Research On Target Recognition Technology Based On Semi-supervised Learning

Posted on: 2020-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: K X Xu
GTID: 2428330596475508
Subject: Communication and Information System
Abstract/Summary:
Application layer protocol identification, a basic technology of network monitoring, plays a major role in network management, security monitoring, and user experience improvement, but it still faces many problems. First, the growing number of encrypted protocols means that the traditional port-based method and the method based on application-layer payload analysis are no longer applicable. Second, the number of application-layer protocol types has grown rapidly, so in payload-based methods the feature database must be updated continuously, and its rapid growth poses a serious challenge to the timeliness of feature matching algorithms. Introducing machine learning into application layer protocol identification can greatly improve the performance of the recognition system. In this thesis, we study application layer protocol identification based on semi-supervised learning. The main contents and results of our research are as follows:

(1) We analyze generative methods, semi-supervised support vector machines, and graph-based semi-supervised algorithms. Compared with these, Tri-Training and Co-Forest use the idea of ensemble learning, which avoids problems such as model assumptions and non-convex loss functions, and they are simpler and more efficient. We therefore apply these two disagreement-based semi-supervised learning algorithms to application layer protocol identification. Experimental results show that Co-Forest achieves higher protocol classification accuracy than Tri-Training.

(2) Feature selection is an important module in protocol identification. Based on an analysis of the advantages and disadvantages of the Relief and wrapper feature selection algorithms, we propose a wrapper feature selection algorithm based on the Relief statistic, which combines the high efficiency of the Relief algorithm with the high accuracy of the wrapper algorithm. Experimental results show that, on the data set constructed in this thesis, the proposed method improves protocol identification accuracy by about 1% on average compared with the Relief feature selection algorithm.

(3) To address the class imbalance of network flow data, we further improve the wrapper feature selection algorithm based on the Relief statistic and propose a wrapper feature selection algorithm based on a weighted Relief statistic. The algorithm is realized mainly by weighting the Relief statistic and changing the feature subset evaluation function. Experimental results show that, compared with the wrapper algorithm based on the unweighted Relief statistic, the feature subset selected by the proposed algorithm is more beneficial for identifying minority categories in the network flow: the recall of P2P, DATABASE, and MULTIMEDIA improves to varying degrees, while the precision of majority categories of application layer protocols, such as WWW and MAIL, also improves to varying degrees.
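The abstract does not give the details of the Relief-based wrapper in (2), so the following is only a minimal sketch of one way such a combination could look, not the thesis's algorithm. It assumes a two-class problem with numeric features, uses the classic Relief statistic (nearest hit versus nearest miss) to rank features, and then lets a wrapper step evaluate only the top-k prefixes of that ranking with a leave-one-out 1-nearest-neighbour accuracy. The function names, the choice of 1-NN as the evaluator, and the toy data are all hypothetical.

```python
def relief_scores(X, y):
    """Classic Relief statistic per feature (two-class sketch).

    A feature gains weight when it differs on the nearest miss (other class)
    and loses weight when it differs on the nearest hit (same class).
    """
    n, d = len(X), len(X[0])
    # Scale every feature to [0, 1] so per-feature differences are comparable.
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]
    Z = [[(row[j] - lo[j]) / span[j] for j in range(d)] for row in X]

    def dist(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no neighbour of one kind, skip this sample
        h = min(hits, key=lambda k: dist(Z[i], Z[k]))
        m = min(misses, key=lambda k: dist(Z[i], Z[k]))
        for j in range(d):
            w[j] += abs(Z[i][j] - Z[m][j]) - abs(Z[i][j] - Z[h][j])
    return [v / n for v in w]


def loo_1nn_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-NN classifier restricted to `feats`.

    This plays the role of the wrapper's subset evaluation function
    (an assumption; the thesis may use a different classifier).
    """
    correct = 0
    for i in range(len(X)):
        nearest = min(
            (k for k in range(len(X)) if k != i),
            key=lambda k: sum((X[i][j] - X[k][j]) ** 2 for j in feats),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def relief_wrapper_select(X, y):
    """Rank features by Relief, then pick the best-scoring top-k prefix."""
    scores = relief_scores(X, y)
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    best_feats, best_acc = [], -1.0
    for k in range(1, len(order) + 1):
        feats = order[:k]
        acc = loo_1nn_accuracy(X, y, feats)
        if acc > best_acc:  # strict: prefer the smaller subset on ties
            best_feats, best_acc = feats, acc
    return best_feats, best_acc, scores


# Made-up toy data: feature 0 separates the classes, feature 1 is noise.
X_demo = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y_demo = [0, 0, 0, 1, 1, 1]
```

Because the wrapper only evaluates d prefixes of the Relief ranking instead of searching all feature subsets, it keeps most of the filter's speed while still letting a classifier decide where to cut the ranking. A weighted variant in the spirit of (3) could, for example, scale each sample's contribution by a per-class weight, but the abstract does not specify the weighting used.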
Keywords/Search Tags: application layer protocol identification, semi-supervised, Co-Forest, feature selection, weighted Relief