Network Application Programs Classification Algorithm Based On Feature Selection And Model Fusion

Posted on: 2021-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: W X Zhang
GTID: 2518306047484034
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth in the number of Internet users, the emergence of new network application programs, and an increasingly complex network environment, identifying application programs has become a finer-grained and deeper analysis method in today's network traffic identification technology. As the basis and premise of network management, network application program classification plays an important role in network monitoring, network security, daily operation and maintenance, traffic billing, and improving user experience. In recent years, the mainstream approach has been to extract network traffic features and identify network application programs with machine learning and deep learning. However, the extracted features are complex and redundant, so selecting features through feature engineering is the first step of the recognition and classification task. The feature subsets that different feature selection methods produce from the same original feature set strongly affect both the running time and the classification accuracy of the classification algorithm. Finding a simple, standard, fast, and accurate feature selection algorithm is therefore one of the important tasks in this field. This thesis proposes FIRFEP, a classification method for network application programs based on combined feature selection. The method first combines feature importance filtering with recursive feature elimination, and then uses the Pearson correlation coefficient to drop redundant features. Experimental results on an open network dataset show that, compared with the traditional variance threshold method, recursive feature elimination, and L1-penalized logistic regression feature selection, the combined feature selection method improves classification accuracy by 0.5% to 3.0% and reduces the average running time by more than 50%.

Once combined feature selection has chosen effective features and improved the performance and accuracy of the classification model, the second focus of this thesis is how to further improve the classification of network application programs and the generalization ability of the model. Because model fusion is easy to understand, simple to implement, and effective, this thesis proposes a two-layer model fusion algorithm that borrows the idea of feature engineering while retaining the advantages of model fusion. The first layer trains several different classification models on the original data, has them predict on it, and keeps the probability values of their predictions, that is, for each sample, the probability each model assigns to the class it considers most likely. The second layer adds these probability values to the original dataset as new features to generate a new dataset, and then uses a single model to complete the classification task on the new dataset. Because the algorithm generates new features whose character differs from that of the original data, this thesis calls it the model fusion algorithm based on pseudo-features. Experimental results on public network datasets show that the pseudo-feature model fusion algorithm achieves higher accuracy and recall and a lower false alarm rate than single models and traditional model fusion algorithms. In addition, experiments on multiple datasets with different types and numbers of network application programs not only verify the strong generalization ability of the algorithm but also determine the optimal number of base models in the fusion model to be 3. This hyperparameter setting helps improve the effectiveness of the algorithm in practical engineering.
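
The data flow of the two-layer fusion described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `add_pseudo_features`, the three toy base models, and the sample rows are all hypothetical, and the base models are plain functions standing in for trained classifiers. The sketch only shows how each base model's top class probability might be appended to the original features before a single second-layer model is trained.

```python
from typing import Callable, List, Sequence

# A "base model" here is any callable that maps one feature vector to a list
# of class probabilities. The three toy models below are hypothetical
# stand-ins for trained classifiers, not models from the thesis.
ProbaModel = Callable[[Sequence[float]], List[float]]


def clamp01(x: float) -> float:
    """Clip a value into the interval [0, 1]."""
    return min(max(x, 0.0), 1.0)


def model_a(row: Sequence[float]) -> List[float]:
    # Toy two-class model driven by the first feature.
    p = clamp01(row[0])
    return [p, 1.0 - p]


def model_b(row: Sequence[float]) -> List[float]:
    # Toy two-class model driven by the second feature.
    p = clamp01(row[1])
    return [p, 1.0 - p]


def model_c(row: Sequence[float]) -> List[float]:
    # Toy model that is always undecided between the two classes.
    return [0.5, 0.5]


def add_pseudo_features(
    rows: Sequence[Sequence[float]],
    base_models: Sequence[ProbaModel],
) -> List[List[float]]:
    """Append each base model's top class probability to every row."""
    augmented = []
    for row in rows:
        # First layer: keep only the probability of the most likely class.
        pseudo = [max(model(row)) for model in base_models]
        # Second-layer input: original features followed by pseudo-features.
        augmented.append(list(row) + pseudo)
    return augmented


# Illustrative data: two samples with two original features each.
rows = [[0.25, 1.0], [1.0, 0.5]]
augmented = add_pseudo_features(rows, [model_a, model_b, model_c])
for row in augmented:
    print(row)
```

Each augmented row keeps its original features and gains one pseudo-feature per base model; three base models are used here only because the abstract reports 3 as the optimal number. In a full pipeline, the augmented rows would then be passed to a single final classifier.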
Keywords/Search Tags: machine learning, feature selection, model fusion, network application program classification