| The rapid development of the internet has brought great convenience to people’s work and life. However, networks are under constant threat of attack, and the security situation is severe. At the same time, the high-dimensional data of the big data era has increased the time complexity of data analysis and computation. It is therefore of great significance to develop efficient methods and frameworks for anomaly detection. Machine learning algorithms have been widely used in anomaly detection because of their clear advantages. Against this background, this article addresses the research and application of machine learning methods in network anomaly detection. We designed an anomaly detection framework based on machine learning methods. A hierarchical feature selection method based on the Boruta algorithm and correlation coefficient calculation was used to process the dataset, and its effect on model training efficiency was compared and analyzed; Isolation Forest (IF) and Extended Isolation Forest (EIF) were used to train on and predict the dataset, and the confusion matrix and other evaluation metrics were used to evaluate the results; two model improvement schemes were proposed, a Mixed Isolation Forest (MIF) and a cascaded anomaly detection data scoring method, and horizontal comparison experiments were conducted against multiple algorithms. The specific work of this article is as follows. Firstly, this article divides the CIC-IDS-2017 dataset by attack type, yielding 4 datasets. We then integrated the data processing results of the Boruta algorithm and the correlation coefficient calculation, and verified the improvement in model training efficiency brought by this feature selection method. The experiment shows that the feature selection method effectively improves training efficiency: the average training time is 7.214 seconds, which is 43.25% less than the 
average training time of the original model. Secondly, this article investigates the impact of the extension degree in extended isolation forests on model training effectiveness. The results show that the extended isolation forest algorithm performs well in anomaly detection under different extension degrees, with an average AUC exceeding 0.9220. This article then used isolation forests and extended isolation forests to train and predict on the four datasets. The results indicate that both models achieve good detection accuracy on all four datasets: the average AUC of the isolation forest algorithm is 0.8823, while that of the extended isolation forest algorithm is 0.8890. Finally, this article proposes two model improvement schemes, a mixed isolation forest and a cascaded anomaly detection data scoring method. On the one hand, the mixed isolation forest integrates the path lengths of data points in the isolation forest and the extended isolation forest, and then computes anomaly scores from the integrated path lengths. The results indicate that the mixed isolation forest further improves detection accuracy, with an average AUC of 0.8999 on the four datasets. On the other hand, the cascaded anomaly detection data scoring method integrates base learners and cascades them with algorithms such as isolation forests. The horizontal comparison experiments show that the cascaded anomaly detection framework further improves detection accuracy. Based on this work, the machine-learning-based anomaly detection framework designed in this paper has some practical application value in the field of anomaly detection. |
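
The mixed isolation forest idea described above (combine the path lengths from the two forests, then score the combined value) can be illustrated with a minimal sketch. The abstract does not state how the path lengths are integrated, so the weighted mean used here, the `weight` parameter, and the function names are all assumptions made for illustration, not the article's method. The scoring formula itself is the standard isolation forest score, s = 2^(-E[h(x)] / c(n)).

```python
import math

EULER_GAMMA = 0.5772156649015329


def average_path_length(n: int) -> float:
    """c(n): expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length: float, subsample_size: int) -> float:
    """Standard isolation forest score: s = 2 ** (-E[h(x)] / c(n))."""
    return 2.0 ** (-mean_path_length / average_path_length(subsample_size))


def mixed_score(if_paths, eif_paths, subsample_size, weight=0.5):
    """Hypothetical mixing rule (an assumption, not taken from the article):
    weighted mean of the per-forest mean path lengths, scored as usual."""
    mean_if = sum(if_paths) / len(if_paths)
    mean_eif = sum(eif_paths) / len(eif_paths)
    combined = weight * mean_if + (1.0 - weight) * mean_eif
    return anomaly_score(combined, subsample_size)


# Made-up path lengths: a point isolated after few splits vs. one needing many.
suspicious = mixed_score([3.0, 4.0], [2.0, 3.0], subsample_size=256)
ordinary = mixed_score([11.0, 12.0], [10.0, 12.0], subsample_size=256)
print(suspicious > ordinary)
```

Under this sketch, points with short combined path lengths score closer to 1 and are treated as more anomalous, while a point whose combined path length equals c(n) scores exactly 0.5.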