With the continuous expansion of network scale and complexity, abnormal network traffic has an increasing impact on network system performance. Rapidly analyzing network traffic, determining whether it contains abnormal information, and promptly raising an alarm or performing other related operations are essential to ensuring the normal operation of applications and systems. Anomaly detection methods mainly include statistics-based, distance-based, density-based, and reconstruction-based methods. Most of these methods build and evaluate their models on offline data sets collected in advance. Although they perform well in simulation experiments, these algorithms cannot achieve the desired effect in practical applications. The constantly changing data streams of the new network environment pose new challenges to traditional anomaly detection algorithms based on static data. As the scale of real-time network traffic data keeps growing, the key is to process these streaming data rapidly and extract useful information with limited storage space and low time complexity, so that abnormal conditions in the stream can be monitored in a timely manner, which improves network availability and guarantees the quality of network service. This paper studies anomaly detection in the network data stream environment. According to the dynamic characteristics of network data streams, online ensemble learning theory is introduced, and the isolation forest algorithm is improved based on a sliding window mechanism and incremental learning. First, the data instances in the first sliding window are used to build multiple random binary trees, which form an ensemble evaluator. The constructed ensemble evaluator can then quickly compute the degree of abnormality of subsequent data in the stream. While the data stream is being analyzed, the instances in the sliding window are stored separately in a buffer with a certain probability. When the number of instances in the buffer exceeds the
threshold, the ensemble evaluator begins to update. The evaluator is adjusted in two ways, online tree growth and tree discarding, which ensure that the detection model follows changes in the characteristics of the data stream. The online tree growth mechanism randomly adjusts the branch structure of a certain number of subtrees, while the tree discarding method discards some of the old trees and constructs new trees through a quality weighting mechanism. To improve the time efficiency, stability, and scalability of the improved algorithm, this paper studies the parallelization of the isolation-forest-based online anomaly detection method on the Storm distributed computing framework and implements a distributed version of the algorithm. The Spout and Bolt components of the detection model on the Storm cluster are designed in detail. Running the detection model in practice verifies that the improved algorithm has relatively high processing performance and can correctly detect anomalies in the data stream. Finally, an online anomaly detection system based on isolation forest is designed and implemented. The system consists mainly of two functional modules: the Agent module and the safety monitoring and analysis module. The Agent module is responsible for collecting the host’s operating system information, network connection data, process behaviors, and other information, and then sends the collected data to the safety monitoring and analysis module for further analysis. The safety monitoring and analysis module detects abnormal behaviors in real time through static rule matching and a detection model built with the improved algorithm, and promptly raises alarms for abnormal operations.
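
The windowed build, probabilistic buffering, and threshold-triggered update described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The class name `StreamingIsolationForest` and every parameter (`n_trees`, `window`, `sample_size`, `keep_prob`, `buffer_limit`, `replace_frac`) are hypothetical names chosen for illustration. Two simplifications are assumed: the online tree growth step is omitted, and because the quality weighting mechanism is not specified here, the update simply replaces the oldest trees with trees built from the buffer. The score follows the standard isolation forest formula, 2^(-E[h(x)] / c(n)).

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively isolate points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(tree, x, depth=0):
    """Depth at which x lands, plus the expected depth of the unsplit leaf."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    child = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(child, x, depth + 1)


class StreamingIsolationForest:
    """Hypothetical sketch: all names and defaults are illustrative assumptions."""

    def __init__(self, n_trees=50, window=128, sample_size=64,
                 keep_prob=0.5, buffer_limit=128, replace_frac=0.2, seed=0):
        self.n_trees, self.window, self.sample_size = n_trees, window, sample_size
        self.keep_prob, self.buffer_limit = keep_prob, buffer_limit
        self.replace_frac = replace_frac
        self.rng = random.Random(seed)
        self.trees, self.first_window, self.buffer = [], [], []

    def _new_tree(self, data):
        # Each tree sees a random subsample, as in the standard isolation forest.
        sample = self.rng.sample(data, min(len(data), self.sample_size))
        max_depth = math.ceil(math.log2(max(len(sample), 2)))
        return build_tree(sample, 0, max_depth, self.rng)

    def score(self, x):
        """Anomaly score in (0, 1); values closer to 1 are more anomalous."""
        mean_path = sum(path_length(t, x) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_path / c_factor(self.sample_size))

    def _update(self):
        # Assumption: FIFO replacement stands in for quality-weighted discarding.
        k = max(1, int(self.replace_frac * self.n_trees))
        fresh = [self._new_tree(self.buffer) for _ in range(k)]
        self.trees = self.trees[k:] + fresh
        self.buffer = []

    def process(self, x):
        """Score one instance; returns None while the first window is filling."""
        if not self.trees:
            self.first_window.append(x)
            if len(self.first_window) == self.window:
                self.trees = [self._new_tree(self.first_window)
                              for _ in range(self.n_trees)]
            return None
        s = self.score(x)
        if self.rng.random() < self.keep_prob:   # store with a fixed probability
            self.buffer.append(x)
        if len(self.buffer) > self.buffer_limit:  # threshold triggers the update
            self._update()
        return s
```

In use, each incoming feature tuple is passed to `process`, and an alarm could be raised when the returned score exceeds a cutoff chosen for the deployment. Replacing only a fraction of the trees per update keeps part of the old model in place, so the ensemble drifts with the stream rather than being rebuilt from scratch.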