The power dispatching automation system improves the ability of multi-level dispatching to handle major power grid accidents and plays an important role in maintaining the safe and stable operation of the power grid. Once the system's business becomes abnormal, grid operation is severely affected or even paralyzed, causing huge economic losses. The system has a wide variety of business types and complex business logic interactions, so its dispatch monitoring data is multi-dimensional and has a diverse spatial distribution. Existing machine-learning-based stream data anomaly detection methods struggle to balance detection accuracy and detection efficiency for local anomalies and other special anomalies. In addition, as dispatching functions change, the distribution of the multi-dimensional monitoring data changes, which gives rise to concept drift. Existing concept drift detection methods suffer from an empty window period, and their detection accuracy in continuous, gradual concept drift scenarios leaves room for improvement. To raise the intelligence level of the system, help dispatchers understand the operating status of the system's business in a timely manner, and ensure reliable business operation, this paper applies machine learning to study stream data anomaly detection for the power dispatching automation system. The main work is as follows:

1) This paper studies a machine-learning-based anomaly detection method for power dispatching data. Because existing machine-learning-based methods struggle to balance detection accuracy and detection efficiency for local anomalies and other special anomalies, an anomaly detection method based on a logarithmic-interval isolation forest is proposed. First, the Mahalanobis distance from each sample point to the center of the data distribution is calculated, which improves measurement accuracy when the distributions of the data dimensions differ. Second, a logarithmic-interval isolation strategy is designed to construct multiple subtrees, which are integrated into a logarithmic-interval isolation forest anomaly detector that screens out abnormal samples in the data set while taking both detection accuracy and detection efficiency into account. Finally, public data sets and the business data set of a provincial power grid dispatching center are used as training and test samples to verify that the proposed method achieves superior overall anomaly detection performance, measured by AUC, and that it is feasible to apply in an actual system.

2) This paper studies an anomaly detection method for stream data under concept drift. To avoid the increase in false positives that concept drift causes in the stream data anomaly detection framework for power dispatching data, an adaptive time-weighted window concept drift detector based on the Hoeffding inequality is designed. First, in light of the system's application background, the way the business data distribution changes dynamically over time is analyzed. Second, existing methods are analyzed across scenarios, and the adaptive window and time-weighting strategies are combined to solve the empty window period problem of existing methods; on this basis, a concept drift detection algorithm suited to power dispatching data is designed using the Hoeffding inequality. Finally, experiments on several public synthetic data sets and on actual business data from a power dispatching automation system compare anomaly detection performance before and after the improvement, verify the feasibility of applying the scheme to anomaly detection, and provide reference information for model-update decisions in the overall stream data anomaly detection framework.

3) This paper studies a filtering and pruning strategy for stream data anomaly detection. To further improve the efficiency of stream data anomaly detection, a filtering and pruning strategy based on kernel density estimation is proposed. Because the high dimensionality of the data stream degrades the nearest neighbor algorithm, autoencoder-based dimension reduction is adopted as a preprocessing step to extract features from the original data stream. First, the efficiency requirements of online anomaly detection are analyzed, based on the fact that normal data far outnumbers abnormal data in the actual operation of the system's business. Second, autoencoder-based dimension reduction extracts features from the high-dimensional data stream, reducing the interference caused by less relevant dimensions. This serves as preprocessing for the subsequent efficient pruning strategy, which quickly separates the data in the cache into normal data and suspicious data for filtering. Finally, the effectiveness of the proposed method is demonstrated by comparing the performance of different anomaly detection algorithms on several public data sets and on actual business data from the power dispatching automation system.
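
As an illustration of the first step of part 1), the Mahalanobis distance from each sample to the center of the data distribution can be sketched as follows. This is a minimal sketch, not the method of this paper: the function name `mahalanobis_to_center`, the use of the sample mean as the center, and the synthetic two-dimensional data are all assumptions made for illustration, and the logarithmic-interval isolation strategy itself is not reproduced here.

```python
import numpy as np


def mahalanobis_to_center(samples: np.ndarray) -> np.ndarray:
    """Return the Mahalanobis distance of each row to the sample mean."""
    # Assumption: the distribution center is taken to be the sample mean.
    center = samples.mean(axis=0)
    # Covariance across dimensions (rows are observations).
    cov = np.atleast_2d(np.cov(samples, rowvar=False))
    # The pseudo-inverse keeps the sketch usable when the covariance is singular.
    cov_inv = np.linalg.pinv(cov)
    diff = samples - center
    # Quadratic form diff_i^T * cov_inv * diff_i for every row i.
    squared = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(squared, 0.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical monitoring data whose two dimensions have very different scales.
    normal = rng.normal(loc=[50.0, 0.5], scale=[10.0, 0.01], size=(500, 2))
    outlier = np.array([[50.0, 0.6]])  # ordinary in dimension 0, extreme in dimension 1
    data = np.vstack([normal, outlier])
    distances = mahalanobis_to_center(data)
    print("index of largest distance:", int(np.argmax(distances)))
```

Because the distance rescales each direction by the data covariance, a deviation that is small in absolute terms but large relative to the spread of its dimension still stands out, which a plain Euclidean distance would miss.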
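
The role of the Hoeffding inequality in part 2) can be illustrated with a simple two-window drift check. This is a hedged sketch under stated assumptions, not the detector designed in this paper: it uses fixed-size, unweighted windows, so the adaptive window and time-weighting strategies are not reproduced, and the class name `HoeffdingDriftDetector`, the window size and the confidence parameter `delta` are hypothetical. Monitored values are assumed to be scaled to [0, 1]; for two independent windows of sizes n and m, the means then differ by more than sqrt((1/n + 1/m) * ln(2/delta) / 2) with probability at most delta when the distribution has not changed.

```python
import math
from collections import deque


def hoeffding_bound(n_ref: int, n_cur: int, delta: float) -> float:
    """Bound on |mean_ref - mean_cur| for values in [0, 1] when there is no drift."""
    return math.sqrt(0.5 * (1.0 / n_ref + 1.0 / n_cur) * math.log(2.0 / delta))


class HoeffdingDriftDetector:
    """Two-window drift check; values must already be scaled to [0, 1]."""

    def __init__(self, window: int = 100, delta: float = 0.001) -> None:
        self.window = window
        self.delta = delta
        self.reference: deque = deque(maxlen=window)
        self.current: deque = deque(maxlen=window)

    def update(self, value: float) -> bool:
        """Add one value and return True if drift is signalled."""
        if len(self.reference) < self.window:
            self.reference.append(value)  # still filling the reference window
            return False
        self.current.append(value)  # sliding window over the most recent values
        if len(self.current) < self.window:
            return False
        ref_mean = sum(self.reference) / self.window
        cur_mean = sum(self.current) / self.window
        if abs(ref_mean - cur_mean) > hoeffding_bound(self.window, self.window, self.delta):
            # Drift: the recent window becomes the new reference.
            self.reference = deque(self.current, maxlen=self.window)
            self.current.clear()
            return True
        return False


if __name__ == "__main__":
    detector = HoeffdingDriftDetector(window=100, delta=0.001)
    stream = [0.2] * 300 + [0.8] * 300  # hypothetical abrupt change at index 300
    alarms = [i for i, v in enumerate(stream) if detector.update(v)]
    print("drift signalled at indices:", alarms)
```

Note that the bound is tested at every step once the recent window is full, so the nominal confidence level holds per test rather than over the whole stream; a production detector would need to account for this.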
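
The filtering idea in part 3), separating cached data into normal and suspicious points by kernel density, might look like the sketch below. It is an illustration under assumptions rather than the strategy of this paper: the Gaussian kernel, the fixed `bandwidth`, the quantile-based threshold and the function names are hypothetical choices, and the autoencoder dimension reduction step is omitted, so the input is assumed to be already low-dimensional.

```python
import numpy as np


def gaussian_kde_density(reference: np.ndarray, queries: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gaussian kernel density of each query row, estimated from the reference rows."""
    dim = reference.shape[1]
    # Squared Euclidean distance between every query and every reference point.
    sq_dist = ((queries[:, None, :] - reference[None, :, :]) ** 2).sum(axis=2)
    norm = (2.0 * np.pi * bandwidth**2) ** (dim / 2.0)
    return np.exp(-0.5 * sq_dist / bandwidth**2).mean(axis=1) / norm


def split_cache(reference: np.ndarray, cache: np.ndarray, bandwidth: float = 0.5, quantile: float = 0.05):
    """Return (normal, suspicious) rows of the cache."""
    # Assumed threshold: a low quantile of the densities of the reference data itself.
    ref_density = gaussian_kde_density(reference, reference, bandwidth)
    threshold = np.quantile(ref_density, quantile)
    dense = gaussian_kde_density(reference, cache, bandwidth) >= threshold
    # High-density points are filtered out as normal; the rest go to a full detector.
    return cache[dense], cache[~dense]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    reference = rng.normal(size=(200, 2))       # hypothetical known-normal data
    cache = np.array([[0.0, 0.0], [8.0, 8.0]])  # one typical point, one far away
    normal, suspicious = split_cache(reference, cache)
    print("normal:", normal.tolist())
    print("suspicious:", suspicious.tolist())
```

Since normal data far outnumbers abnormal data, most cached points pass the cheap density check and only the few suspicious ones need the more expensive anomaly detector.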