| Industrial control systems (ICS), as important national infrastructure, are widely used in major industrial production fields such as the energy and chemical industries, intelligent agriculture, and transportation. Unlike traditional industrial control systems, which ran in closed and independent environments, modern ICS have adopted many computer-related technologies in the course of informatization. This makes them open, convenient and interconnected, but it also introduces many information security issues. An ICS collects industrial time series data characterized by large volume, long collection time, and multiple data types, which makes large-scale manual labeling for supervised learning impractical. ICS anomaly detection therefore usually adopts unsupervised learning. However, unsupervised algorithms often need a large amount of data to learn from, and some industrial systems cannot generate enough data in a short period of time; the longer the collection takes, the greater the risk of intrusion. This raises two problems: first, anomalies must be detected in the data stream that the ICS collects within a short time; second, if a time-based recurrent neural network model is used and the time step in the model is long, problems such as vanishing and exploding gradients arise. To solve these problems, this article proposes a novel hybrid model for anomaly detection, studies data set selection and preprocessing, compares the anomaly detection performance of commonly used unsupervised learning algorithms, and finally verifies that the proposed solution is feasible for practical use in industrial control systems. The main research content is developed in the following aspects: (1) This article first describes the significance of ICS to society and the country, and then introduces algorithms commonly used for anomaly detection, such as statistical methods, rule-based information methods, and unsupervised, semi-supervised
and fully supervised machine learning algorithms. In view of the characteristics of the data stream collected by ICS and the limitations of single-model processing, this paper proposes a hybrid model of clustering and recurrent neural networks to solve the above-mentioned problems. (2) Two data sets commonly used for anomaly detection are chosen: the CCF data set and the SKAB data set. In most industrial control systems, the observer is usually not clear about the specific physical meaning of the data collected by the sensors, and the collected data has many attributes. This matches the CCF data set, in which the background information of the features has been erased and the feature dimension is large, so that data set is chosen to study the general applicability of the model. The SKAB data set is a standard industrial data set in which the meaning of each feature dimension is known, so it can be used to study the model in a specific industrial setting. This paper carries out the corresponding data preprocessing and feature extraction steps on both data sets for the training and testing of the subsequent learning algorithm models. (3) In ICS, unsupervised methods are usually used to detect abnormal data streams. The CCF data set is similar to general industrial data sets, is large enough, and carries some classification labels, so it can be used to test and compare the performance of the K-means clustering algorithm and LOF. The SKAB data set is a time series data set from a specific industrial background, with clear feature background information and classification labels, so it can be used to test the performance of the LSTM algorithm. To solve the ICS anomaly detection problem, this paper proposes a hybrid model based on K-means, LOF and LSTM, and uses the two data sets to verify its performance. The results show that the overall performance of the hybrid model is better than that of any single sub-model. Finally, this paper puts the proposed hybrid
model into an actual industrial system to show that it can satisfy the application needs of real scenarios. This article mainly discusses the characteristics of the data stream collected by ICS in a short period of time, as well as the drawbacks of existing algorithms in dealing with these problems, and proposes a hybrid model to solve them. The performance of the hybrid model is verified on two representative data sets, which shows that it can meet the practical needs of ICS anomaly detection. |
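
The abstract names LOF (Local Outlier Factor) as one of the sub-models but does not describe how it is computed or how the sub-models are combined. As a rough illustration only, the following sketch computes standard LOF scores on toy 2-D data using the Python standard library. The function name `lof_scores`, the neighbourhood size `k`, and the sample points are assumptions made for this example and are not taken from the paper; ties at the k-th neighbour distance are broken by index order, which is a simplification of the textbook definition.

```python
import math


def lof_scores(points, k=2):
    """Return a Local Outlier Factor score per point (brute force, O(n^2)).

    Scores near 1 suggest an inlier; scores well above 1 suggest an outlier.
    Simplification (assumption): exactly k neighbours are used per point,
    even when several points are tied at the k-th neighbour distance.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours and k-distance of every point (self excluded).
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    eps = 1e-12  # guards against division by zero for duplicate points
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(1.0 / (sum(reach) / k + eps))

    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


# Toy data (hypothetical): five points in a tight group and one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = lof_scores(points, k=2)
outlier_index = max(range(len(points)), key=scores.__getitem__)
print("most anomalous point:", points[outlier_index])
```

In a setting like the one the abstract describes, scores of this kind would be computed on preprocessed feature vectors rather than raw 2-D points, and the threshold separating normal from abnormal records would have to be chosen on the data set at hand.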