| In recent years, increasing attention has been paid to the knowledge contained in data, and a number of data mining and analysis techniques have been developed. Time series segmentation is an important one of these techniques. A time series is a sequence of data collected in a certain order, and such data exist in many fields. Because collection periods are long and the behavior patterns of the observed objects change, a time series generally contains multiple structures. To simplify the representation of a complex data series, time series segmentation is used to divide it into multiple subsequences, each with a single structure. Time series segmentation also helps locate the change points of structure in a data series, which generally mark the occurrence of important events. Industrial time series data are widespread in industrial production processes and contain important information reflecting the state of the process. Time series segmentation is of great significance for the analysis of industrial time series, which are a typical form of big data. First, most advanced industrial process monitoring, control, and optimization algorithms rely on process models. Modeling a process under different working conditions requires a large amount of data from each condition, so the data for each working condition must be extracted by industrial time series segmentation. Second, transitions from one working condition to another are very common in industrial systems. To increase product yield and give early warning of abnormal conditions, an automatic change point identification system is needed. The first step toward such a system is intelligent analysis of process data, in which time series segmentation is particularly important. Building on previous research, this paper makes new progress on multivariate industrial time series segmentation algorithms. The work mainly covers the
following aspects: (1) Time series segmentation algorithms based on piecewise linear approximation (PLA) have been widely used for univariate data, but industrial process data are generally multivariate. To detect changes in the cross-correlation structure, multivariate time series segmentation algorithms are usually based on principal component analysis (PCA). However, this approach ignores the dynamic autocorrelation in time series. To capture changes in the dynamic autocorrelation structure, dynamic PCA (DPCA) has also been introduced into segmentation algorithms. However, DPCA mixes the dynamic characteristics with the cross-correlation characteristics, so it is not clear which structure has changed at a segmentation point. To account for dynamic autocorrelation and to identify the type of change at each segmentation point, this paper treats dynamic autocorrelation and cross-correlation separately and proposes two algorithms: a multivariate time series segmentation algorithm based on dynamic predictability, and a multivariate time series segmentation algorithm that combines dynamic predictability and cross-correlation. The former is driven by changes in the dynamic autocorrelation structure, while the latter is driven by a weighted combination of the two characteristics. Experimental results show that the proposed methods segment well both the time series of an alumina evaporation process and the flame characteristic series from the melting process of an electric melting magnesium furnace. (2) In traditional PCA-based segmentation methods, the cost function and the quality evaluation index are usually built on the reconstruction error. When the data are corrupted by measurement noise, the reconstruction error often fails to reflect changes in the data structure. Therefore, we propose a cost function and an evaluation index based on cosine distance; that is, the cosine distance between
the reconstructed data and the original data serves as the criterion. We call these the cost function and evaluation index based on reconstruction similarity. Experimental results show that the new cost function and evaluation index avoid the segmentation anomalies caused by measurement noise and are strongly robust. (3) To address the problem of data imbalance, this paper proposes a new way of calculating the cost function. Unlike the traditional cost function, which simply computes the reconstruction error, the new cost function is calculated interactively between two adjacent subsequences. The purpose is to make the weights of the two subsequences independent of the amount of data they contain. Experiments show that this form of the cost function overcomes the data imbalance problem. |
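
To make the contrast in aspect (2) concrete, the following is a minimal sketch of a reconstruction-error cost next to a cosine-distance cost for a single segment. It is an illustration under stated assumptions, not the formulation used in this work: the function names, the mean-centering step, the choice of `n_components`, and the use of one cosine over the whole flattened segment (rather than, say, per sample) are all assumptions introduced here.

```python
import numpy as np


def pca_reconstruct(segment, n_components):
    """Return (centered data, its PCA reconstruction) for a samples-by-variables array."""
    centered = segment - segment.mean(axis=0)
    # Rows of vh are the principal directions of the centered segment.
    _, _, vh = np.linalg.svd(centered, full_matrices=False)
    basis = vh[:n_components]
    # Orthogonal projection onto the leading principal directions.
    return centered, centered @ basis.T @ basis


def reconstruction_error_cost(segment, n_components):
    """Traditional cost: mean squared reconstruction error per sample."""
    centered, recon = pca_reconstruct(segment, n_components)
    return float(np.mean(np.sum((centered - recon) ** 2, axis=1)))


def reconstruction_similarity_cost(segment, n_components):
    """Assumed cosine-distance cost: 1 - cos(angle) between original and reconstruction."""
    centered, recon = pca_reconstruct(segment, n_components)
    denom = np.linalg.norm(centered) * np.linalg.norm(recon)
    if denom == 0.0:
        # A constant segment is reconstructed exactly, so treat the distance as zero.
        return 0.0
    cosine = float(np.sum(centered * recon) / denom)
    # Clip to guard against tiny floating-point overshoot outside [0, 1].
    return min(1.0, max(0.0, 1.0 - cosine))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 20)
    clean = np.outer(t, [1.0, 2.0])  # two variables, one latent direction
    noisy = clean + 0.1 * rng.standard_normal(clean.shape)
    print(reconstruction_error_cost(noisy, 1))
    print(reconstruction_similarity_cost(noisy, 1))
```

In this sketch the cosine-based cost is bounded between 0 and 1 regardless of the amplitude of the data, whereas the squared reconstruction error scales with it. That scale dependence is one plausible reason an error-based cost reacts strongly to measurement noise, although the sketch does not reproduce the experiments reported here.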