
Research On Nonlinear Time Series Clustering Algorithm Based On Centered Copula Process

Posted on: 2022-10-01; Degree: Master; Type: Thesis
Country: China; Candidate: Y T Zhen; Full Text: PDF
GTID: 2518306602965999; Subject: Statistics
Abstract/Summary:
Cluster analysis is the process of partitioning data into clusters in order to reveal the inherent properties and regularities of the data. With the development of big data, large volumes of time series data have accumulated from long-term monitoring and recording in various industries, which has given rise to the problem of time series clustering. At present, most research on clustering methods assumes that time series are only linearly dependent, but this assumption often fails in practice. To overcome this limitation, this thesis studies clustering methods applicable to time series with a nonlinear dependence structure and proposes two centered copula-based distances to measure the dissimilarity between time series. The specific work is as follows:

Firstly, we introduce the preliminary knowledge and basic theory closely related to this thesis. On the one hand, this includes the basic concepts of clustering, such as the clustering steps, traditional clustering methods, commonly used distance measures, and evaluation criteria for clustering results. On the other hand, we briefly introduce the concept and properties of the copula function and its application in similarity measurement.

Secondly, a clustering algorithm based on the centered Copula-CVM (Copula of Cramér-von Mises) test statistic is proposed, which is suitable for clustering nonlinear time series data. In this method, the centered copula function is used to measure the strength of the correlation between random variables, and the centered copula process is used to capture the dynamic dependence structure of a time series. The distance measures the difference between two centered copula processes according to the Cramér-von Mises test statistic, and a non-parametric estimator is considered for it. The estimator has an equivalent form that is convenient to compute, which improves the efficiency of the algorithm. At the same time, the strong consistency of the estimator is guaranteed, which expands the scope of application of the distance measure. Simulation results for the hierarchical clustering algorithm based on the centered Copula-CVM distance show that the proposed distance is not only suitable for nonlinear time series data but also achieves high clustering quality for time series with a linear dependence structure.

Finally, for real-world time series with a large lag order, a distance based on the centered Copula-WAD (Copula of Wasserstein and Anderson-Darling) is proposed as a similarity measure for time series. The Anderson-Darling distance reduces the influence of noisy data by assigning weights. The WAD distance combines the advantages of the Wasserstein distance and the Anderson-Darling distance, so the proposed clustering method based on the centered Copula-WAD distance avoids dependence on the lag order of the time series and thereby addresses the clustering problem for time series with a larger lag order. The effectiveness of the centered Copula-WAD distance is verified by simulation experiments. In addition, the clustering algorithm based on the proposed distance is used to cluster the populations of major cities in China, and reasonable clustering results are obtained.
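
The general idea of a Cramér-von Mises-type copula distance followed by hierarchical clustering can be illustrated with a minimal Python sketch. It is not the thesis's estimator: it assumes a plain (uncentered) empirical copula of the lag-1 pairs (X_t, X_{t+1}), approximates the squared difference of two such copulas by an average over a regular grid, and merges clusters with a simple average-linkage loop. All function names, the grid size, and the toy series are hypothetical choices made for illustration.

```python
from itertools import combinations


def pseudo_obs(x):
    """Rank-transform a sequence to values in (0, 1] (rank / n)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / n
    return u


def lag_copula(x, lag=1, grid=10):
    """Empirical copula of (X_t, X_{t+lag}) on a grid x grid mesh."""
    a, b = pseudo_obs(x[:-lag]), pseudo_obs(x[lag:])
    pts = [(k + 1) / grid for k in range(grid)]
    m = len(a)
    return [
        sum(1 for ui, vi in zip(a, b) if ui <= u and vi <= v) / m
        for u in pts
        for v in pts
    ]


def cvm_distance(x, y, lag=1, grid=10):
    """Grid average of the squared difference between two lag copulas
    (an illustrative CvM-type distance, assumed for this sketch)."""
    cx, cy = lag_copula(x, lag, grid), lag_copula(y, lag, grid)
    return sum((p - q) ** 2 for p, q in zip(cx, cy)) / len(cx)


def average_linkage(dist, k):
    """Agglomerative clustering: merge the closest pair until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        def avg(pair):
            a, b = clusters[pair[0]], clusters[pair[1]]
            return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

        i, j = min(combinations(range(len(clusters)), 2), key=avg)
        clusters[i] += clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


# Toy series: two with positive and two with negative lag-1 dependence.
x1 = [float(t) for t in range(21)]            # increasing
y1 = [float((-1) ** t * t) for t in range(21)]  # sign-alternating
x2 = [float(t ** 2) for t in range(21)]       # increasing, different scale
y2 = [float((-1) ** t * (t + 1) ** 2) for t in range(21)]

series = [x1, y1, x2, y2]
dist = [[cvm_distance(s, t) for t in series] for s in series]
clusters = average_linkage(dist, k=2)
print(clusters)
```

Because the distance depends only on ranks, series that differ by a strictly increasing transformation (here x1 and x2) are at distance zero, which is the property that lets copula-based distances compare dependence structure rather than scale. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` could replace the hand-written merging loop.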
Keywords/Search Tags:Nonlinear Time Series, Clustering Algorithm, Centered Copula Process, Distance Measurement, Correlation