Research On Robustness Of Time Series Clustering Algorithm

Posted on: 2022-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: G R Li
GTID: 2480306605468494
Subject: Probability theory and mathematical statistics
Abstract/Summary:
In the era of big data, every sector of society has accumulated large amounts of data. Time series are a representative type of such data and have attracted the attention of many researchers, and making full use of them can bring great value to society. A key question that follows is how to analyze time series data appropriately, and cluster analysis is a typical and effective method for doing so. However, the high dimensionality, complexity, and outliers of time series data all pose serious challenges to clustering. These problems can be approached from several directions, including the similarity measure, the cluster optimization method, scalability, cluster shape, and local outliers. This thesis focuses on improving existing similarity measures between time series: it designs robust similarity measures in order to improve the robustness of time series clustering algorithms. Two robust similarity measures, each with a corresponding clustering algorithm, are proposed for time series data containing outliers.

The first is a robust similarity measure based on the Q_n statistic, together with its clustering algorithm. First, a robust correlation coefficient based on the Q_n statistic replaces the classical Pearson correlation coefficient in computing the correlation coefficient matrix of the time series data. Second, the determinant of this new correlation coefficient matrix is used to construct a similarity measure between two time series, RGCC. Finally, the distance matrix between the series is computed from this measure and used as the input of a hierarchical clustering algorithm.

The second is a robust similarity measure based on the Gaussian kernel function. First, the kernel function is applied to the computation of the correlation coefficient: a kernel-based correlation coefficient is introduced and used to compute the correlation coefficient matrix of the time series data. Second, the determinant of this new correlation coefficient matrix is used to construct a similarity measure between two time series, KRGCC. Finally, the time series are clustered hierarchically on the basis of KRGCC.

Clustering experiments were carried out on simulated data sets generated from an autoregressive model and a dynamic factor model, and on an hourly electricity tariff data set from a region of England. They show that, for time series data with outliers, the clusterings based on RGCC or KRGCC are clearly closer to the true partition of the data than those based on the original generalized cross-correlation measure (GCC). Both proposed clustering algorithms are therefore robust and suitable for time series data with outliers. In addition, compared with the RGCC-based algorithm, the KRGCC-based algorithm has lower time complexity while maintaining clustering accuracy.
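The Q_n-based steps can be sketched in Python with NumPy. This is a minimal sketch under several assumptions that the abstract does not state, and the thesis may define things differently: Q_n is computed naively in O(n^2) with the usual normal-consistency constant 2.2219; the robust correlation uses the identity cov(a, b) = (var(a + b) - var(a - b)) / 4 with Q_n in place of the standard deviation; the determinant-based measure is assumed to follow the GCC form (|R_xy| / (|R_x| |R_y|))^(1/(k+1)), where R_xy is the correlation matrix of the lagged values of both series up to lag k and R_x, R_y are its diagonal blocks; and the lag order k = 2 is arbitrary. The helper names (`qn_scale`, `qn_corr`, `lagged`, `robust_corr_matrix`, `rgcc_distance`) are hypothetical, and the eigenvalue clipping is added only because pairwise robust correlations need not form a positive semidefinite matrix.

```python
import numpy as np


def qn_scale(x):
    """Naive O(n^2) Rousseeuw-Croux Q_n scale estimator.

    Q_n = 2.2219 * k-th smallest |x_i - x_j| (i < j), with h = n // 2 + 1 and
    k = h * (h - 1) / 2; finite-sample correction factors are omitted here.
    """
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(x.size, k=1)
    diffs = np.sort(np.abs(x[i] - x[j]))
    h = x.size // 2 + 1
    return 2.2219 * diffs[h * (h - 1) // 2 - 1]


def qn_corr(x, y):
    """Robust correlation from cov(a, b) = (var(a+b) - var(a-b)) / 4, applied to
    Q_n-standardised series with Q_n^2 in place of the variance (assumed form)."""
    a, b = x / qn_scale(x), y / qn_scale(y)
    su, sv = qn_scale(a + b) ** 2, qn_scale(a - b) ** 2
    return (su - sv) / (su + sv)


def lagged(x, k):
    """Columns x_t, x_{t-1}, ..., x_{t-k}, for t = k, ..., n - 1."""
    n = len(x)
    return np.column_stack([x[k - l:n - l] for l in range(k + 1)])


def robust_corr_matrix(Z):
    """Pairwise Q_n correlations of the columns of Z, projected to be PSD."""
    p = Z.shape[1]
    R = np.eye(p)
    for a in range(p):
        for b in range(a + 1, p):
            R[a, b] = R[b, a] = qn_corr(Z[:, a], Z[:, b])
    # Pairwise robust correlations need not form a positive semidefinite
    # matrix: clip small/negative eigenvalues, then rescale to a unit diagonal.
    w, V = np.linalg.eigh(R)
    R = (V * np.clip(w, 1e-10, None)) @ V.T
    d = np.sqrt(np.diag(R))
    return R / np.outer(d, d)


def rgcc_distance(x, y, k=2):
    """Assumed GCC-style distance (|R_xy| / (|R_x| |R_y|))^(1/(k+1)) on the robust
    correlation matrix: near 1 for no linear dependence, near 0 for strong."""
    Z = np.hstack([lagged(np.asarray(x, float), k), lagged(np.asarray(y, float), k)])
    R = robust_corr_matrix(Z)
    p = k + 1
    ratio = np.linalg.det(R) / (np.linalg.det(R[:p, :p]) * np.linalg.det(R[p:, p:]))
    return float(np.clip(ratio, 0.0, 1.0) ** (1.0 / p))


rng = np.random.default_rng(0)
x = rng.standard_normal(300)
y = x + 0.1 * rng.standard_normal(300)   # strongly dependent on x
z = rng.standard_normal(300)             # independent of x
x_out = x.copy()
x_out[150] += 50.0                       # a single additive outlier
print(rgcc_distance(x_out, y), rgcc_distance(x_out, z))
```

Taking R_x and R_y as diagonal blocks of one positive semidefinite R_xy guarantees, by Fischer's inequality, that the determinant ratio lies in [0, 1], so the fractional power is always defined.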
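The abstract does not give the exact form of the kernel-based correlation coefficient, so the sketch below swaps in one plausible construction, labeled here as a hypothetical variant rather than the thesis's definition: each lagged observation vector is robustly standardized with the median and MAD and given a Gaussian-kernel weight, and the weighted covariance is turned into a correlation matrix. A weighted covariance matrix is positive semidefinite, so its determinant is never negative, and the whole computation is a single weighted pass over the n rows (O(n p^2) for p lagged columns). The bandwidth default (`h2` equal to the number of columns), the lag order k = 2, and the names `kernel_corr_matrix` and `krgcc_distance` are assumptions; the determinant step assumes the GCC form (|R_xy| / (|R_x| |R_y|))^(1/(k+1)) with R_x and R_y taken as diagonal blocks of R_xy.

```python
import numpy as np


def lagged(x, k):
    """Columns x_t, x_{t-1}, ..., x_{t-k}, for t = k, ..., n - 1."""
    n = len(x)
    return np.column_stack([x[k - l:n - l] for l in range(k + 1)])


def kernel_corr_matrix(Z, h2=None):
    """Gaussian-kernel weighted correlation matrix of the columns of Z (hypothetical).

    Rows are robustly standardised with the median and MAD; row t gets weight
    w_t = exp(-||s_t||^2 / (2 * h2)), so rows far from the bulk of the data (for
    example rows containing an outlier) barely contribute. h2 defaults to the
    number of columns. A constant column (MAD = 0) is not handled.
    """
    Z = np.asarray(Z, dtype=float)
    med = np.median(Z, axis=0)
    mad = 1.4826 * np.median(np.abs(Z - med), axis=0)
    S = (Z - med) / mad
    h2 = Z.shape[1] if h2 is None else h2
    w = np.exp(-np.sum(S ** 2, axis=1) / (2.0 * h2))
    w /= w.sum()
    C = Z - w @ Z                     # centre at the weighted mean
    cov = C.T @ (C * w[:, None])      # weighted covariance: PSD by construction
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)


def krgcc_distance(x, y, k=2):
    """Assumed GCC-style distance (|R_xy| / (|R_x| |R_y|))^(1/(k+1)) on the
    kernel-weighted correlation matrix; R_x and R_y are its diagonal blocks."""
    Z = np.hstack([lagged(np.asarray(x, float), k), lagged(np.asarray(y, float), k)])
    R = kernel_corr_matrix(Z)
    p = k + 1
    ratio = np.linalg.det(R) / (np.linalg.det(R[:p, :p]) * np.linalg.det(R[p:, p:]))
    return float(np.clip(ratio, 0.0, 1.0) ** (1.0 / p))


rng = np.random.default_rng(0)
x = rng.standard_normal(300)
y = x + 0.1 * rng.standard_normal(300)   # strongly dependent on x
z = rng.standard_normal(300)             # independent of x
x_out = x.copy()
x_out[150] += 50.0                       # a single additive outlier
print(krgcc_distance(x_out, y), krgcc_distance(x_out, z))
```

The outlier enters k + 1 rows of the lagged matrix, and each of those rows receives a near-zero weight, which is what keeps the estimated dependence between x and y intact.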
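The final step of both methods, computing the distance matrix between series and feeding it to hierarchical clustering, can be sketched with SciPy. The average-linkage choice, the helper names `distance_matrix`, `cluster_series`, and `abs_pearson_distance`, and the factor-model toy data are illustrative assumptions; the abstract does not say which linkage the thesis uses, and the placeholder distance 1 - |Pearson r| only stands in for an RGCC- or KRGCC-style distance so that the block runs on its own.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def distance_matrix(series, dist):
    """Symmetric matrix of dist(series[i], series[j]) with a zero diagonal."""
    m = len(series)
    D = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            D[i, j] = D[j, i] = dist(series[i], series[j])
    return D


def cluster_series(series, dist, n_clusters, method="average"):
    """Agglomerative clustering on a precomputed distance matrix."""
    D = distance_matrix(series, dist)
    tree = linkage(squareform(D, checks=False), method=method)
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def abs_pearson_distance(x, y):
    # Placeholder only: an RGCC- or KRGCC-style distance would be passed instead.
    return 1.0 - abs(np.corrcoef(x, y)[0, 1])


rng = np.random.default_rng(1)
f1, f2 = rng.standard_normal((2, 200))   # two independent common factors
series = ([f1 + 0.3 * rng.standard_normal(200) for _ in range(3)]
          + [f2 + 0.3 * rng.standard_normal(200) for _ in range(3)])
labels = cluster_series(series, abs_pearson_distance, n_clusters=2)
print(labels)
```

Average (or complete) linkage is used here because SciPy's Ward, centroid, and median methods are only correct for Euclidean distances, which a determinant-based dissimilarity is not.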
Keywords: Time Series, Clustering, Outlier, Robust Estimation, Kernel Function, Correlation Coefficients