Research On Efficient Time Series Clustering Algorithms

Posted on: 2020-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: W C Zheng
Full Text: PDF
GTID: 2370330602951891
Subject: Computer Science and Technology
Abstract/Summary:
A time series is a collection of data points in chronological order. It is a ubiquitous form of data, with examples including electrocardiograms, stock price trends, and many other time-related data series. Time series clustering groups similar time series into the same class, which can extract hidden, valuable time-related information and help people make good decisions. Clustering time series therefore has great application value in medical health, stock investment, anomaly monitoring, and other fields. Because time series data is high-dimensional, often of unequal length, and ordered in time, general distance measures cannot be applied to it directly, which makes traditional static clustering algorithms ineffective.

Over the past ten years, many researchers and practitioners have devoted themselves to new, efficient time series clustering algorithms. This work focuses mainly on two aspects: time series representation methods that effectively reduce the dimensionality of a time series, and accurate similarity measures that refine the similarity between time series. Although some relatively efficient and effective time series clustering algorithms have been proposed, the following defects are common:
(1) Existing time series representation methods lose original sequence information when reducing the dimensionality of the time series, which decreases the accuracy of the algorithm.
(2) Existing similarity measures cannot estimate the distance between time series very well, and their precision leaves considerable room for improvement.
(3) The better existing similarity measures have higher time complexity.
Researching and developing new, efficient time series clustering algorithms is therefore of great theoretical and practical significance.

This thesis originates from a project of the National Natural Science Foundation of China. To overcome the shortcomings of existing time series clustering algorithms, the author first studies the current best time series clustering algorithms in depth. On this basis, two efficient and effective time series clustering algorithms are proposed. The main work and innovations of the thesis are as follows:
(1) A novel, efficient time series representation method is presented. The method not only preserves most of the original sequence information when reducing the dimensionality of the time series, but also extracts the shape features of the original time series. These characteristics improve the accuracy of the similarity measure between time series.
(2) A new concept, the synchronization site between time series, is introduced, and a novel method for identifying synchronization sites is presented. The advantage of the synchronization site is that it captures the macroscopic shape of the time series curve. Based on this concept, an efficient time series similarity measure is developed that combines the global and local similarity of the time series, leading to a more accurate similarity between time series.
(3) Because a useless prefix in a time series can reduce the precision of the algorithm, an efficient method for deleting useless prefixes is put forward. Finally, using the concepts and methods introduced above, a new efficient time series clustering algorithm, TSCEFAD, is designed and implemented.
(4) The available public literature suggests that the longest common subsequence (LCS) algorithm is one of the best time series similarity measures. However, the existing LCS algorithm has high time complexity and low efficiency. To overcome these weaknesses, a new efficient LCS algorithm suited to measuring the similarity between time series is proposed. Based on this efficient LCS algorithm, a novel efficient time series clustering algorithm, TSCELCS, is developed.

The two proposed algorithms are fully tested and verified on UCR, the current authoritative open-source time series dataset. The experimental results show that the algorithms cluster the time series in the dataset efficiently and effectively, and that they outperform state-of-the-art time series clustering algorithms in both efficiency and accuracy. The author's future work is to further improve the running time and accuracy of the proposed algorithms, and to apply them to multivariate time series clustering.
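
For readers unfamiliar with LCS as a time series similarity measure, the sketch below shows the textbook dynamic-programming formulation that item (4) describes as having high time complexity. It is not the efficient algorithm proposed in the thesis, whose details the abstract does not give. The function names, the matching tolerance `epsilon`, and the normalization by the shorter series length are illustrative assumptions.

```python
from typing import Sequence


def lcss_length(a: Sequence[float], b: Sequence[float], epsilon: float) -> int:
    """Length of the longest common subsequence of two real-valued series.

    Two points are treated as matching when they differ by at most
    `epsilon` (an assumed tolerance, not a value from the thesis).
    This is the classic O(len(a) * len(b)) dynamic programme, kept to
    two rolling rows so memory stays O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if abs(x - y) <= epsilon:
                # Points match: extend the best subsequence ending before both.
                curr[j] = prev[j - 1] + 1
            else:
                # No match: carry over the better of skipping x or skipping y.
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(b)]


def lcss_similarity(a: Sequence[float], b: Sequence[float], epsilon: float) -> float:
    """Similarity in [0, 1]: LCS length divided by the shorter series length.

    The normalization is a common convention and an assumption here.
    """
    if len(a) == 0 or len(b) == 0:
        return 0.0
    return lcss_length(a, b, epsilon) / min(len(a), len(b))
```

For example, `lcss_similarity([1.0, 2.0, 3.0, 4.0], [1.1, 2.9, 4.05], epsilon=0.2)` evaluates to 1.0, because every point of the shorter series finds an in-order match in the longer one. The quadratic cost of this baseline on long series is the inefficiency that motivates a faster LCS variant.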
Keywords/Search Tags:Time series clustering, Similarity measure, Synchronization locus, Longest common subsequence