
Research On Data Mining Method Based On U-shapelet Time Series

Posted on: 2022-09-09 | Degree: Master | Type: Thesis
Country: China | Candidate: L H Luo | Full Text: PDF
GTID: 2518306524481584 | Subject: Statistics
Abstract/Summary:
Time series are ubiquitous in the present era of rapid development, with important applications in financial analysis, biomedicine, geographic monitoring, image processing, and many other fields. Over the past two decades, a steady stream of research has taken time series as its main object. With the rapid growth of big data, massive data sets urgently need to be analyzed, and time series data mining has become particularly important: mining and analyzing time series can extract the most valuable and forward-looking information from massive data. Time series data mining covers many lines of research, among which time series clustering is an extremely important branch. Unlike time series classification, clustering is unsupervised and needs no labels, which makes it easier to apply in practice. The clustering process and its results also support other time series data mining tasks.

Keogh et al. proposed the concept of the u-shapelet, a characteristic subsequence of a time series that has discriminative power for clustering. Clustering based on u-shapelets mainly consists of screening candidate subsequences and selecting the most discriminative ones as u-shapelets. A distance matrix is then generated by computing the distance between each u-shapelet and each time series; the time series are transformed into this distance representation and clustered with k-means. The original algorithm has high stability and precision, but its time complexity is very high, so it cannot be applied to large data sets. To address these problems, this thesis improves the original u-shapelet time series clustering algorithm in two directions.

First, a u-shapelet time series clustering method based on fragmentation right trend matching is proposed, called the FTFM-US algorithm for short. The thesis proposes a fragmented right trend feature representation that preserves the extreme turning points and trend features of a time series. By matching extreme turning points, a new distance measure between subsequences under the fragmented right trend representation is proposed, which solves the problem that trends cannot be matched in Euclidean space. Experiments show that the FTFM-US algorithm improves both clustering accuracy and running efficiency.

Second, to address the poor performance on large data sets, the thesis proposes two strategies: replacing the subsequence extraction rules and introducing locality-sensitive hashing (LSH) to screen similar subsequences. Subsequences are extracted into the candidate set with a sliding window under the changed extraction rules. The E2LSH method, which is based on p-stable distributions, is then used to quickly screen similar subsequences within the candidate set; the resulting method is referred to as the E2LSH-US algorithm. It speeds up the extraction of feature subsequences. Experimental results show that, compared with the original brute-force search algorithm, E2LSH-US greatly shortens the running time and reduces the time complexity, which enables u-shapelets to be applied to large data sets.
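The baseline pipeline described above (u-shapelet to series distances, distance matrix, k-means) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not specify the distance, so a z-normalized, length-normalized Euclidean subsequence distance is assumed, the u-shapelets are taken as already selected, and the function names (`subsequence_distance`, `distance_map`, `kmeans`) and the farthest-point initialization are hypothetical choices made to keep the example deterministic.

```python
import math

def znorm(seq):
    """Z-normalize a sequence; a constant sequence maps to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]

def subsequence_distance(shapelet, series):
    """Smallest length-normalized Euclidean distance between the
    shapelet and any window of the same length in the series."""
    m = len(shapelet)
    s = znorm(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        w = znorm(series[start:start + m])
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(s, w)) / m)
        best = min(best, d)
    return best

def distance_map(shapelets, dataset):
    """One row per time series, one column per u-shapelet."""
    return [[subsequence_distance(s, ts) for s in shapelets] for ts in dataset]

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with farthest-point initialization
    (an assumption of this sketch, chosen for determinism)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy data: two series contain a "bump" pattern, two are plain ramps.
shapelet = [0, 1, 3, 1, 0]
dataset = [
    [0, 0, 0, 1, 3, 1, 0, 0, 0, 0],
    [5, 5, 5, 6, 8, 6, 5, 5, 5, 5],
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
features = distance_map([shapelet], dataset)
labels = kmeans(features, k=2)
```

In this toy case the two series that contain the bump sit at distance close to zero from the u-shapelet and end up in one cluster, while the ramps end up in the other. The nested loop over every window is the brute-force cost that motivates the two speed-ups proposed in the thesis.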
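To illustrate the screening idea behind E2LSH, the sketch below hashes sliding-window subsequences with the standard p-stable scheme h(v) = floor((a·v + b) / w), where a is drawn from a Gaussian (2-stable) distribution and b uniformly from [0, w]. It is not the E2LSH-US algorithm itself: the abstract gives no parameters or extraction rules, so the window length, step, number of hash functions, bucket width, and all function names here are assumptions for illustration.

```python
import math
import random
from collections import defaultdict

def sliding_windows(series, length, step=1):
    """Extract candidate subsequences with a sliding window."""
    return [series[i:i + length]
            for i in range(0, len(series) - length + 1, step)]

def make_e2lsh(dim, num_hashes, width, seed=0):
    """Build a signature function from num_hashes p-stable hash functions.

    Each hash is floor((a . v + b) / width) with Gaussian a and
    uniform b; the signature is the tuple of all hash values.
    """
    rng = random.Random(seed)
    projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                   for _ in range(num_hashes)]
    offsets = [rng.uniform(0.0, width) for _ in range(num_hashes)]

    def signature(vec):
        return tuple(
            math.floor((sum(a * x for a, x in zip(proj, vec)) + b) / width)
            for proj, b in zip(projections, offsets)
        )

    return signature

def bucket_subsequences(candidates, signature):
    """Group candidate indices by signature; subsequences that are close
    in Euclidean distance are likely to share a bucket."""
    buckets = defaultdict(list)
    for idx, sub in enumerate(candidates):
        buckets[signature(sub)].append(idx)
    return dict(buckets)

# Toy example: a periodic series yields identical windows,
# which necessarily fall into the same bucket.
series = [0, 1, 0, 1, 0, 1, 0, 1]
candidates = sliding_windows(series, length=2, step=2)
signature = make_e2lsh(dim=2, num_hashes=4, width=1.0, seed=42)
buckets = bucket_subsequences(candidates, signature)
```

Only subsequences that share a bucket need to be compared directly, so similar candidates can be screened without computing all pairwise distances. A wider bucket or fewer hash functions groups more candidates together, trading precision for recall.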
Keywords/Search Tags:u-shapelet, Time series clustering, Fragmentation right trend feature, Extreme turning point matching, E2LSH