
Time Series Clustering Method Based on U-Shapelets and Its Application in Air Quality Analysis

Posted on: 2021-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Gao
GTID: 2381330602487757
Subject: Management Science and Engineering
Abstract:
With the rapid development of Internet technology, the amount of data is growing exponentially. Most of this data takes the form of time series, so mining techniques and methods for time series data have received extensive attention. Because time series data are massive, high-dimensional, and noisy, they are difficult to partition accurately with traditional clustering methods. The u-shapelet-based time series clustering method, by contrast, uses locally discriminative subsequences (u-shapelets) to distinguish time series from one another. This not only reduces the influence of noise on clustering but also helps improve the accuracy and efficiency of clustering time series data relative to traditional methods. This thesis therefore analyzes the u-shapelet time series clustering method and its limitations, and improves it in three respects: low accuracy, high time complexity, and the diversity of data types. The improved methods are applied to 362 cities in China to classify their air quality levels and identify heavily polluted areas. The main work of this thesis can be summarized as follows:

(1) For single-feature (univariate) time series clustering, a DTW-based u-shapelet single-feature clustering method, DTW-u-shapeletClus, is proposed. First, the method randomly selects 1% of all subsequences as the candidate set, which addresses the high time cost of extracting the best u-shapelets. Second, when evaluating the quality of subsequences, DTW distance is used in the computation of the separation degree, which improves the quality of the selected u-shapelets; cluster analysis is then carried out based on the best u-shapelets. DTW-u-shapeletClus is validated on 5 standard data sets, and the results show that both its accuracy and its efficiency improve.

(2) For clustering multivariate time series, a clustering method based on pu-shapelets is proposed. PCA is first used to reduce the dimensionality of the multivariate time series and retain the most informative components. The Pearson correlation coefficient is then used to rank all the sequences, and distinct sequences are selected as the candidate set. On this basis, the best set of u-shapelets is selected for cluster analysis. Accuracy and running time were evaluated on common data sets. The experimental results show that, compared with the BruteForce method and the symbol-based u-shapelet clustering method, the proposed method achieves good accuracy and a significantly higher running speed.

(3) The two methods proposed in this thesis are applied to air quality index data for 362 cities in China, and the cities are clustered by air quality. First, DTW-u-shapeletClus is used to cluster the cities on each of PM2.5, PM10, SO2, CO, NO2, and O3; then PCApu-shapeletsMTSC is used to analyze the six pollutants jointly, the cities are grouped by pollution level, and the clustering results are visualized.
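The candidate sampling and DTW-based scoring described in (1) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the thesis's implementation: the function names (`dtw`, `subsequence_dtw`, `gap_score`, `dtw_ushapelet_clus`), the `min_frac` balance constraint, and the single two-way split are hypothetical. The separation measure uses the gap form (mean_B - std_B) - (mean_A + std_A) from the original u-shapelet work, with Euclidean distance swapped for unconstrained DTW, and z-normalization is omitted for brevity.

```python
import numpy as np


def dtw(a, b):
    """Unconstrained O(n*m) dynamic time warping distance (squared-difference cost)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))


def subsequence_dtw(shapelet, series):
    """Smallest length-normalized DTW distance between the shapelet and any
    same-length window of the series (no z-normalization, for brevity)."""
    L = len(shapelet)
    best = min(dtw(shapelet, series[s:s + L]) for s in range(len(series) - L + 1))
    return best / np.sqrt(L)


def gap_score(distances, min_frac=0.2):
    """Best split of the sorted distances ("orderline"):
    gap = (mean_B - std_B) - (mean_A + std_A), where A holds the closest series.
    Returns (gap, threshold); min_frac is a hypothetical balance constraint."""
    d = np.sort(np.asarray(distances, dtype=float))
    n = len(d)
    best_gap, best_thr = -np.inf, None
    for split in range(1, n):
        if not (min_frac <= split / n <= 1.0 - min_frac):
            continue  # skip lopsided splits
        da, db = d[:split], d[split:]
        gap = (db.mean() - db.std()) - (da.mean() + da.std())
        if gap > best_gap:
            best_gap, best_thr = gap, (d[split - 1] + d[split]) / 2.0
    return best_gap, best_thr


def dtw_ushapelet_clus(dataset, length, sample_frac=0.01, seed=0):
    """Score a random fraction of all subsequences, keep the best u-shapelet,
    and use it to split the data set into two clusters."""
    data = [np.asarray(ts, dtype=float) for ts in dataset]
    candidates = [(i, s) for i, ts in enumerate(data) for s in range(len(ts) - length + 1)]
    rng = np.random.default_rng(seed)
    n_pick = max(1, int(round(sample_frac * len(candidates))))
    picked = rng.choice(len(candidates), size=n_pick, replace=False)

    best = (-np.inf, None, None)  # (gap, shapelet, threshold)
    for idx in picked:
        i, s = candidates[idx]
        cand = data[i][s:s + length]
        dists = [subsequence_dtw(cand, ts) for ts in data]
        gap, thr = gap_score(dists)
        if thr is not None and gap > best[0]:
            best = (gap, cand, thr)

    gap, shapelet, thr = best
    if shapelet is None:
        raise ValueError("no valid split found; need at least two series")
    dists = np.array([subsequence_dtw(shapelet, ts) for ts in data])
    labels = (dists > thr).astype(int)  # 0 = close to the shapelet, 1 = far from it
    return shapelet, labels, gap
```

With the default `sample_frac=0.01`, only about 1% of the subsequences are scored, mirroring the 1% candidate set in (1). Because every candidate needs a DTW comparison against every window of every series, scoring only a random fraction of candidates is what keeps the search tractable.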
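The PCA reduction and Pearson-based ranking in (2) might look like the following sketch. The details are assumptions, because the abstract does not say how PCA is fitted or how the ranking is used. This version fits one PCA on all time points pooled across series and keeps the first component of each series. It then ranks series by their mean absolute Pearson correlation with the others and treats the least-correlated series as the most distinct sources of u-shapelet candidates. All function names are hypothetical.

```python
import numpy as np


def pca_reduce(mts_list, n_components=1):
    """Project each multivariate series (T x d array) onto the top principal axes,
    fitted on all time points of all series pooled together."""
    stacked = np.vstack(mts_list)
    mean = stacked.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(stacked - mean, full_matrices=False)
    axes = vt[:n_components]
    return [(np.asarray(m, dtype=float) - mean) @ axes.T for m in mts_list]


def rank_by_pearson(series_list):
    """Order equal-length univariate series from least to most redundant, using
    the mean absolute Pearson correlation with all other series."""
    X = np.vstack(series_list)
    r = np.corrcoef(X)  # pairwise Pearson correlation matrix (rows = series)
    n = len(series_list)
    redundancy = (np.abs(r).sum(axis=1) - 1.0) / (n - 1)  # drop the self-correlation of 1
    return np.argsort(redundancy, kind="stable")


def select_candidate_series(mts_list, n_select):
    """Reduce every series to its first principal component, then keep the
    n_select least-correlated series as sources of u-shapelet candidates."""
    reduced = [z[:, 0] for z in pca_reduce(mts_list, n_components=1)]
    order = rank_by_pearson(reduced)
    return order[:n_select].tolist(), reduced
```

Because the SVD fixes each principal axis only up to sign, a reduced series may come out flipped. The ranking is unaffected because it uses absolute correlations.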
Keywords: Time Series, u-shapelets, Clustering, Air Quality Index