An Improved Clustering Algorithm For Large-scale Time Series Data

Posted on:2018-11-10

Degree:Master

Type:Thesis

Country:China

Candidate:R H Du

Full Text:PDF

GTID:2348330512476867

Subject:Communication and Information System

Abstract/Summary:

The security of temporal data has drawn substantial interest owing to the proliferation and ubiquity of time series in many fields. In anomaly detection systems for time-related data, time series clustering is one of the most popular mining methods. However, many time series clustering algorithms detect clusters in a batch fashion, which consumes a large amount of memory and thus limits their scalability to large time series. To solve this problem, this thesis proposes a time series clustering method, the Ex-BIRCH algorithm, which is based on the BIRCH algorithm, to mine the implicit information in large time series accurately. The work is partly supported by the National Natural Science Foundation of China (No. 61172072, 61271308), the Beijing Natural Science Foundation (No. 4112045), and the Research Fund for the Doctoral Program of Higher Education of China (No. 20100009110002).

The main work of this thesis is as follows. First, the thesis compares existing clustering algorithms and identifies the challenges of clustering large time series. It then analyzes the advantages of the BIRCH algorithm in processing large-scale data. On this basis, an improved clustering algorithm for time series is proposed, with three concrete improvements:

(1) The distance metric in BIRCH is replaced. Because Euclidean distance cannot measure the similarity of time series accurately, the thesis adopts dynamic time warping (DTW) as the time series distance metric to achieve accurate clustering of time series.

(2) The cluster centroid calculation in BIRCH is changed. The thesis proposes the Ad-DBA algorithm, which is based on the DTW barycenter averaging (DBA) algorithm. Ad-DBA can compute the mean of time series in a data stream environment, and Ex-BIRCH uses it to calculate cluster centroids.

(3) The cluster features in BIRCH are modified. Changing the distance measure and the averaging method invalidates the original clustering feature vector in BIRCH. By analyzing the calculation process of the DTW and Ad-DBA algorithms, a new clustering feature is proposed to replace the original one.

To demonstrate the effectiveness of the proposed algorithm, the thesis conducts an extensive evaluation of Ex-BIRCH against BIRCH, k-means, and their variants combined with competitive distance measures. Experimental results show that Ex-BIRCH improves accuracy significantly compared with BIRCH and its variants, and achieves accuracy comparable to that of k-means and k-DBA. However, unlike k-means and k-DBA, Ex-BIRCH retains the ability to handle continuously arriving data objects incrementally. Finally, with the help of a sliding window, Ex-BIRCH is applied to a subsequence time series clustering task on a simulated multivariate time series dataset. The results show that the improved algorithm can perform sequential pattern mining in a data stream environment.
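
As a rough illustration of two building blocks named in the abstract, the sketch below shows the textbook DTW distance alongside Euclidean distance, plus a simple sliding-window split of a series into subsequences. It is not the thesis's implementation: the function names, the squared-difference local cost, and the `width` and `step` parameters are assumptions made for this example, and Ad-DBA and the modified clustering feature are not reproduced here.

```python
import math


def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two 1-D sequences.

    Assumption: squared difference as the local cost, square root at the end.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matches b[j-1] (b point reused)
                cost[i][j - 1],      # b[j-1] also matches a[i-1] (a point reused)
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return math.sqrt(cost[n][m])


def euclidean_distance(a, b):
    """Point-by-point Euclidean distance; assumes equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sliding_windows(series, width, step=1):
    """Split a series into overlapping subsequences of a fixed width."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]


if __name__ == "__main__":
    # Same shape, shifted by one time step.
    a = [0, 0, 1, 2, 1, 0, 0]
    b = [0, 1, 2, 1, 0, 0, 0]
    print("Euclidean:", euclidean_distance(a, b))
    print("DTW:", dtw_distance(a, b))
    print("Windows:", sliding_windows([1, 2, 3, 4, 5], width=3))
```

In this toy case the two series have the same shape shifted by one step, so DTW can align them at zero cost while Euclidean distance penalizes the shift, which is the kind of mismatch the abstract points to when it replaces the BIRCH distance metric.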

Keywords/Search Tags:

Time Series, Data Stream, Clustering, Sequence Pattern Mining

Related items

1	Sequential Pattern Mining in Time Series Data
2	The Application Of Stream Data Time-Series Pattern Reliance Mining In Stock Market Analysis
3	Time Series Data Mining
4	Research On Time Series Data Mining Based On Similarity Analysis
5	Hierarchical Clustering Algorithm For Mining Frequent Patterns And Time-series Flow
6	Research On Data Mining And Forecasting Methods Over Time Series Data With Complex Structure
7	Efficient Periodic Pattern Mining in Time Series & Sequence Databases
8	Study On Sequence Patterns Mining And Its Application In Intrusion Detection
9	Research On Data Mining Technology Of Pattern-based Similarity Search In Time Series Database
10	Research On The Framework Of Mining Abnormal Pattern On Multiple Correlative Time Series