Quick Clustering Of Multiple Data Streams Based On Time Distance

Posted on:2019-06-06

Degree:Master

Type:Thesis

Country:China

Candidate:S D Yang

GTID:2428330548468885

Subject:Computer software and theory

Abstract/Summary:

With the development of science and technology, sensors of all kinds produce large volumes of time-series data streams. Cluster analysis of time-series data is effective for tasks such as segmenting business markets, identifying speeding cars, monitoring traffic jams, and tracking vehicles. Because time-series data streams arrive in real time, however, it is a great challenge to control space cost, reduce the amount of computation, and avoid repeated scans of the data while still finding useful information. Time-series analysis has attracted growing attention from scholars in recent years, and clustering is the most popular way to handle time-series data. It has been applied in many fields (such as financial stock trading, environmental monitoring, network monitoring, web search log analysis, and meteorological data) with good results. Many clustering algorithms exist at present, but most of them are designed for static datasets and cannot handle real-time time series. Some of them (such as CluStream and DFT) are based on Euclidean distance and cannot effectively express the similarity in trend between data streams, which makes them poorly suited to real-life data. In the stock market, for example, the trend of a stock is more useful than its price. In addition, these algorithms operate on the whole data sequence, which can result in high cost. Therefore, we propose in this paper a simple, fast, and efficient method for clustering time series with similar trends. The new method is based on landmarks of the data stream, which greatly reduce the amount of computation. To find trend similarity more effectively, we also propose a time distance for clustering time series. Unlike other methods, the new method does not consider the gap between data values and focuses only on the trend similarity of the time series. Finally, we use both synthetic and real datasets to compare traditional clustering methods with our method. The experimental results show that our algorithm is more efficient and gives better-quality results in finding trend similarity between time series.
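The abstract does not define landmarks or the time distance precisely, so the following is only a minimal sketch of the general idea under stated assumptions, not the thesis's actual algorithm. It assumes that a landmark is a turning point (local peak or valley) of a series, and that a "time distance" compares only when landmarks occur, never the data values. The function names `landmarks` and `time_distance` and the penalty for unmatched landmarks are hypothetical choices made for illustration.

```python
from typing import List, Tuple


def landmarks(series: List[float]) -> List[Tuple[int, int]]:
    """Return (time index, kind) for each turning point.

    kind is +1 for a peak and -1 for a valley. Treating turning points
    as landmarks is an assumption; the thesis may define them differently.
    """
    points = []
    prev_dir = 0  # direction of the last non-flat step: +1 up, -1 down, 0 unknown
    for t in range(1, len(series)):
        diff = series[t] - series[t - 1]
        if diff == 0:
            continue  # flat step: keep the previous direction
        direction = 1 if diff > 0 else -1
        if prev_dir != 0 and direction != prev_dir:
            # The trend reversed, so the previous sample is a turning point.
            points.append((t - 1, prev_dir))
        prev_dir = direction
    return points


def time_distance(a: List[float], b: List[float]) -> float:
    """Hypothetical trend distance based only on landmark times.

    For each landmark in one series, take the time gap to the nearest
    landmark of the same kind in the other series, then average the two
    directions. Data values are never compared.
    """
    la, lb = landmarks(a), landmarks(b)
    penalty = max(len(a), len(b))  # assumed cost of an unmatched landmark
    if not la and not lb:
        return 0.0
    if not la or not lb:
        return float(penalty)

    def one_way(src, dst):
        total = 0
        for t, kind in src:
            gaps = [abs(t - u) for u, k in dst if k == kind]
            total += min(gaps) if gaps else penalty
        return total / len(src)

    return (one_way(la, lb) + one_way(lb, la)) / 2


# Two series with the same shape at very different scales, and one with a different shape.
low = [1, 3, 2, 4, 1]
high = [100, 300, 200, 400, 100]
other = [1, 2, 3, 2, 1]
print(time_distance(low, high))   # same turning times, so the distance is 0.0
print(time_distance(low, other))  # different turning times, so the distance is positive
```

Because only the landmark times are stored and compared, each series is reduced to a short list of turning points, which is the kind of saving in computation the abstract attributes to landmarks. A Euclidean distance would place `low` and `high` far apart despite their identical trends.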

Keywords/Search Tags:

Multi-data flow clustering, Time distance, Landmark, Similar trend

Related items

1	Research On Next Location Prediction Algorithm Based On Similar Behavior Of Mobile Objects
2	A Study Of Clustering And Data Analysis Methods Based On One-Dimensional SOM
3	Trend Knowledge Discovery In Sequence Data
4	The Clustering Of Time-Series Data Based On LB_Hust Distance Calculation
5	Research And Application Of Topic Clustering And Trend Analysis Based On Social Data
6	Research On The Index For Similar Search Based On Hamming Distance
7	The Research Of Similar Measurement Method Based On The Hausdorff Distance
8	Research On Series Data Similar Search Technology
9	Research On Multi-Time Scale Stock Forecasting Method Based On Clustering
10	Data Cleansing In The Detection Of Similar Records