
Quick Clustering Multiple Data Streams Based On Time Distance

Posted on: 2019-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: S D Yang
GTID: 2428330548468885
Subject: Computer software and theory
Abstract/Summary:
With the development of science and technology, all kinds of sensors now produce large numbers of time-series data streams. Cluster analysis of such data can be used effectively for market segmentation, identifying speeding vehicles, monitoring traffic jams, tracking vehicles, and so on. Because time-series data streams arrive in real time, however, it is a great challenge to limit space cost, reduce computation, and avoid repeated scans of the data while still extracting useful information.

In recent years, more and more researchers have turned their attention to time-series analysis, and clustering has become the most popular way to handle time-series data. It has been applied in many fields (such as financial stock trading, environmental monitoring, network monitoring, web search log analysis, and meteorological data) with good results. Many clustering algorithms exist, but most of them are designed for static datasets and cannot handle real-time time series. Some of them (such as CluStream and DFT) are based on Euclidean distance and cannot effectively capture similar trends between data streams, which makes them poorly suited to real-world data. In the stock market, for example, the trend of a stock is more useful than its price. In addition, these algorithms operate on the whole sequence of the data, which can result in high cost.

Therefore, this thesis proposes a simple, fast, and efficient method for clustering time series with similar trends. The new method is based on landmarks of the data stream, which greatly reduce the amount of computation. To find trend similarity more effectively, we also propose a time distance for clustering time series. Unlike other methods, the new method ignores the gap between data values and focuses only on the trend similarity of the time series. Finally, we evaluate both traditional clustering methods and our method on synthetic and real datasets. The experimental results show that our algorithm performs better, with higher efficiency and higher clustering quality, in finding trend similarity between time series.
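The abstract does not spell out how landmarks are chosen or how time distance is computed, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the thesis's method. It assumes landmarks are turning points (local peaks and valleys, plus the start of the first move), and it defines a hypothetical `time_distance` as the mean time gap between landmarks of the same kind in two streams, capped at a `penalty`. Clustering uses a simple leader (threshold) scheme as a stand-in. All names (`landmarks`, `time_distance`, `cluster_streams`, `penalty`, `threshold`) are illustrative.

```python
from typing import List, Sequence, Tuple

# Hypothetical landmark representation: (time index, kind),
# where kind is +1 for a local peak and -1 for a local valley.
Landmark = Tuple[int, int]


def landmarks(series: Sequence[float]) -> List[Landmark]:
    """Return the turning points of a series (an assumed landmark definition).

    Only the direction of each step matters, never its size, and flat steps
    are skipped, so a stream shifted or scaled by a positive factor keeps
    the same landmarks.
    """
    result: List[Landmark] = []
    prev_dir = 0  # direction of the last non-flat step: +1 rising, -1 falling
    for i in range(1, len(series)):
        diff = series[i] - series[i - 1]
        if diff == 0:
            continue  # flat step: no trend information
        cur_dir = 1 if diff > 0 else -1
        if prev_dir == 0:
            # Start of the first move: rising begins at a valley, falling at a peak.
            result.append((i - 1, -cur_dir))
        elif cur_dir != prev_dir:
            # Trend reversed at i - 1: rising then falling is a peak (+1),
            # falling then rising is a valley (-1).
            result.append((i - 1, prev_dir))
        prev_dir = cur_dir
    return result


def _nearest_gap(lm: Landmark, others: List[Landmark], penalty: float) -> float:
    """Time gap from lm to the closest same-kind landmark, capped at penalty."""
    t, kind = lm
    return min([abs(t - u) for u, k in others if k == kind] + [penalty])


def time_distance(a: Sequence[float], b: Sequence[float],
                  penalty: float = 10.0) -> float:
    """Hypothetical 'time distance': mean time gap between matching landmarks.

    Every landmark of either stream is matched to the nearest same-kind
    landmark of the other stream; a landmark with no partner (or one farther
    than `penalty` steps away) costs `penalty`. Values matter only through
    the direction of each step.
    """
    la, lb = landmarks(a), landmarks(b)
    if not la and not lb:
        return 0.0  # both streams are completely flat
    total = sum(_nearest_gap(x, lb, penalty) for x in la)
    total += sum(_nearest_gap(y, la, penalty) for y in lb)
    return total / (len(la) + len(lb))


def cluster_streams(streams: List[Sequence[float]], threshold: float,
                    penalty: float = 10.0) -> List[int]:
    """Leader clustering (a simple stand-in, not the thesis's algorithm).

    Each stream joins the first cluster whose leader is within `threshold`
    time distance; otherwise it founds a new cluster.
    """
    leaders: List[int] = []  # index of the stream that founded each cluster
    labels: List[int] = []
    for idx, s in enumerate(streams):
        for c, leader in enumerate(leaders):
            if time_distance(s, streams[leader], penalty) <= threshold:
                labels.append(c)
                break
        else:  # no existing leader is close enough
            leaders.append(idx)
            labels.append(len(leaders) - 1)
    return labels


if __name__ == "__main__":
    s1 = [1, 3, 5, 4, 2, 3, 6]                # up, down, up
    s2 = [100, 130, 150, 140, 120, 130, 160]  # same shape, far higher values
    s3 = [5, 3, 1, 2, 4, 3, 0]                # opposite trend, values close to s1
    print(landmarks(s1))                      # [(0, -1), (2, 1), (4, -1)]
    print(time_distance(s1, s2))              # 0.0
    print(time_distance(s1, s3))              # 2.0
    print(cluster_streams([s1, s2, s3], threshold=1.0))  # [0, 0, 1]
```

Here s2 has the same shape as s1 at much higher values and falls into the same cluster, while s3, whose values are closer to s1's but whose trend runs the opposite way, is separated. This is the contrast with Euclidean-distance methods that the abstract describes. A real streaming version would update landmarks incrementally as points arrive instead of rescanning whole series; the `penalty` and `threshold` values are illustrative only.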
Keywords/Search Tags: Multi-data-stream clustering, Time distance, Landmark, Similar trend