
Research Of Dimensionality Reduction And Similarity Matching For Uncertain Time Series

Posted on: 2015-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: R Xiao
Full Text: PDF
GTID: 2268330425482212
Subject: Computer software and theory
Abstract/Summary:
A time series is a sequence of real numbers in chronological order that reflects how an entity's attributes change over time. Dimensionality reduction and similarity matching of time series are an important research area in data mining, widely used in location-based services (LBS), environmental monitoring, networking, and other fields. However, as information technology develops and real-world application demands expand, a special kind of data has gradually emerged: uncertain data, which arises in wireless sensor networks, radio frequency identification (RFID) networks, moving-object tracking, weather radar networks, and privacy applications. An uncertain time series is a time series that holds many observations, with the value at each sampling point being uncertain.

Previous research has focused on reduction, similarity matching, and data indexing. No work has addressed the basic properties of time series similarity. To fill this gap, we state the basic commutative, transitive, and distributive properties of time series similarity. We then introduce the concept of the optimal similarity set of series, which is the ultimate goal of research on time series similarity and aggregation. Finally, we prove that finding the optimal similarity set of series is an NP-complete problem. This provides a theoretical basis for seeking heuristic algorithms for time series clustering.

Because uncertain time series are long and the value at each sampling point is uncertain, dimensionality reduction is the primary task for fast matching of uncertain series. Currently, the wavelet transform is commonly used to reduce the dimensionality of uncertain time series, but it does not consider the correlation between sampling points. We propose a new method based on statistics and data correlation. It splits an uncertain time series into a probability dimension and a time dimension and performs dimensionality reduction in each separately. In the time dimension, one sampling point represents the subsequent sampling points that are highly correlated with it; in the probability dimension, a high-probability point represents the adjacent low-probability points. Experimental results show that the method achieves a remarkable compression ratio on uncertain time series. In addition, the uncertain time series can be approximately recovered from the reduced result.

Similarly, because of the curse of dimensionality and the large number of possible worlds, matching two uncertain time series has very high time complexity. To solve this problem, we propose new methods based on arithmetic coding and on the trend of time series. The similarity matching algorithm based on bucket division and arithmetic coding reduces uncertain time series to certain time series and matches them by distance. It can also approximately recover the uncertain time series from the reduced result. Experiments show that the algorithm has low time cost and high matching efficiency. In addition, it can match two new types. The similarity measure based on the trend of time series maps a time series to a short series of trend symbols, then introduces a connectivity index and the Mitani coefficient to measure the similarity of time series. Experiments show that it measures similarity effectively at low time cost. The clustering method based on the trend of time series defines the height of a trend symbol and iteratively judges the trend of the trend symbols until the series is reduced to a single trend symbol; time series with the same trend symbol are then clustered into one class. Experiments show that it clusters effectively at low time cost. In addition, the connectivity index of five trend symbols can uniquely represent a time series.
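
The time-dimension reduction described in the abstract (one sampling point standing in for the highly correlated points that follow it) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm, whose details the abstract does not give. It assumes each sampling point holds the same number of aligned observations, uses the Pearson correlation as the correlation measure, and uses a hypothetical `threshold` parameter; the function names `reduce_time_dimension` and `recover` are illustrative only. The probability-dimension step is not shown.

```python
import numpy as np


def reduce_time_dimension(samples, threshold=0.9):
    """Group consecutive sampling points that correlate with a representative.

    samples: array of shape (T, S), T sampling points with S aligned
             observations each (an assumption of this sketch).
    Returns a list of (representative_index, run_length) pairs.
    """
    samples = np.asarray(samples, dtype=float)
    runs = []
    rep, length = 0, 1
    for t in range(1, samples.shape[0]):
        # Pearson correlation between the representative and the current point
        r = np.corrcoef(samples[rep], samples[t])[0, 1]
        if r >= threshold:
            length += 1                # current point is covered by the representative
        else:
            runs.append((rep, length))
            rep, length = t, 1         # start a new run with a new representative
    runs.append((rep, length))
    return runs


def recover(samples, runs):
    """Approximate recovery: repeat each representative over its run."""
    samples = np.asarray(samples, dtype=float)
    return np.vstack([np.tile(samples[rep], (n, 1)) for rep, n in runs])


# Hypothetical data: 4 sampling points, 4 observations each
demo = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.1],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.0, 4.0, 2.1],
])
demo_runs = reduce_time_dimension(demo)
demo_recovered = recover(demo, demo_runs)
```

Storing only the representatives and run lengths is what yields compression in this sketch; the recovery is approximate because every covered point is replaced by its representative's observations.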
Keywords/Search Tags: time series, uncertainty, dimensionality reduction, matching, cluster