Research On Data Mining And Forecasting Methods Over Time Series Data With Complex Structure

Posted on: 2012-03-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: A L Qian
Full Text: PDF
GTID: 1118330368984036
Subject: Computer software and theory
Abstract/Summary:
In recent years, time series data mining has made great progress. As network technology, sensors, and other data-sensing technologies continue to develop, the structure of time series data is becoming more complex and the volume of data keeps growing; at the same time, there is a growing need to discover more useful information and knowledge from these more complex data. Steadily increasing computing power also makes it possible to study data with such complex structures and characteristics and to discover the information and knowledge they contain. Conventional time series data mining techniques, however, mostly target time series with a simple structure, not complex ones such as time series streams, uncertain time series, and multiple time series. Mining time series data with complex structure has therefore become a new hot topic, and the complexity of the structure poses new challenges for data mining techniques.

Time series data streams are among the most typical data in sensor networks. Stream data is generated continuously; a large amount of data arrives in quick succession within a short time, and the total amount may be unbounded. The system cannot store all of the data, which changes dynamically over time. For time series data streams in wireless sensor networks, working mainly under the assumption that battery power must be saved, we propose PECTMA, a Top-k anomaly detection method for wireless sensor networks. In particular, we propose four algorithms: the sustained reading return detection algorithm CRVMR, the Top-k sorting algorithm Top-k-sort, the spatial redundancy removal algorithm ESR, and the Top-k anomaly collection algorithm BRCR.
The overall idea is to reduce the amount of data that sensor nodes need to transmit, in order to save battery energy. Through extensive experimental comparison with TAG and TA, two well-known anomaly detection methods used in wireless sensor networks, we show the effectiveness and efficiency of PECTMA.

Data uncertainty is prevalent in many practical applications, for example when sensors acquire data under the precision constraints of instruments and equipment, when data is converted between different granularities, and when privacy must be protected. Time series data is often high-dimensional, and its uncertainty is probabilistic; as a result, traditional techniques for data management, representation, storage, indexing, querying, and mining cannot be applied directly to similarity search over uncertain time series. Dimensionality reduction, together with indexing and pruning, can be applied to uncertain time series. Drawing on other theories and techniques, and targeting the structural complexity of uncertain time series, we give, for the first time, a formal definition of probabilistic nearest neighbor search over an uncertain time series database, and apply PLA dimensionality reduction to uncertain time series. After conversion to the PLA space, we propose three theorems to improve search efficiency. Based on these three theorems, we define the probabilistic K-nearest neighbor query over uncertain time series, PKNNU, and give the corresponding search algorithm, PKNNS.
A series of experiments tests the effectiveness and efficiency of the PKNNS algorithm.

A forum network is a typical virtual social network. Social network features such as network size, small community structures, the strength of community relations, influential nodes, and community stability are important statistical variables. They are often the outward manifestation of the continuing dynamic evolution of the forum's social network, reflect the evolution trend of public opinion in the forum, and form multivariate time series data for a forum. Combining community structure analysis with the analysis of association rules between the trends of multiple time series, we propose FSTP, a method for predicting public opinion trends in forums. For the first time, we give a definition of association rules between the trends of multiple time series and establish a time series analysis and prediction model for forum public opinion. FSTP brings together community structure analysis, time series prediction, and the mining of association rules between time series trends, and we give the corresponding FSTPM algorithm. Using both real and synthetic data sets, we test the confidence of the association rules experimentally and compare against the well-known algorithms Betweenness, External Optimization, and Greedy, to verify the effectiveness and efficiency of FSTPM.
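
The notion of an association rule between the trends of two time series, and of its confidence, can be illustrated with a minimal Python sketch. This is not the FSTPM algorithm, whose definitions are not given in this abstract. The trend labels ("up", "down", "flat"), the threshold `eps`, the `lag` parameter, and the function names `trend_labels` and `rule_stats` are all assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def trend_labels(series: Sequence[float], eps: float = 0.0) -> List[str]:
    """Label each step of a series as 'up', 'down', or 'flat'.

    The three-way labeling and the eps threshold are illustrative
    assumptions, not definitions taken from the dissertation.
    """
    labels = []
    for prev, curr in zip(series, series[1:]):
        diff = curr - prev
        if diff > eps:
            labels.append("up")
        elif diff < -eps:
            labels.append("down")
        else:
            labels.append("flat")
    return labels


def rule_stats(
    a: Sequence[float],
    b: Sequence[float],
    antecedent: str,
    consequent: str,
    lag: int = 1,
) -> Tuple[float, float]:
    """Support and confidence of the hypothetical rule:
    series `a` shows `antecedent` at step t  =>  series `b` shows
    `consequent` at step t + lag.
    """
    ta = trend_labels(a)
    tb = trend_labels(b)
    n = min(len(ta), len(tb)) - lag  # number of usable (t, t + lag) pairs
    if n <= 0:
        return 0.0, 0.0
    antecedent_count = 0
    both_count = 0
    for t in range(n):
        if ta[t] == antecedent:
            antecedent_count += 1
            if tb[t + lag] == consequent:
                both_count += 1
    support = both_count / n
    confidence = both_count / antecedent_count if antecedent_count else 0.0
    return support, confidence


# Made-up example data: two forum statistics observed over six periods.
posts = [1, 2, 3, 2, 3, 4]
replies = [5, 5, 6, 7, 6, 7]
support, confidence = rule_stats(posts, replies, "up", "up", lag=1)
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

Here confidence is the usual association rule measure: among the steps where the first series shows the antecedent trend, the fraction where the second series shows the consequent trend `lag` steps later. A rule with high confidence could then serve as one input to a trend prediction, which is the role the abstract describes for such rules.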
Keywords/Search Tags: time series, data stream, uncertain data, multi-time series, data mining, anomaly detection, similarity search, trend forecast