Research On The Trends Of Multiple Data Streams Based On Identical-Different-Contrary Analysis

Posted on: 2012-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: W P Li
Full Text: PDF
GTID: 2218330368982948
Subject: Computer application technology
Abstract/Summary:
With the rapid development of communication and sensor technology, large volumes of stream data are generated in many application fields, such as real-time monitoring and On-Line Analytical Processing (OLAP). Mining data streams has recently become one of the hot topics in the database community worldwide. The distinguishing features of data streams (high speed, continuity and large volume) and the requirements placed on stream mining algorithms (single-pass scanning and real-time response) make the mining of data streams highly challenging. Trend analysis is a significant part of data stream mining. It aims to reveal the patterns or trends that emerge as the streams evolve over time. Effective trend analysis can be applied in many fields, such as state assessment, early warning and decision support.

Existing research on the trend analysis of data streams is almost entirely concerned with a single data stream, with tasks that include trend description, extraction and forecasting. Research on the trend analysis of multiple data streams, by contrast, is hard to find so far. Therefore, this thesis introduces the theory of Set Pair Analysis (SPA) to analyze the identical-different-contrary trends of multiple data streams, which in essence means comparing their trends.

The identical-different-contrary trend analysis of multiple data streams in this thesis consists of three parts. First, formal descriptions of peak data and related concepts are given. On this basis, the relationship between the trends of the peak data and those of the original data is studied, and the peak data is taken as the basis of the subsequent research. Second, three concepts about data streams are given, which come from SPA and are called identity, difference and contrast respectively.
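As a minimal sketch of the ideas above, the snippet below reduces a stream to peak data and counts identity, difference and contrast for a pair of streams. The counts feed the standard SPA connection degree a + b·i + c·j, with a = S/N, b = F/N and c = P/N. Everything else is an assumption made for illustration, since the thesis's formal definitions are not reproduced in this abstract: peak data is taken to mean the endpoints plus local extrema, and a step counts as identical when both streams move in the same direction, contrary when they move in opposite directions, and different otherwise. The function names and the tolerance `eps` are hypothetical.

```python
from typing import List, Tuple


def extract_peaks(values: List[float]) -> List[Tuple[int, float]]:
    """Keep the endpoints and every strict local maximum or minimum.

    Assumed meaning of "peak data"; flat plateaus are not treated as peaks.
    """
    if len(values) < 3:
        return list(enumerate(values))
    peaks = [(0, values[0])]
    for k in range(1, len(values) - 1):
        left = values[k] - values[k - 1]
        right = values[k + 1] - values[k]
        if left * right < 0:  # the direction of change flips at position k
            peaks.append((k, values[k]))
    peaks.append((len(values) - 1, values[-1]))
    return peaks


def direction(prev: float, curr: float, eps: float = 1e-9) -> int:
    """Return +1 for a rise, -1 for a fall, 0 for no change (within eps)."""
    delta = curr - prev
    if delta > eps:
        return 1
    if delta < -eps:
        return -1
    return 0


def idc_counts(x: List[float], y: List[float]) -> Tuple[int, int, int]:
    """Count identical, different and contrary steps of two equal-length windows.

    Assumed rule (not taken from the thesis):
      identical: both streams move in the same non-zero direction
      contrary:  the streams move in opposite non-zero directions
      different: every other case (at least one stream is flat)
    """
    if len(x) != len(y):
        raise ValueError("windows must have the same length")
    same = diff = contrary = 0
    for k in range(1, len(x)):
        dx = direction(x[k - 1], x[k])
        dy = direction(y[k - 1], y[k])
        if dx != 0 and dx == dy:
            same += 1
        elif dx != 0 and dy != 0:
            contrary += 1
        else:
            diff += 1
    return same, diff, contrary


def connection_degree(same: int, diff: int, contrary: int) -> Tuple[float, float, float]:
    """Return the coefficients (a, b, c) of the SPA connection degree a + b*i + c*j."""
    n = same + diff + contrary
    if n == 0:
        raise ValueError("at least one comparison is required")
    return same / n, diff / n, contrary / n
```

Because the three counts are plain sums, they can be accumulated one step at a time as new values arrive, without rescanning earlier data.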
The data streams are then classified into different types of set pair trends. Here, the concept of set pair trends represents the relationship between two data streams, which is the key to the identical-different-contrary trend analysis of multiple data streams. Finally, a set of streams that share the same set pair trends is defined as a set pair trends cluster and is used to represent the relationship among multiple data streams.

To accomplish the tasks described above and meet the requirement of fast data stream processing, we propose four algorithms. (1) The first algorithm quickly extracts the peak data from the raw data and can run during data acquisition or preprocessing. (2) The second is an incremental algorithm that computes the set pair trends between two data streams by determining their identical-different-contrary relations within the same basic window. (3) The third algorithm computes a special set pair trends cluster: the one whose type of set pair trends is called strong set pair trends and that contains the maximum number of data streams. Its basic idea is to transform the computation of this cluster into the computation of the maximal complete subgraph. It is realized by improving an existing algorithm, Finding Maximal Complete-Subgraph (FMCSG), which was used to compute the maximal complete subgraph. The improved FMCSG algorithm is not only suitable for graphs with numerous nodes but also more efficient. (4) The last algorithm computes a special set pair trends cluster whose data streams appear together frequently across a number of windows. Its basic idea is to transform the computation of this frequent cluster into the computation of the common subgraph of a set of graphs that share the same fixed node set.
The main advantage of the fourth algorithm is that it obtains the results quickly in a single-pass scan by defining a special operator on the set of graphs.

All in all, the simulation results support the following conclusions. First, it is appropriate to describe the trends of multiple data streams with the SPA-based concepts of set pair trends and set pair trends cluster. Second, all algorithms proposed in this thesis are not only effective for the identical-different-contrary trend analysis of multiple data streams but also capable of meeting the requirement of fast data stream processing. Last but not least, the trends of the peak data realistically represent the trends of the original data stream, and adopting the peak data significantly reduces the data volume, which in turn speeds up the processing of the data streams.
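The graph view behind the third and fourth algorithms can be sketched as follows, assuming each data stream is a node and an edge joins two streams whose set pair trends are strong in a given window. This is not the thesis's improved FMCSG: the sketch swaps in the textbook Bron-Kerbosch search with pivoting, and it stands in for the unspecified graph operator with simple per-edge counting across windows. The names `edge`, `build_adjacency`, `largest_clique`, `frequent_edges` and the threshold `min_support` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def edge(u: str, v: str) -> Edge:
    """Normalize an undirected edge so (u, v) and (v, u) compare equal."""
    return (u, v) if u <= v else (v, u)


def build_adjacency(nodes: Iterable[str], edges: Iterable[Edge]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map over a fixed node set."""
    adj: Dict[str, Set[str]] = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def largest_clique(adj: Dict[str, Set[str]]) -> Set[str]:
    """Return one maximum complete subgraph using Bron-Kerbosch with pivoting."""
    best: Set[str] = set()

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        nonlocal best
        if not p and not x:
            # r cannot be extended, so it is a maximal clique
            if len(r) > len(best):
                best = set(r)
            return
        # Choose the pivot with the most neighbours among the candidates
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


def frequent_edges(windows: Iterable[Iterable[Edge]], min_support: int) -> Set[Edge]:
    """Scan the windows once and keep edges present in at least min_support of them."""
    counts: Counter = Counter()
    for window_edges in windows:
        counts.update(set(window_edges))  # count each edge once per window
    return {e for e, c in counts.items() if c >= min_support}
```

Under these assumptions, the largest strong cluster of one window is `largest_clique(build_adjacency(nodes, window_edges))`, and a frequent cluster is obtained by running the same search on the graph built from `frequent_edges(...)`. Setting `min_support` to the number of windows reduces the counting step to a plain intersection of the edge sets.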
Keywords/Search Tags: Multiple Data Streams, Peak Data, Identical-Different-Contrary Analysis, SPA, Incremental Algorithm