Lag Correlation Mining Method For Big Data Stream

Posted on: 2016-06-19
Degree: Master
Type: Thesis
Country: China
Candidate: H Z Qian
Full Text: PDF
GTID: 2308330470468720
Subject: Computer Science and Technology
Abstract/Summary:
Big data stream mining is an important branch of data mining and has become a research hot spot, especially with the rise of grid computing and cloud computing. With the emergence of big data, the demand for big data stream processing has grown increasingly urgent. This paper introduces the concepts and methods of lag correlation mining in big data streams, reviews traditional data stream mining methods and their problems, and presents research on lag correlation mining in data streams in the following two aspects.

First, the paper proposes a lag correlation mining method for big data streams based on a series-layered sliding window. The method first stratifies the sequence into layers according to an increasing series, calculates the coverage g of the sliding window on each layer, and then computes the sliding window parameter values of the sequence on each layer. From the parameter values of each layer's sliding window, it calculates the lag correlation coefficient of the sequences in order to determine their lag correlation. The experimental results show that the improved method is effective.

Second, the paper puts forward a lag correlation mining method for big data streams based on Boolean subtract and series layering. First, using the mean values x and y of the two original data stream sequences, a flag variable marks the elements of the original sequences, and the big data stream sequences undergo a macro Boolean transformation that yields the converted sequences. Elements are then eliminated according to a threshold, and the residual elements are reduced using the flag variable. The micro Boolean transformation is determined by the sampling period T and the Boolean sequence values; it likewise eliminates sequence elements and reduces the residual elements. Once the reduced micro Boolean sequences are obtained, they are stratified according to an increasing series. Using the sliding window width g and the number c of sliding windows taken each time on each layer, the parameter values of each layer's sequences are calculated; the lag correlation coefficients are then calculated from these parameter values in order to determine the lag correlation of the sequences. Experimental results show that the approach greatly reduces the computational cost and improves efficiency while preserving precision.

The growth of a data stream may be unbounded, and different stages pose different challenges. In future work, we will focus on the applicability and stability of the algorithm and further study the error that data reduction introduces into the algorithm.
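
For readers unfamiliar with the underlying quantity, the following is a minimal Python sketch of a lag correlation coefficient, assuming it is the Pearson correlation between x[t] and y[t + lag] over the overlapping part of two equal-length sequences. It is a brute-force baseline for illustration only: it does not implement the layered sliding window or the macro/micro Boolean reduction described above. The function names (pearson, lag_correlation, best_lag, to_boolean) are hypothetical, and to_boolean shows just one plausible reading of marking elements against the sequence mean.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def lag_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag] over their overlapping part."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    if lag < 0 or lag > len(x) - 2:
        raise ValueError("lag must leave at least two overlapping points")
    return pearson(x[:len(x) - lag], y[lag:])


def best_lag(x, y, max_lag):
    """Return (lag, coefficient) with the largest absolute coefficient in 0..max_lag."""
    scores = [(lag, lag_correlation(x, y, lag)) for lag in range(max_lag + 1)]
    return max(scores, key=lambda s: abs(s[1]))


def to_boolean(seq):
    """Assumed simplification: mark an element 1 if it exceeds the sequence mean, else 0."""
    mean = sum(seq) / len(seq)
    return [1 if v > mean else 0 for v in seq]
```

Scanning every lag this way costs on the order of n operations per lag, which is the kind of cost the layered windows and Boolean reduction in the abstract aim to cut for long streams.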
Keywords/Search Tags: Boolean subtract, Big data streams, Sliding window, Lag correlation, Nyquist sampling theorem