Research on Incremental Processing Based on MapReduce

Posted on: 2015-03-25  Degree: Master  Type: Thesis
Country: China  Candidate: Q Wang  Full Text: PDF
GTID: 2308330482956300  Subject: Computer application technology
Abstract/Summary:
As the means of data access diversify, the datasets used in data mining and machine learning grow ever larger. Over time, new data is continually added to these large datasets, and existing records may be modified or deleted. These incremental changes make the results of the previous mining run out of date. When the dataset changes incrementally, the whole dataset has to be mined again to obtain up-to-date results, which wastes a large amount of computing resources. Incremental processing is an effective way to improve efficiency: it handles only the changed part of the dataset, computing over the incremental data.

MapReduce is a popular framework for parallel processing of large data and is widely used as a big data tool because of its usability. However, the MapReduce framework does not support incremental processing, so the entire dataset must be reprocessed to obtain up-to-date results whenever incremental data arrives.

In this thesis, we propose an incremental processing technique that extends the MapReduce framework, named the incr-MapReduce model. The main contributions of the thesis are as follows:

(1) We propose an incremental computation method at the key-value level and design a new file model, MRBGraph, which preserves fine-grained computation state. Incr-MapReduce matches records in the MRBGraph file against the incremental data, then performs the incremental computation and updates the MRBGraph file.

(2) We propose the incr-MapReduce computation model for incremental processing. It supports incremental computation not only for batch algorithms but also for iterative algorithms. When incr-MapReduce performs incremental computation for an iterative algorithm, it starts iterating from the previously converged result and applies a change control technique that effectively limits the scope of records involved in the next incremental iteration.

(3) During incremental computation, MRBStore must frequently read data from the MRBGraph file and update it. This thesis uses index and buffer optimization techniques to reduce the I/O time of operating on the MRBGraph file.

The thesis runs the PageRank, GIMV, KMeans, and Apriori algorithms on four frameworks (incr-MapReduce, MapReduce, HaLoop, and iMapReduce) with real datasets to show the performance of incr-MapReduce, and uses the experiments to verify the optimization strategies of incr-MapReduce.
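The key-value level incremental computation described in contribution (1) can be illustrated with a minimal single-process sketch. This is not the thesis's implementation: the class name `PreservedState`, the word-count `map_fn` and `reduce_fn`, and the in-memory dictionaries are all assumptions chosen for illustration, standing in loosely for the MRBGraph file and MRBStore. The sketch only shows the general idea of keeping the map outputs per reduce key, matching them against the changed records, and re-running reduce for the affected keys only.

```python
from collections import defaultdict


def map_fn(record):
    """Hypothetical map function: emit (word, 1) for each word in the record."""
    rec_id, text = record
    return [(word, 1) for word in text.split()]


def reduce_fn(key, values):
    """Hypothetical reduce function: sum the counts for one key."""
    return sum(values)


class PreservedState:
    """In-memory stand-in (an assumption) for the preserved key-value state.

    edges[reduce_key][record_id] holds the map outputs that the record
    contributed to that reduce key, so a changed record can be matched
    against the state without reprocessing the whole dataset.
    """

    def __init__(self):
        self.edges = defaultdict(dict)   # reduce key -> {record id: [values]}
        self.keys_of = defaultdict(set)  # record id -> reduce keys it touches
        self.results = {}                # reduce key -> current reduce output

    def _insert(self, record):
        rec_id = record[0]
        for key, value in map_fn(record):
            self.edges[key].setdefault(rec_id, []).append(value)
            self.keys_of[rec_id].add(key)
        return set(self.keys_of[rec_id])

    def _delete(self, rec_id):
        touched = self.keys_of.pop(rec_id, set())
        for key in touched:
            self.edges[key].pop(rec_id, None)
        return touched

    def apply_delta(self, inserted=(), deleted_ids=()):
        """Apply changed records and re-reduce only the affected keys."""
        touched = set()
        for rec_id in deleted_ids:
            touched |= self._delete(rec_id)
        for record in inserted:
            # Inserting an existing id is treated as a modification.
            touched |= self._delete(record[0])
            touched |= self._insert(record)
        for key in touched:
            values = [v for vals in self.edges[key].values() for v in vals]
            if values:
                self.results[key] = reduce_fn(key, values)
            else:
                self.results.pop(key, None)
                del self.edges[key]
        return touched


if __name__ == "__main__":
    state = PreservedState()
    state.apply_delta(inserted=[(1, "a b a"), (2, "b c")])  # initial run
    changed = state.apply_delta(inserted=[(2, "c c")])      # modify record 2
    print(sorted(changed), state.results)
```

In this sketch the initial run is simply a delta applied to an empty state, and a later change re-runs reduce only for the keys returned by `apply_delta`. A real system would keep this state on disk, which is where the index and buffer optimizations mentioned in contribution (3) would come in.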
Keywords/Search Tags:MapReduce, MRBGraph, Iterative algorithm, Incremental processing, Incremental iterative computation