
Research On Incremental Computing Technologies And Algorithms Based On MapReduce

Posted on: 2017-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: P R Zhang
GTID: 2348330485953716
Subject: Computer software and theory
Abstract/Summary:
A common property of today's big data processing is that the same computation is repeated on datasets that evolve over time, such as web and social network data. In these applications, typically only a very small fraction of the whole dataset changes between runs, so incremental computing is an effective way to avoid redundant work. MapReduce, a widely used tool for distributed computing, performs batch computation efficiently but does not support incremental computing directly. This dissertation studies incremental computing technologies and algorithms based on MapReduce. The main research contents and contributions are as follows.

1. An incremental computing framework based on MapReduce

A concise and efficient incremental computing framework based on MapReduce is designed and implemented to address the general problem. The framework detects repeated (unchanged) portions of the input by their hash signatures. In the Map phase, it processes only the new or changed data. In the Merge phase, it removes the output produced by deleted portions of the input and combines the output from the new data with the previously computed output; the final Reduce computation then produces the results. The framework is built on top of the Hadoop platform without any modification to the underlying layers. Experimental results show that for some applications the speedup over full recomputation reaches 1.8, even when the changed data accounts for 25% of the whole dataset.

2. Design of incremental matrix multiplication based on MapReduce

Matrix multiplication is widely applied in domains such as machine learning, recommender systems, and social networks, and in these applications matrix elements often change gradually. To address the low efficiency of matrix multiplication under the incremental computing framework, the dissertation applies fine-grained, hash-based checking at the level of individual elements to improve the identification of changed data, and proposes an incremental matrix multiplication algorithm that requires no special design of or modification to the MapReduce framework. The algorithm captures the changed elements of the matrix and performs fine-grained incremental analysis and processing based on them. Experimental results indicate that when the ratio of changed matrix elements is small, the method achieves significant improvements over full recomputation and shows promising application prospects.
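The abstract does not give the algorithm itself, so the following is only a minimal single-process sketch of the general idea, not the dissertation's MapReduce implementation. It assumes that only the left matrix A changes while B stays fixed, that element signatures are SHA-256 hashes of each value, and that the previous product C = A·B is available. All function names (`element_signature`, `changed_elements`, `incremental_multiply`) are hypothetical. For each changed element A[i][k], row i of C is corrected by (new value − old value) · B[k][j] instead of recomputing the whole product.

```python
import hashlib


def element_signature(value):
    """Hash signature of one matrix element (assumed scheme: SHA-256 of repr)."""
    return hashlib.sha256(repr(value).encode("utf-8")).hexdigest()


def signatures(matrix):
    """Map each position (i, k) to the signature of the element stored there."""
    return {(i, k): element_signature(v)
            for i, row in enumerate(matrix)
            for k, v in enumerate(row)}


def changed_elements(old_sigs, new_matrix):
    """Positions whose current signature differs from the stored signature."""
    return [(i, k)
            for i, row in enumerate(new_matrix)
            for k, v in enumerate(row)
            if old_sigs.get((i, k)) != element_signature(v)]


def multiply(a, b):
    """Full (batch) product, used as the baseline and for the first run."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def incremental_multiply(c_old, a_old, a_new, b, changed):
    """Update C = A*B in place of a full recomputation.

    Assumes B is unchanged and `changed` lists every position where
    a_new differs from a_old.
    """
    c_new = [row[:] for row in c_old]  # copy so the old result is preserved
    for i, k in changed:
        delta = a_new[i][k] - a_old[i][k]
        for j in range(len(b[0])):
            c_new[i][j] += delta * b[k][j]
    return c_new


# Example: one element of A changes between two runs.
a_old = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c_old = multiply(a_old, b)            # first run: full computation
old_sigs = signatures(a_old)          # signatures kept for the next run

a_new = [[1, 2], [3, 9]]
changed = changed_elements(old_sigs, a_new)
c_new = incremental_multiply(c_old, a_old, a_new, b, changed)
print(changed)  # [(1, 1)]
print(c_new)    # [[19, 22], [78, 90]]
```

In this sketch the update cost is proportional to the number of changed elements times the number of columns of B, which is consistent with the abstract's observation that the benefit is largest when the ratio of changed elements is small.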
Keywords/Search Tags:MapReduce, incremental computing, matrix multiplication, Hadoop