Research And Implementation Of Multi-plex Iteration Based On MapReduce

Posted on: 2015-10-16
Degree: Master
Type: Thesis
Country: China
Candidate: C Li
Full Text: PDF
GTID: 2308330482955602
Subject: Computer technology
Abstract/Summary:
Iteration is the act of repeating a process, taking the result of each iteration as the input of the next, with the aim of approaching a desired goal, target, or result. When the result converges or satisfies another termination condition, the iteration is considered finished and the result of the last iteration is taken as the final result. According to iteration theory, an iteration can easily converge to a locally optimal result when the iteration function does not satisfy the Lipschitz condition. To address this problem, it is common to run the iteration from multiple groups of initial input data and to take the best-quality result among all runs as the final result. Because a traditional iteration handles the groups of input data serially, this consumes a large amount of elapsed time. In addition, there is no efficient data sharing between the different iteration processes. It is therefore important to improve the parallel processing capability and the data sharing of iterative computing when there are multiple groups of initial input data.

The MapReduce computing framework has strong parallel processing and data sharing abilities, and it copes easily with large-scale data sets. In this thesis, to improve the execution efficiency of iterative computing, we propose multi-plex iteration based on MapReduce for iterative computing over large-scale data sets when there are multiple groups of input data. The main contributions are as follows:

(Ⅰ) For the case of multiple groups of input data, we propose the multi-plex iterative algorithm, motivated by the shortcomings of the traditional iterative algorithm. By modifying the execution process of a single iteration, the multi-plex iterative algorithm reduces the running time when there are multiple groups of input data. It also enhances data sharing between the different iteration processes and reduces how often the data set is read. Using the MapReduce API, we implement the multi-plex iterative algorithm on MapReduce for the case of multiple groups of input data.

(Ⅱ) We combine the K-means algorithm with the multi-plex iterative algorithm and propose the Mux-Kmeans algorithm. After analyzing the execution process of K-means, we use the multi-plex iterative algorithm to improve its computing performance. On the Amazon EC2 platform, we run experiments with three real data sets to verify the performance of Mux-Kmeans and measure its elapsed time. Comparing the results of Mux-Kmeans and K-means on the same multiple groups of input data, we find that Mux-Kmeans improves iteration performance and takes less elapsed time than K-means.

(Ⅲ) We combine the EM algorithm with the multi-plex iterative algorithm and propose the Mux-EM algorithm. After analyzing the execution process of EM, we use the multi-plex iterative algorithm to improve its computing performance. On a local virtual cloud computing platform, we run experiments with two real data sets to verify the performance of Mux-EM and measure its elapsed time. Comparing the results of Mux-EM and EM on the same multiple groups of input data, we find that Mux-EM improves iteration performance and takes less elapsed time than EM.

The experiments show that multi-plex iteration can improve the efficiency and correctness of iterative computing when there are multiple groups of input data.
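The idea of serving several groups of initial input data from one pass over the data set can be illustrated with a small sketch. The Python code below is not the thesis's Mux-Kmeans implementation and does not use the Hadoop API. It is a single-process simulation, under the assumption that the map step tags each emitted pair with a (group, cluster) key so that one read of a point updates every group of centroids. The names `mux_map`, `mux_reduce`, and `mux_kmeans` are hypothetical, and for simplicity every group keeps iterating until all groups have converged.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mux_map(point, centroid_groups):
    """Map step (assumed design): one read of `point` serves every group."""
    for g, centroids in enumerate(centroid_groups):
        nearest = min(range(len(centroids)),
                      key=lambda c: sq_dist(point, centroids[c]))
        # The key carries the group index, so groups never mix in the reducer.
        yield (g, nearest), (point, 1)


def mux_reduce(pairs):
    """Reduce step: sum coordinates and counts per (group, cluster) key."""
    sums = {}
    for key, (point, count) in pairs:
        if key in sums:
            acc, n = sums[key]
            sums[key] = ([a + b for a, b in zip(acc, point)], n + count)
        else:
            sums[key] = (list(point), count)
    return sums


def mux_kmeans(points, centroid_groups, max_iters=100, tol=1e-9):
    """Run K-means for all groups of initial centroids in shared passes.

    Returns the centroids and cost of the group with the lowest cost.
    """
    groups = [[tuple(c) for c in grp] for grp in centroid_groups]
    for _ in range(max_iters):
        # One pass over the data set per iteration, shared by all groups.
        pairs = [kv for p in points for kv in mux_map(p, groups)]
        sums = mux_reduce(pairs)
        new_groups = []
        for g, centroids in enumerate(groups):
            updated = []
            for c, old in enumerate(centroids):
                if (g, c) in sums:
                    acc, n = sums[(g, c)]
                    updated.append(tuple(a / n for a in acc))
                else:
                    updated.append(old)  # an empty cluster keeps its centroid
            new_groups.append(updated)
        shift = max(sq_dist(o, n)
                    for og, ng in zip(groups, new_groups)
                    for o, n in zip(og, ng))
        groups = new_groups
        if shift <= tol:
            break
    # Quality measure (assumed): sum of squared distances to nearest centroid.
    costs = [sum(min(sq_dist(p, c) for c in grp) for p in points)
             for grp in groups]
    best = min(range(len(groups)), key=costs.__getitem__)
    return groups[best], costs[best]
```

A serial baseline would call an ordinary K-means once per group and read the data set once per group per iteration. In this sketch the data set is read once per iteration regardless of the number of groups, which is the kind of saving the abstract attributes to multi-plex iteration.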
Keywords/Search Tags: multi-plex iteration algorithm, iteration, MapReduce, Kmeans, EM, large-scale data set