
The Research And Implementation Of The Expectation-Maximization Algorithm Based On MapReduce

Posted on: 2013-08-09  Degree: Master  Type: Thesis
Country: China  Candidate: Z Y Jiang  Full Text: PDF
GTID: 2248330392956664  Subject: Software engineering
Abstract/Summary:
With the rapid growth of data volumes, traditional methods run into performance bottlenecks when computing over large-scale data sets. With the arrival of the "cloud computing" era, MapReduce, a simple parallel computing model, has come into view: it separates business logic from the complex details of the underlying implementation and provides a simple yet powerful interface through which large-scale computations can be executed concurrently and in a distributed fashion. The MapReduce model is therefore an effective way to overcome the performance bottleneck of large-scale data-set computation. The expectation-maximization (EM) algorithm, one of the most important algorithms in machine learning, plays an increasingly significant role in industry, commerce, and scientific research, so porting it to a cloud platform to break through its performance bottleneck is highly worthwhile.
This thesis first analyzes Hadoop and the MapReduce model in depth and makes some improvements on the basis of existing algorithms. The second part introduces the principle of the EM algorithm and analyzes in detail why the algorithm can be ported to a cloud platform; it also introduces the three basic problems of the hidden Markov model and their solutions. The third part presents a MapReduce implementation of the EM algorithm and implements the training problem of the hidden Markov model, whose solution is a special case of the EM algorithm. The fifth part tests the performance of the ported algorithm; results from runs on a deployed cluster show that, after being ported to the MapReduce computation framework, the algorithm achieves a considerable improvement in both the volume of data it can handle and its processing efficiency.
Starting from the requirements of massive-data processing, this thesis combines the EM algorithm with the MapReduce model and migrates it to a "cloud computing" platform. It takes future business development needs fully into account and realizes a theoretical model for ultra-large-scale data operations. The design combines the ideas of "big data" and "distributed processing", meets requirements for high reliability and high accuracy, and provides a complete programming interface with good encapsulation.
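The abstract does not reproduce any of the thesis code. The following is only a minimal, self-contained Java sketch (class and method names are hypothetical, not taken from the thesis) illustrating the decomposition the abstract describes: the E-step computed per record as a "map" phase that emits partial sufficient statistics, and the M-step as a "reduce" phase that aggregates those statistics and re-estimates the parameters. It uses a toy 1-D Gaussian mixture rather than the hidden Markov model; in the actual work the same two phases would be placed in Hadoop Mapper and Reducer classes, with each MapReduce job performing one EM iteration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, Hadoop-free sketch of one EM iteration written in map/reduce form.
public class EmMapReduceSketch {

    // Partial sufficient statistics emitted by the map phase for one record.
    static final class Stats {
        final double[] resp;       // responsibilities per component
        final double[] weightedX;  // responsibility * x per component
        final double[] weightedX2; // responsibility * x^2 per component
        Stats(int k) { resp = new double[k]; weightedX = new double[k]; weightedX2 = new double[k]; }
    }

    // "Map" (E-step): responsibilities and partial statistics for a single point x.
    static Stats mapRecord(double x, double[] weight, double[] mean, double[] var) {
        int k = weight.length;
        Stats s = new Stats(k);
        double[] p = new double[k];
        double norm = 0.0;
        for (int j = 0; j < k; j++) {
            double d = x - mean[j];
            p[j] = weight[j] * Math.exp(-d * d / (2 * var[j])) / Math.sqrt(2 * Math.PI * var[j]);
            norm += p[j];
        }
        for (int j = 0; j < k; j++) {
            double r = p[j] / norm;
            s.resp[j] = r;
            s.weightedX[j] = r * x;
            s.weightedX2[j] = r * x * x;
        }
        return s;
    }

    // "Reduce" (M-step): aggregate partial statistics and re-estimate the parameters in place.
    static void reduce(List<Stats> parts, double[] weight, double[] mean, double[] var, int n) {
        int k = weight.length;
        double[] r = new double[k], rx = new double[k], rx2 = new double[k];
        for (Stats s : parts) {
            for (int j = 0; j < k; j++) {
                r[j] += s.resp[j];
                rx[j] += s.weightedX[j];
                rx2[j] += s.weightedX2[j];
            }
        }
        for (int j = 0; j < k; j++) {
            weight[j] = r[j] / n;
            mean[j] = rx[j] / r[j];
            var[j] = rx2[j] / r[j] - mean[j] * mean[j];
        }
    }

    public static void main(String[] args) {
        double[] data = {0.1, 0.3, 0.2, 5.1, 4.8, 5.3};
        double[] weight = {0.5, 0.5}, mean = {0.0, 4.0}, var = {1.0, 1.0};
        for (int iter = 0; iter < 10; iter++) {
            // Each loop iteration corresponds to one MapReduce job in the cluster setting.
            List<Stats> parts = Arrays.stream(data)
                    .mapToObj(x -> mapRecord(x, weight, mean, var))
                    .collect(Collectors.toList());
            reduce(parts, weight, mean, var, data.length);
        }
        System.out.println("estimated means: " + Arrays.toString(mean));
    }
}
```

Because the per-record map outputs are independent, they can be computed on any number of worker nodes, and only the small aggregated statistics need to pass through the reduce phase; this is the property the abstract relies on when arguing that EM-type algorithms can be ported to MapReduce.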
Keywords/Search Tags:EM algorithm, Hadoop, MapReduce, Hidden Markov models