
The Research And Implementation Of The Expectation-Maximization Algorithm Based On MapReduce

Posted on: 2013-08-09  Degree: Master  Type: Thesis
Country: China  Candidate: Z Y Jiang  Full Text: PDF
GTID: 2248330392956664  Subject: Software engineering
Abstract/Summary:
With the rapid growth of data volumes, traditional methods run into performance bottlenecks when computing over large-scale data sets. With the arrival of the "cloud computing" era, MapReduce, a simple parallel computing model, has come into view: it separates business logic from the complex details of the underlying implementation and provides a simple yet powerful interface through which large-scale computations can be executed concurrently and in a distributed fashion. The MapReduce model is therefore an effective way to overcome the performance bottleneck of large-scale data-set computation. The expectation-maximization (EM) algorithm, one of the most important algorithms in machine learning, plays an increasingly significant role in industry, commerce, and scientific research, so porting it to a cloud platform to break through its performance bottleneck is highly worthwhile.
This thesis first analyzes Hadoop and the MapReduce model in depth and makes some improvements on the basis of existing algorithms. The second part introduces the principle of the EM algorithm and analyzes in detail why the algorithm can be ported to a cloud platform; it also introduces the three basic problems of the hidden Markov model and their solutions. The third part presents a MapReduce implementation of the EM algorithm and implements the training problem of the hidden Markov model, whose solution is a special case of the EM algorithm. The fifth part tests the performance of the ported algorithm; results from runs on a deployed cluster show that, after being ported to the MapReduce computation framework, the algorithm achieves a considerable improvement in both the volume of data it can handle and its processing efficiency.
Starting from the requirements of massive-data processing, this thesis combines the EM algorithm with the MapReduce model and migrates it to a "cloud computing" platform. It takes future business development needs fully into account and realizes a theoretical model for ultra-large-scale data operations. The design combines the ideas of "big data" and "distributed processing", meets requirements for high reliability and high accuracy, and provides a complete programming interface with good encapsulation.
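The abstract does not reproduce any of the thesis code. The following is only a minimal, self-contained Java sketch (class and method names are hypothetical, not taken from the thesis) illustrating the decomposition the abstract describes: the E-step computed per record as a "map" phase that emits partial sufficient statistics, and the M-step as a "reduce" phase that aggregates those statistics and re-estimates the parameters. It uses a toy 1-D Gaussian mixture rather than the hidden Markov model; in the actual work the same two phases would be placed in Hadoop Mapper and Reducer classes, with each MapReduce job performing one EM iteration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, Hadoop-free sketch of one EM iteration written in map/reduce form.
public class EmMapReduceSketch {

    // Partial sufficient statistics emitted by the map phase for one record.
    static final class Stats {
        final double[] resp;       // responsibilities per component
        final double[] weightedX;  // responsibility * x per component
        final double[] weightedX2; // responsibility * x^2 per component
        Stats(int k) { resp = new double[k]; weightedX = new double[k]; weightedX2 = new double[k]; }
    }

    // "Map" (E-step): responsibilities and partial statistics for a single point x.
    static Stats mapRecord(double x, double[] weight, double[] mean, double[] var) {
        int k = weight.length;
        Stats s = new Stats(k);
        double[] p = new double[k];
        double norm = 0.0;
        for (int j = 0; j < k; j++) {
            double d = x - mean[j];
            p[j] = weight[j] * Math.exp(-d * d / (2 * var[j])) / Math.sqrt(2 * Math.PI * var[j]);
            norm += p[j];
        }
        for (int j = 0; j < k; j++) {
            double r = p[j] / norm;
            s.resp[j] = r;
            s.weightedX[j] = r * x;
            s.weightedX2[j] = r * x * x;
        }
        return s;
    }

    // "Reduce" (M-step): aggregate partial statistics and re-estimate the parameters in place.
    static void reduce(List<Stats> parts, double[] weight, double[] mean, double[] var, int n) {
        int k = weight.length;
        double[] r = new double[k], rx = new double[k], rx2 = new double[k];
        for (Stats s : parts) {
            for (int j = 0; j < k; j++) {
                r[j] += s.resp[j];
                rx[j] += s.weightedX[j];
                rx2[j] += s.weightedX2[j];
            }
        }
        for (int j = 0; j < k; j++) {
            weight[j] = r[j] / n;
            mean[j] = rx[j] / r[j];
            var[j] = rx2[j] / r[j] - mean[j] * mean[j];
        }
    }

    public static void main(String[] args) {
        double[] data = {0.1, 0.3, 0.2, 5.1, 4.8, 5.3};
        double[] weight = {0.5, 0.5}, mean = {0.0, 4.0}, var = {1.0, 1.0};
        for (int iter = 0; iter < 10; iter++) {
            // Each loop iteration corresponds to one MapReduce job in the cluster setting.
            List<Stats> parts = Arrays.stream(data)
                    .mapToObj(x -> mapRecord(x, weight, mean, var))
                    .collect(Collectors.toList());
            reduce(parts, weight, mean, var, data.length);
        }
        System.out.println("estimated means: " + Arrays.toString(mean));
    }
}
```

Because the per-record map outputs are independent, they can be computed on any number of worker nodes, and only the small aggregated statistics need to pass through the reduce phase; this is the property the abstract relies on when arguing that EM-type algorithms can be ported to MapReduce.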
Keywords/Search Tags:EM algorithm, Hadoop, MapReduce, Hidden Markov models