Greedy EM Algorithm Based On MapReduce Framework

Posted on:2019-03-04

Degree:Master

Type:Thesis

Country:China

Candidate:J Q Cao

GTID:2428330545991401

Subject:Computer technology

Abstract/Summary:

With the growth of big data research, technologies for analyzing and processing big data have become a research hotspot. This has prompted the emergence of big data processing platforms and big data processing algorithms for various large-scale distributed systems. The MapReduce distributed processing framework has risen rapidly as a mainstream technology for big data analysis and processing; it solves a complex problem by adopting a “divide and conquer” approach. There is no doubt that the MapReduce distributed framework provides a fast and effective means of handling current big data processing problems.

Cluster analysis is important in the fields of machine learning and data mining, and within cluster analysis the greedy EM algorithm is a very practical and important algorithm. However, the amount of information is increasing rapidly, and data transmission and communication are expanding just as quickly. When large amounts of data are stored, existing methods cannot load all of the data into memory at one time. In addition, the traditional single-machine, serial, iterative approach is no longer adequate for the greedy EM algorithm: its convergence slows drastically as the amount of data increases.

To address the drastic slowdown in the convergence of the greedy EM algorithm on large-scale data sets, this thesis uses the MapReduce distributed framework to distribute the greedy EM algorithm and proposes a greedy EM algorithm based on MapReduce. The algorithm adopts a greedy strategy and obtains its intermediate and final values mainly through two stages, Mapper and Reducer. Specifically, the Mapper stage distributes the data, processes it on each node, and generates the corresponding key-value pairs; the Reducer stage then integrates the generated key-value pairs and finally obtains an optimal Gaussian mixture model that satisfies the convergence conditions. The number of components of the Gaussian mixture model is obtained at the same time. Finally, three groups of experiments show that the algorithm accurately obtains the number of model components without requiring the initial number to be specified in advance, greatly improves the convergence speed on large data sets, and has good robustness and scalability.
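The Mapper/Reducer split described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes one-dimensional data, simulates the nodes with plain in-memory lists, fixes the number of components, and omits the greedy component-insertion step entirely. The names `mapper`, `reducer`, and `em_mapreduce` are hypothetical. Each mapper emits key-value pairs of the form (component index, partial sufficient statistics), and the reducer sums them per key to update the mixture parameters.

```python
import math
from collections import defaultdict


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mapper(shard, params):
    """E-step on one shard: emit (component index, partial statistics).

    params is a list of (weight, mean, variance) tuples (assumed layout).
    No log-space safeguards: assumes every point has nonzero density
    under at least one component.
    """
    stats = defaultdict(lambda: [0.0, 0.0, 0.0])  # sum r, sum r*x, sum r*x^2
    for x in shard:
        weighted = [w * gaussian_pdf(x, m, v) for (w, m, v) in params]
        total = sum(weighted)
        for k, p in enumerate(weighted):
            r = p / total  # responsibility of component k for point x
            stats[k][0] += r
            stats[k][1] += r * x
            stats[k][2] += r * x * x
    return [(k, tuple(s)) for k, s in stats.items()]


def reducer(pairs, n_points):
    """M-step: sum the partial statistics per key, then update parameters."""
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])
    for k, (r, rx, rxx) in pairs:
        totals[k][0] += r
        totals[k][1] += rx
        totals[k][2] += rxx
    params = []
    for k in sorted(totals):
        r, rx, rxx = totals[k]
        mean = rx / r
        var = max(rxx / r - mean * mean, 1e-6)  # floor avoids a zero variance
        params.append((r / n_points, mean, var))
    return params


def em_mapreduce(shards, params, iterations=20):
    """Run a fixed number of map/reduce rounds over the shards."""
    n_points = sum(len(s) for s in shards)
    for _ in range(iterations):
        pairs = [pair for shard in shards for pair in mapper(shard, params)]
        params = reducer(pairs, n_points)
    return params


if __name__ == "__main__":
    shards = [[0.9, 1.1, 1.0, 5.1], [4.9, 5.0, 1.2, 0.8], [5.2, 4.8]]
    initial = [(0.5, 0.0, 1.0), (0.5, 6.0, 1.0)]
    for weight, mean, var in em_mapreduce(shards, initial):
        print(f"weight={weight:.3f} mean={mean:.3f} var={var:.4f}")
```

Because the statistics emitted by the mappers are simple sums, the reducer's result does not depend on how the data is split across shards, which is what makes this step straightforward to distribute. A real job would also need a convergence test in place of the fixed iteration count.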

Keywords/Search Tags:

greedy EM algorithm, machine learning, data mining, MapReduce framework

Related items

1	Study On The MapReduce Framework For Genetic Algorithm Based Distributed Data Mining
2	MapReduce-based Parallel Data Mining Services For TCM
3	Research And Application Of Clustering Mining Algorithm Oriented Big Data Based On MapReduce
4	Research Of MapReduce Data Skew And Task Scheduling In Heterogeneous Environments
5	Research On Data Preprocessing Framework Based On Machine Learning
6	Research On A Parallel Data Mining Algorithm Apriori
7	Design And Research Of Monitoring And Inspection System For Intelligence Warehousing
8	Research On Abnormal Transaction Data Analysis Of Blockchain Based On Machine Learning
9	Research On Intrusion Detection Classification Algorithm Based On Multi-Greedy
10	A Novel Top-k Query Algorithm Based On MapReduce Framework