Classification Of Big Data Based On MapReduce And Restricted Boltzmann Machine

Posted on: 2019-02-01  Degree: Master  Type: Thesis
Country: China  Candidate: T T Wang  Full Text: PDF
GTID: 2428330569979282  Subject: Software engineering
Abstract/Summary:
With the advent of the big data era, traditional machine learning faces many new challenges. In the machine learning field, the MapReduce parallel programming model of Hadoop, an open source framework, is commonly used to handle big data. Many machine learning algorithms can be parallelized with MapReduce, such as decision trees, K-means, and the restricted Boltzmann machine (RBM). Taking the RBM as an example, parallelizing it with MapReduce mainly addresses two problems: training an RBM is inefficient, and the training set is too big to be loaded into memory. Research on big data classification based on open source frameworks has important theoretical significance and application value. This thesis studies the problem of big data classification based on MapReduce and RBM. The main work includes two parts: (1) A comparative study of the strengths and weaknesses of the parallelization mechanism of MapReduce, from which some valuable conclusions were obtained. (2) Building on that work, an algorithm for big data classification that integrates RBMs. The proposed algorithm has roughly two steps. In the first step, multiple RBMs are trained in parallel by MapReduce; in other words, they are trained concurrently on multiple nodes of a cloud computing platform. In the second step, the trained RBMs are integrated by fuzzy integral and used to classify unseen data. The proposed algorithm is compared with related methods, and the experimental results show that it is effective and efficient.
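The second step, integrating several trained classifiers by fuzzy integral, can be illustrated with a minimal sketch. The abstract does not say which fuzzy integral or fuzzy measure the thesis uses, so everything specific here is an assumption made for illustration: the sketch uses the discrete Choquet integral with a simple cardinality-based measure mu(k) = (k / n) ** q, and the names `cardinality_measure`, `choquet_integral`, and `fuse_and_classify` are hypothetical. The per-class scores stand in for the outputs of RBM-based classifiers trained on separate data partitions; no RBM training is shown.

```python
from typing import Callable, List, Sequence


def cardinality_measure(n: int, q: float = 1.0) -> Callable[[int], float]:
    """Return mu(k) = (k / n) ** q, a symmetric fuzzy measure on n sources.

    Assumption for illustration: the measure of a subset depends only on
    how many classifiers it contains, not on which ones. It satisfies
    mu(0) = 0, mu(n) = 1, and is monotone in k for q > 0.
    """
    def mu(k: int) -> float:
        return (k / n) ** q
    return mu


def choquet_integral(values: Sequence[float], mu: Callable[[int], float]) -> float:
    """Discrete Choquet integral of `values` under a cardinality-based measure."""
    # Sort supports from largest to smallest; the i-th largest value is
    # weighted by the measure gained when the i-th source joins the set.
    ordered = sorted(values, reverse=True)
    total = 0.0
    for i, h in enumerate(ordered, start=1):
        total += h * (mu(i) - mu(i - 1))
    return total


def fuse_and_classify(scores: List[List[float]], q: float = 1.0) -> int:
    """Fuse per-classifier class scores and return the winning class index.

    scores[k][c] is classifier k's support for class c (hypothetical
    outputs of classifiers trained on different partitions).
    """
    n_classifiers = len(scores)
    n_classes = len(scores[0])
    mu = cardinality_measure(n_classifiers, q)
    fused = [
        choquet_integral([scores[k][c] for k in range(n_classifiers)], mu)
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=fused.__getitem__)


if __name__ == "__main__":
    # Three classifiers, two classes; the values are made up for the demo.
    demo_scores = [[0.7, 0.3], [0.6, 0.4], [0.4, 0.6]]
    print(fuse_and_classify(demo_scores))
```

With q = 1 this measure reduces the Choquet integral to the arithmetic mean of the classifier scores. A large q pushes the result toward the minimum score (all classifiers must agree), and q below 1 pushes it toward the maximum. A measure that assigns different importance to individual classifiers would need a subset-indexed measure instead of the cardinality-based one assumed here.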
Keywords/Search Tags: Big data, Open source framework, Data classification, Fuzzy integral, Ensemble classifiers