
Research And Implementation Of Clustering And Neural Network Algorithm Based On Cloud Computing Platform Hadoop

Posted on: 2017-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: S S Liu
GTID: 2348330503488802
Subject: Electronics and Communications Engineering
Abstract/Summary:
The rapid development of modern science and technology has driven the rapid spread and popularity of the Internet. Network applications are expanding quickly and the data they generate is growing explosively, which has contributed to the birth and development of cloud computing. Hadoop, an open-source cloud platform, emerged with the advent of the big data era. Data analysis has become an important support for business decisions, so mining valuable information effectively from massive data is of great significance. Cluster analysis and neural network analysis are among the core technologies of data mining. Traditional data mining is constrained by computer performance and by its programming model, and traditional standalone algorithms can no longer meet the processing needs of massive data. Cloud computing offers a new research direction for clustering and neural network analysis in the form of cloud-based mining.

This thesis first studies the deployment of Hadoop clusters and parallelizes a clustering algorithm with the MapReduce model. Among the many clustering algorithms, it focuses on k-means. The improved algorithm is run on the Hadoop platform, and experiments show that the MapReduce-based parallel algorithm greatly speeds up clustering of the Wine dataset from the UCI repository.

The thesis then studies the deployment of Spark clusters on YARN, and designs and implements a parallel neural network algorithm on the Spark platform, focusing on the BP (backpropagation) algorithm. Parallelism is achieved through task scheduling: DAGScheduler, TaskScheduler, and related components divide each job into stages according to its DAG, and each stage consists of a set of tasks (ShuffleMapTask and ResultTask) that execute concurrently. Spark introduces the RDD data model, which is based on working sets, together with in-memory computation, which suits iterative computation. The improved algorithm is run on this platform, and experiments show that the parallel BP algorithm greatly speeds up classification of the Kddcup dataset.
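
The abstract does not give the thesis's actual implementation, so the following is only a minimal single-process sketch of how one k-means iteration is commonly expressed as a map step and a reduce step. All names (`map_phase`, `reduce_phase`, `kmeans`) are hypothetical, the "improved" variant from the thesis is not reproduced, and plain Python generators stand in for Hadoop mappers and reducers.

```python
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(points: Iterable[Point],
              centroids: Sequence[Point]) -> Iterator[Tuple[int, Point]]:
    """Map step: emit (nearest centroid index, point) for every point."""
    for p in points:
        cid = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        yield cid, p


def reduce_phase(pairs: Iterable[Tuple[int, Point]],
                 centroids: Sequence[Point]) -> List[Point]:
    """Reduce step: average the points grouped under each centroid index."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for cid, p in pairs:
        if cid not in sums:
            sums[cid] = [0.0] * len(p)
            counts[cid] = 0
        for d, value in enumerate(p):
            sums[cid][d] += value
        counts[cid] += 1
    # Assumption of this sketch: a centroid with no assigned points is kept.
    new_centroids = list(centroids)
    for cid, total in sums.items():
        new_centroids[cid] = tuple(v / counts[cid] for v in total)
    return new_centroids


def kmeans(points: Sequence[Point], centroids: Sequence[Point],
           max_iter: int = 20, tol: float = 1e-9) -> List[Point]:
    """Repeat map + reduce until no centroid moves more than tol (squared)."""
    current = list(centroids)
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, current), current)
        shift = max(sq_dist(a, b) for a, b in zip(updated, current))
        current = updated
        if shift <= tol:
            break
    return current
```

For example, `kmeans([(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)], [(0.0, 0.0), (10.0, 10.0)])` returns `[(0.0, 1.0), (10.0, 11.0)]`. On a real cluster each iteration would typically be one MapReduce job, with the current centroids distributed to every mapper.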
Keywords/Search Tags:Hadoop, MapReduce parallelization, clustering, Spark, neural network