Research And Implementation Of Clustering And Neural Network Algorithm Based On Cloud Computing Platform Hadoop

Posted on:2017-08-13

Degree:Master

Type:Thesis

Country:China

Candidate:S S Liu

Full Text:PDF

GTID:2348330503488802

Subject:Electronics and Communications Engineering

Abstract/Summary:

The rapid development of modern science and technology has driven the rapid spread and popularity of the Internet. Network applications are expanding quickly in scale, and the data they generate is growing explosively, which has contributed to the birth and development of cloud computing technology. Hadoop, an open-source cloud platform, emerged with the advent of the big data era. Data analysis has become an important support for business decisions, so effectively mining valuable information from massive data is of great significance. Clustering analysis and neural network analysis are among the core technologies of data mining. Traditional data mining is constrained by computer performance and by its programming model, and traditional stand-alone algorithms can no longer meet the processing needs of massive data. The development of cloud computing provides a new research direction for clustering and neural network analysis in the form of cloud-based data mining.

This thesis first studies the deployment of a Hadoop cluster and parallelizes a clustering algorithm with the MapReduce model. Among the many clustering algorithms, it focuses on k-means. The improved algorithm is applied on the Hadoop platform, and experiments show that the MapReduce-based parallel algorithm greatly improves clustering speed on the Wine dataset from the UCI repository.

The thesis then studies the deployment of a Spark cluster on YARN, and designs and implements a parallel neural network algorithm on the Spark platform, focusing on the BP (backpropagation) algorithm. Parallelism is achieved through task scheduling: the DAGScheduler, TaskScheduler, and related components divide a job into Stages according to its DAG, and each Stage is divided into a set of Tasks that run concurrently (ShuffleMapTask and ResultTask). Spark introduces the RDD data model, which is based on working sets, and an in-memory computing mode, both of which suit iterative computation. The improved algorithm is applied on the Hadoop platform, and experiments show that the parallel BP algorithm greatly improves classification speed on the Kddcup dataset.
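
The abstract does not give the details of the improved k-means algorithm, so the following is only a minimal sketch of how a standard k-means iteration is commonly expressed in the MapReduce style. It runs in a single Python process rather than on Hadoop, and the names `map_phase`, `reduce_phase`, and `kmeans` are hypothetical, chosen for illustration; they are not taken from the thesis.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map step: emit (nearest centroid index, (point, 1)) for every point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, (p, 1)


def reduce_phase(pairs, centroids):
    """Reduce step: average the points assigned to each centroid index."""
    sums = {}
    counts = defaultdict(int)
    for key, (p, n) in pairs:
        if key in sums:
            sums[key] = [a + b for a, b in zip(sums[key], p)]
        else:
            sums[key] = list(p)
        counts[key] += n
    updated = list(centroids)  # a cluster with no points keeps its old centroid
    for key, total in sums.items():
        updated[key] = tuple(s / counts[key] for s in total)
    return updated


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat map and reduce until the centroids stop moving (assumed stop rule)."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(data, [(0, 0), (10, 10)]))
```

On a real cluster the map step would run on separate input splits and the reduce step would receive the pairs grouped by key; each iteration would be submitted as its own job, which is one reason an in-memory model such as Spark's RDDs is attractive for iterative algorithms.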

Keywords/Search Tags:

Hadoop, MapReduce parallelization, clustering, Spark, neural network

Related items

1	Research On Parallelization Of Clustering Algorithm Based On MapReduce
2	The Research On The Improvement And Parallelization Of CLIQUE Algorithm In Hadoop Environment
3	Research And Application Of Parallelization Optimization Of Spatial Clustering Algorithm Based On Spark
4	Research On Parallel Clustering Algorithm For Large-Scale Data Set
5	Research On Parallelization Of BP Neural Network Based On MapReduce
6	The Design And Implementation Of Parallelization Of Canopy And FCM Clustering Algorithms On Spark Platform
7	Research On Parallel Clustering Algorithm For Streaming Data
8	The Parallelization And Optimization Of K-means Algorithm Based On Spark
9	Research On Parallelization Of Text Clustering Based On Hadoop Cloud Computing Platform
10	Research And Application Of Clustering Parallel Strategy For Affinity Propagation