
Research And Application Of Clustering Mining Algorithm Based On Cloud Computing

Posted on: 2015-10-30  Degree: Master  Type: Thesis
Country: China  Candidate: Y Liu  Full Text: PDF
GTID: 2298330467472382  Subject: Software engineering
Abstract/Summary:
Data mining derives general rules that describe data from large numbers of correlated facts, and it also discovers new models hidden in the data through training and self-learning. Among the many data mining techniques, cluster analysis partitions data into classes, or clusters, according to the similarity between data objects: objects in the same cluster are highly similar to each other, while objects in different clusters are highly dissimilar.

With the rapid development of modern technology, a wide variety of business applications have changed the nature of data, and the explosive growth in data volume means that traditional data mining algorithms are no longer adequate for today's data mining tasks. Cloud computing, an emerging computing model, is an integrated development of distributed processing, parallel processing and grid computing. It builds a cluster of computers from a large number of ordinary machines and distributes computing tasks across the cluster for parallel execution, which yields powerful computing capability. It offers a new approach to data processing from two angles, distributed storage and distributed computing, and is an effective way to handle big data.

Based on cloud computing technology, this thesis takes clustering analysis as its starting point and seeks new clustering mining methods for big data. To address the defects of the classic clustering algorithm K-medoids, it proposes two improved algorithms: the SCDK-medoids algorithm, based on the statistical density of center points, and the RDPK-medoids algorithm, based on relative-distance pre-clustering. It then parallelizes the SCDK-medoids algorithm by combining it with the idea of meshing, and gives a parallel design on the Hadoop platform for the most time-consuming part of the RDPK-medoids algorithm. Together these form a new implementation scheme for clustering mining of big data based on cloud computing.

To verify the performance of the proposed algorithms, this thesis also designs a parallelization of the original K-medoids algorithm on the Hadoop platform and runs simulation experiments in Hadoop's fully distributed mode. The experimental results show that both the parallelized SCDK-medoids algorithm and the parallelized RDPK-medoids algorithm have better performance in clustering accuracy and execution speed, and that they can be applied to clustering mining of big data.

This thesis also applies the parallelized SCDK-medoids algorithm to a taxi push service to demonstrate the applicability of the proposed algorithm to community division in social networks and to push services.
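For readers unfamiliar with the baseline, the following is a minimal single-machine sketch of classic K-medoids, the algorithm the thesis sets out to improve. It is not the SCDK-medoids or RDPK-medoids algorithm, whose details the abstract does not give. The function names (`assign`, `update`, `k_medoids`), the Euclidean distance, and the split into a map-like assignment step and a reduce-like update step are all illustrative assumptions; they only hint at how such an algorithm might be divided into parallel tasks. Points are assumed to be distinct tuples, and the caller supplies the initial medoids.

```python
def euclidean(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def assign(points, medoids):
    """Map-like step (assumed split): group each point under its nearest medoid."""
    clusters = {m: [] for m in medoids}
    for p in points:
        nearest = min(medoids, key=lambda m: euclidean(p, m))
        clusters[nearest].append(p)
    return clusters


def update(clusters):
    """Reduce-like step (assumed split): in each cluster, pick the member
    whose total distance to all other members is smallest."""
    return [
        min(members, key=lambda c: sum(euclidean(c, p) for p in members))
        for members in clusters.values()
        if members
    ]


def k_medoids(points, initial_medoids, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        new_medoids = update(assign(points, medoids))
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    medoids, clusters = k_medoids(pts, initial_medoids=[(0, 0), (0, 1)])
    print(medoids)
    print(clusters)
```

The update step compares every pair of points inside a cluster, so its cost grows quadratically with cluster size. That cost is one reason a distributed platform is attractive for this family of algorithms. The result also depends on the initial medoids, which this sketch leaves to the caller.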
Keywords/Search Tags: clustering mining, big data, cloud computing, parallelization