Research And Application On Hadoop Based Distributed Clustering Algorithm

Posted on: 2019-07-01
Degree: Master
Type: Thesis
Country: China
Candidate: D C Wu
Full Text: PDF
GTID: 2428330545471225
Subject: Computer technology
Abstract/Summary:
With the development of information technology, the amount of data has exploded, and traditional single-node clustering can no longer keep up with demand. Distributing large volumes of data across thousands of cluster nodes for distributed computing is a meaningful way to improve clustering efficiency in big data environments. This thesis studies the implementation and application of clustering algorithms in a distributed environment, and uses two-dimensional image data and three-dimensional model data to verify the algorithm and to evaluate its execution efficiency and clustering accuracy. First, the application of distributed clustering to image classification is studied. In handwritten digit recognition, the feature data of handwritten digit images is large, and traditional clustering algorithms are inefficient. The traditional K-means clustering algorithm is therefore implemented on the Hadoop distributed computing platform to improve its computing efficiency. In addition, the Canopy algorithm is used to choose the value of K and the initial cluster centers for K-means. The T1 and T2 thresholds of the Canopy algorithm are determined by the "maximum minimization principle", which improves the accuracy of handwritten digit recognition. Second, the distributed clustering algorithm proposed in this thesis is applied to 3D model retrieval based on view features. To extract the features of a 3D model, different views of the model are converted into 2D images, and SIFT features are extracted from those images. The SIFT features are standardized using the bag-of-words model from natural language processing, which reduces the influence of noise points on the features. The resulting view feature vectors of the 3D models are then used for 3D model retrieval with the Hadoop-based distributed K-means clustering algorithm, achieving a good balance between precision and efficiency.
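As an illustration of the Canopy and K-means combination described above, the following is a minimal single-process sketch in Python. It is not the thesis's Hadoop implementation: the function names (`canopy_centers`, `kmeans`), the Euclidean distance, and the hand-picked T1 and T2 values are all assumptions made for the example, and the "maximum minimization principle" used in the thesis to set the thresholds is not reproduced here. The K-means loop is only organized in a map-like assignment step and a reduce-like averaging step to suggest how the work might be split across nodes.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length points (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t1, t2):
    """Pick canopy centers; their count serves as K for K-means.

    t1 is the loose threshold and t2 the tight one (t1 > t2). Only t2
    affects which points become centers; t1 would define canopy membership.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        # Points closer than t2 are tightly bound to this canopy and
        # can no longer become centers themselves.
        remaining = [p for p in remaining if distance(p, center) >= t2]
    return centers


def kmeans(points, centers, iterations=10):
    """K-means seeded with the given centers, written as map and reduce steps."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        # "Map" step: emit (index of nearest center, point).
        groups = {}
        for p in points:
            idx = min(range(len(centers)), key=lambda i: distance(p, centers[i]))
            groups.setdefault(idx, []).append(p)
        # "Reduce" step: average the points assigned to each center.
        new_centers = []
        for i, old in enumerate(centers):
            members = groups.get(i)
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(old)  # keep a center that attracted no points
        if new_centers == centers:
            break
        centers = new_centers
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    seeds = canopy_centers(data, t1=5.0, t2=3.0)  # thresholds chosen by hand
    print(len(seeds), kmeans(data, seeds))
```

Seeding K-means from the canopy centers removes the need to guess K in advance, which is the role the abstract assigns to the Canopy step; the quality of the result still depends on how T1 and T2 are chosen.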
Keywords/Search Tags:Distributed Computing, Clustering Algorithm, Hadoop, Handwritten digit recognition, 3D model retrieval