Research on MapReduce of Clustering Algorithms Based on Cloud Computing

Posted on: 2013-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: F X Hu
Full Text: PDF
GTID: 2298330467978763
Subject: Computer technology
Abstract/Summary:
Clustering algorithms have a long research history, and over the decades their importance and their overlap with other research areas have been widely recognized. As a method of unsupervised learning, clustering is a common technique for statistical data analysis in many fields, including pattern recognition, data mining, image analysis and machine learning. With the rapid growth of Internet data, clustering large-scale data on a single computer runs into the bottlenecks of memory capacity and processing speed, which makes it difficult to meet the needs of practical applications.
Cloud computing is a computing model that uses the Internet to provide convenient, on-demand access to a shared pool of resources at any time and from anywhere. It developed from grid computing, parallel computing and distributed computing, and it is capable of processing large-scale data.
The aim of this thesis is to use the large-scale data processing capability of a cloud computing platform to solve the large-scale data problems that clustering algorithms face. The thesis analyses the system structure of cloud computing, studies the MapReduce programming model and the HDFS distributed file system, and introduces the related techniques of clustering algorithms. Combining the ISODATA clustering algorithm with the MapReduce programming model, it implements ISODATA based on MapReduce. To address the drawback of ISODATA, it presents an improved algorithm called WISODATA and implements it based on MapReduce as well. Through experiments on well-known datasets selected from the UCI Machine Learning Repository, the thesis analyses and compares the accuracy of ISODATA, ISODATA based on MapReduce, WISODATA and WISODATA based on MapReduce. The experimental results show that all four achieve high accuracy, and that WISODATA and WISODATA based on MapReduce are more accurate than ISODATA and ISODATA based on MapReduce.
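As a rough illustration of how a clustering step can be phrased in MapReduce terms, the sketch below runs one nearest-center assignment and center update in a single Python process. It is not the thesis's implementation: the function names (`map_assign`, `reduce_update`, `one_iteration`) and the sample points are hypothetical, no Hadoop API is used, and the split and merge steps that distinguish ISODATA (and whatever WISODATA changes) are left out.

```python
from collections import defaultdict
from math import dist  # available in Python 3.8+


def map_assign(point, centers):
    """Map step: emit (index of the nearest center, (point, 1))."""
    nearest = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return nearest, (point, 1)


def reduce_update(pairs):
    """Reduce step: sum points and counts per center index, then average."""
    sums = {}
    counts = defaultdict(int)
    for key, (point, n) in pairs:
        if key not in sums:
            sums[key] = list(point)
        else:
            sums[key] = [s + x for s, x in zip(sums[key], point)]
        counts[key] += n
    return {k: tuple(s / counts[k] for s in sums[k]) for k in sums}


def one_iteration(points, centers):
    """Simulate one map/reduce pass locally (assumption: no cluster involved)."""
    pairs = [map_assign(p, centers) for p in points]
    return reduce_update(pairs)


# Hypothetical toy data, chosen only to make the behaviour easy to follow.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (12.0, 10.0)]
centers = [(1.0, 1.0), (9.0, 9.0)]
new_centers = one_iteration(points, centers)
print(new_centers)
```

On a real cluster the map calls would run in parallel over HDFS blocks and the reduce calls would be grouped by center index; the local list comprehension here only stands in for that shuffle.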
Experiments on datasets of different sizes show that ISODATA based on MapReduce and WISODATA based on MapReduce perform well in terms of speedup, sizeup and scaleup. Both are well suited to running on a cloud computing platform and are effective for processing large-scale data.
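The abstract does not define speedup, sizeup and scaleup. The sketch below assumes the definitions conventionally used when evaluating parallel clustering algorithms: speedup compares one node with m nodes on the same data, sizeup compares m-times larger data with the base data on the same nodes, and scaleup compares one node on the base data with m nodes on m-times larger data. The function names and the timings are hypothetical placeholders, not results from the thesis.

```python
def speedup(time_one_node, time_m_nodes):
    """Same dataset: run time on 1 node divided by run time on m nodes."""
    return time_one_node / time_m_nodes


def sizeup(time_base_data, time_m_times_data):
    """Same nodes: run time on m-times larger data divided by base run time."""
    return time_m_times_data / time_base_data


def scaleup(time_one_node_base_data, time_m_nodes_m_times_data):
    """1 node on base data versus m nodes on m-times larger data."""
    return time_one_node_base_data / time_m_nodes_m_times_data


# Made-up timings in seconds, for illustration only (not measured values).
s = speedup(120.0, 40.0)
z = sizeup(40.0, 100.0)
c = scaleup(120.0, 150.0)
print(s, z, c)
```

Under these assumed definitions, a speedup close to the number of nodes, a sizeup below the data growth factor, and a scaleup close to 1 would each indicate good parallel behaviour.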
Keywords/Search Tags: cloud computing, clustering algorithm, MapReduce, ISODATA, WISODATA