The Research And Implementation Of Clustering And Convex Hull Algorithms Based On MapReduce Framework

Posted on: 2014-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: R Zhao
GTID: 2248330398994140
Subject: Computer software and theory
Abstract/Summary:
With the development of the mobile Internet, networking, and other new technologies, the volume of data produced each year is growing geometrically. Compared with traditional data, big data is characterized by four features: Volume, Velocity, Value, and Variety. Traditional data processing technology cannot meet the demands of storing, managing, and processing data at this scale. A new direction for data mining is therefore how to extract valuable information from huge amounts of data more rapidly, efficiently, and cost-effectively, in order to support decision-making in companies.

The emergence of cloud computing has brought new opportunities for data mining. Hadoop is one of many cloud computing platforms and is now an open-source Apache project. It consists of the Hadoop Distributed File System (HDFS) and the MapReduce programming framework, whose designs derive from Google's two papers on the Google File System and the MapReduce programming model. These two technologies take full advantage of the available computing power and disk storage. Hadoop can process large-scale data on clusters built from many commodity computers. Combining the Hadoop platform with data mining algorithms can therefore address the problems encountered when analyzing and processing huge amounts of data, while improving data processing capacity and reducing hardware requirements.

This thesis studies how to use the parallel computing capability of a Hadoop cluster to implement data mining algorithms. First, it surveys the growth and value creation of big data to illustrate the need for more efficient data mining, and introduces big data techniques, technologies, and tools. It then examines the operating mechanism of HDFS, including its storage process and error handling, and introduces the programming model and operating principle of the MapReduce framework. Second, it uses Hadoop, an open-source implementation of Google's distributed file system and the MapReduce framework for distributed data processing, on modestly sized compute clusters to evaluate its efficacy for standard machine learning tasks. Benchmark results on searching and sorting tasks are presented to investigate the effects of various system configurations. Next, the thesis implements the K-means clustering algorithm with the MapReduce programming model. The convex hull algorithm from computer graphics is then implemented on Hadoop, and a simulation combines this convex hull implementation with the K-means clustering algorithm. The conclusion is that the convex hull algorithm can be applied to data mining algorithms based on the MapReduce framework. Finally, the thesis discusses the application of the clustering results to data compression.
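The combination of K-means and convex hull described above can be illustrated with a minimal single-process sketch in Python. It is not the thesis's Hadoop implementation, which the abstract does not show. The function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, `convex_hull`) are hypothetical, the shuffle phase is simulated with a dictionary, and the hull uses Andrew's monotone chain method, which is an assumption since the abstract does not say which convex hull algorithm was used.

```python
from collections import defaultdict

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_map(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)),
                  key=lambda i: sq_dist(point, centroids[i]))
    return nearest, point

def kmeans_reduce(points):
    """Reduce step: average all points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans_iteration(points, centroids):
    """One map -> shuffle -> reduce round, simulated in one process."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    # A centroid with no assigned points is kept unchanged.
    new_centroids = [kmeans_reduce(groups[i]) if i in groups else centroids[i]
                     for i in range(len(centroids))]
    return new_centroids, dict(groups)

def cross(o, a, b):
    """Cross product of OA x OB; positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns 2D hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

if __name__ == "__main__":
    data = [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1),
            (10, 10), (10, 12), (12, 10), (12, 12), (11, 11)]
    centroids, clusters = kmeans_iteration(data, [(0, 0), (10, 10)])
    for idx, members in sorted(clusters.items()):
        print(idx, centroids[idx], convex_hull(members))
```

In a real Hadoop job the map and reduce functions would run as separate tasks over HDFS blocks, and the iteration would repeat until the centroids stop moving. Here each cluster's hull is computed after a single round, only to show how the two pieces fit together.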
Keywords/Search Tags:Hadoop, Clustering, Convex hull, MapReduce