
Research On Classification Algorithm Of Massive Data Based On Cloud Computing

Posted on: 2015-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: H R Zhang
Full Text: PDF
GTID: 2298330431985577
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, the information age has arrived, and the volume of data that can be collected is growing exponentially, reaching the TB or even PB level. Traditional data mining techniques struggle to handle data at this scale. How to extract valuable information from large-scale data quickly and efficiently has therefore become a new challenge.

Cloud computing is a business-oriented computing model that stores and processes large-scale data on clusters. A cluster can be built from a large number of inexpensive computers, which greatly reduces cost. With its powerful storage and computing capabilities and its low cost, cloud computing gives data mining on large-scale data a new opportunity. Hadoop is an open-source computing platform that is well suited to processing large-scale data sets and is now widely used. It hides some internal details from programmers and users, which makes it easier to write programs that process large-scale data.

In this paper, we first study several classification algorithms for data mining and then port them to the Hadoop platform using the Map-Reduce programming model. The main contributions of this work are as follows.

1. To handle the preprocessing of large-scale data, we propose a discretization method for continuous data based on the Map-Reduce programming model, and describe the design and implementation of the algorithm. Experimental results show that the algorithm is highly efficient and is suitable for fast discretization of large-scale data.

2. Because training and testing a classifier on a large-scale data set takes a long time on a single node, we design and implement a parallel Bayesian classifier on Hadoop, based on a detailed analysis of the principles of the Bayesian classification algorithm and the Map-Reduce programming model. Experimental results show that the parallel algorithm achieves higher efficiency and better scalability.

3. Because base classifiers contribute differently to an ensemble, we assign each base classifier a weight that reflects its importance to the ensemble's decision. The weights are determined adaptively by a differential evolution algorithm. On this basis, we propose a weighted-voting ensemble classification algorithm based on differential evolution. Experimental results show that the algorithm not only improves classification performance but also has strong generalization ability.
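The parallel Bayesian classifier described in the second contribution rests on the fact that naive Bayes training reduces to counting, which fits the map/reduce pattern. The sketch below illustrates that idea as a minimal, in-process Python simulation. It is not the thesis's Hadoop implementation, whose details are not given in this abstract. The function names (`map_record`, `reduce_counts`, `train`, `classify`), the toy records, and the use of Laplace smoothing are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def map_record(record):
    """Map step: emit a count of 1 for the class and for each feature value.

    A record is a tuple of discretized feature values followed by a label.
    """
    *features, label = record
    yield ("class", label), 1
    for index, value in enumerate(features):
        yield ("feature", label, index, value), 1


def reduce_counts(pairs):
    """Shuffle and reduce step: sum the counts emitted for each key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def train(records):
    """Run the map step over every record, then reduce the emitted pairs."""
    pairs = (pair for record in records for pair in map_record(record))
    return reduce_counts(pairs)


def classify(counts, features, n_values):
    """Return the label with the highest log posterior.

    n_values[i] is the number of distinct values feature i can take; it is
    used for Laplace (add-one) smoothing, an assumption of this sketch.
    """
    class_counts = {key[1]: n for key, n in counts.items() if key[0] == "class"}
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, class_n in class_counts.items():
        score = math.log(class_n / total)
        for index, value in enumerate(features):
            n = counts.get(("feature", label, index, value), 0)
            score += math.log((n + 1) / (class_n + n_values[index]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Hypothetical toy data: (outlook, temperature, label).
    records = [
        ("sunny", "hot", "no"),
        ("sunny", "mild", "no"),
        ("rainy", "mild", "yes"),
        ("rainy", "cool", "yes"),
        ("overcast", "hot", "yes"),
    ]
    model = train(records)
    print(classify(model, ("rainy", "cool"), n_values=[3, 3]))
```

Because the reduce step only sums counts, the records could be split across many mappers and the partial counts merged in any order. That property is what makes this kind of classifier a natural fit for a cluster.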
Keywords/Search Tags: Cloud Computing, Discrete, Naïve Bayes, Differential Evolution, Ensemble Learning