Research on Application and Parallelization of Decision Tree Algorithm

Posted on: 2015-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: W Li
Full Text: PDF
GTID: 2308330473451978
Subject: Computer application technology

Abstract/Summary:
With the rapid development of information technology, businesses generate massive amounts of data that contain a wealth of knowledge. These data are a fortune for those who are good at discovering knowledge, and mere rubbish for those who are not. Different types of data call for different data mining algorithms, but traditional algorithms can only process a limited amount of data. More suitable data mining algorithms are therefore needed.

Cloud computing is a new concept in computer science. To some extent it can be regarded as a network, or as a new pattern for dealing with massive data. Developing massive data mining algorithms on a cloud computing architecture is a growing trend. By exploiting its parallel processing capabilities, we can improve traditional algorithms and port the improved algorithms to a cloud platform, which makes massive data mining problems much easier to handle.

Against this background, the thesis first studies data mining technology and cloud computing platforms. After examining the decision tree algorithm, it focuses on the algorithm's practical application in daily work and uses L'Hospital's rule to improve its computational performance. Second, to meet the demands of mining massive data, the thesis improves the CART algorithm, which produces simply structured decision trees, by using the random forest model. Because the random forest model is less demanding with respect to missing data, attribute categories, multi-valued decision attributes and so on, applying the CART algorithm within the random forest model overcomes the drawbacks of CART. Lastly, the thesis studies the parallelization of the improved algorithm. After comparing several parallel models, it chooses MapReduce as the most suitable model for implementing the improved CART algorithm.

Experiments analyze massive data in both serial and parallel modes. The results show that the new algorithm achieves better performance in terms of effectiveness, processing speed and speedup.
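The abstract does not give the thesis's implementation, so the following is only a minimal sketch of the general idea it describes: CART-style trees (Gini-based binary splits) trained on bootstrap samples in a "map" step, then merged into one random forest in a "reduce" step. All function names (`map_train`, `train_forest`, `predict_forest`, and so on) are hypothetical, the map step runs sequentially in a single process instead of on a real MapReduce cluster, and the L'Hospital-based optimization mentioned above is not modeled.

```python
import random
from collections import Counter
from functools import reduce

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, rng, max_depth=3):
    """Grow a binary tree; each node considers a random subset of features."""
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))
    features = rng.sample(range(n_features), k)
    split = best_split(rows, labels, features) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "f": f, "t": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], rng, max_depth - 1),
        "right": build_tree([r for r, _ in right], [y for _, y in right], rng, max_depth - 1),
    }

def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["f"]] <= tree["t"] else tree["right"]
    return tree

def map_train(partition, n_trees, seed):
    """Map step (assumed design): train n_trees trees on one data partition."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(partition) for _ in partition]  # bootstrap sample
        rows = [r for r, _ in sample]
        labels = [y for _, y in sample]
        trees.append(build_tree(rows, labels, rng))
    return trees

def train_forest(partitions, trees_per_partition=5, seed=0):
    """Run the map step per partition, then reduce by concatenating the trees."""
    mapped = [map_train(p, trees_per_partition, seed + i)
              for i, p in enumerate(partitions)]
    return reduce(lambda a, b: a + b, mapped, [])

def predict_forest(forest, row):
    """Majority vote over all trees in the forest."""
    votes = Counter(predict_tree(t, row) for t in forest)
    return votes.most_common(1)[0][0]
```

Each partition is a list of `(features, label)` pairs. Because the map calls are independent of one another, they could in principle be distributed across workers, which is the property a MapReduce implementation would rely on.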
Keywords/Search Tags: Cloud Computing, Decision Tree, C4.5, CART, MapReduce