Large Data Sets Sample Selection Based On Map Reduce

Posted on: 2016-06-17   Degree: Master   Type: Thesis
Country: China   Candidate: X H Pang   Full Text: PDF
GTID: 2308330479977635   Subject: Computer technology
Abstract/Summary:
With the rapid development of data storage, computer networking, and cloud computing technologies, the volume of data has grown rapidly, and big data processing has become a problem that attracts close attention from both academia and industry. Discovering useful knowledge from big data is a new challenge for traditional data mining algorithms, which makes sample selection from large data sets a meaningful topic of study.

Based on Map Reduce, this paper proposes a sample selection algorithm. It first uses the Map mechanism of Map Reduce to partition the large data set into small subsets and distribute them to different cloud computing nodes, where informative samples are selected in parallel with an instance selection algorithm. The Reduce mechanism of Map Reduce then collects the selected samples from the different nodes, yielding one selected sample subset. This process is repeated k times (k is a user-defined parameter), producing k sample subsets. Finally, a voting method selects the most informative samples from the k subsets. An ELM (extreme learning machine) classifier is trained on the selected samples, and its accuracy is measured on the testing set. The proposed algorithm is compared experimentally with classic sample selection algorithms; the results show that it is effective and efficient.
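The pipeline described in the abstract (partition, select in parallel, collect, repeat k times, vote) can be sketched roughly as follows. This is only an illustration under several assumptions, not the thesis's implementation: the abstract does not name the instance selection algorithm, so Hart's condensed nearest neighbour rule is used as a stand-in; the voting rule is assumed to be a simple vote threshold (`min_votes`); and the Map and Reduce phases are simulated in a single process with Python's built-in `map` and `functools.reduce` rather than run on a cluster. All function and parameter names (`condensed_nn`, `one_round`, `select_samples`, `n_parts`, `min_votes`) are hypothetical.

```python
import random
from collections import Counter
from functools import reduce


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def condensed_nn(subset):
    """Stand-in instance selector (assumption): Hart's condensed nearest neighbour.

    `subset` is a list of (index, features, label) triples.
    Returns the set of indices kept as informative samples.
    """
    if not subset:
        return set()
    store = [subset[0]]
    kept = {subset[0][0]}
    changed = True
    while changed:
        changed = False
        for idx, x, y in subset:
            if idx in kept:
                continue
            nearest = min(store, key=lambda s: sq_dist(s[1], x))
            if nearest[2] != y:  # misclassified by the current store: keep it
                store.append((idx, x, y))
                kept.add(idx)
                changed = True
    return kept


def one_round(samples, n_parts, rng):
    """One simulated Map/Reduce pass over a random partition of the data."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    parts = [shuffled[i::n_parts] for i in range(n_parts)]  # Map: partition
    selected = map(condensed_nn, parts)                      # per-node selection
    return reduce(set.union, selected, set())                # Reduce: collect


def select_samples(samples, n_parts=4, k=5, min_votes=3, seed=0):
    """Repeat the pass k times and keep samples chosen in at least min_votes rounds."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(k):
        votes.update(one_round(samples, n_parts, rng))
    return sorted(i for i, v in votes.items() if v >= min_votes)
```

Calling `select_samples(data, n_parts=4, k=5, min_votes=3)` on a list of `(index, features, label)` triples returns the indices of the samples that survive the vote; those samples would then be used to train the classifier. Reshuffling before each round is also an assumption, made so that the k subsets differ from one another.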
Keywords/Search Tags: Large data sets, Cloud computing, Sample selection, Map Reduce, Extreme learning machine