The Study And Improvements Of Uncertainty-based Sample Selection

Posted on: 2011-11-24 | Degree: Master | Type: Thesis
Country: China | Candidate: L C Dong | Full Text: PDF
GTID: 2178360308454093 | Subject: Computer application technology
Abstract/Summary:
Many datasets contain redundant, noisy, and incomplete data. Such data consume storage and are useless or even harmful to the learner. It is therefore desirable to reduce the original dataset to a smaller one that preserves, or even improves, the performance of a learner trained on it. This is the main task of sample selection. Sample selection algorithms fall into two categories: data filter algorithms and active learning algorithms. The former filter out redundant and noisy data, while the latter select useful samples from incomplete data, especially unlabeled data, which is the focus of this study.

This research studies and improves the uncertainty-based sample selection strategy. Because this strategy tends to select isolated points and to ignore samples in densely populated regions, we introduce the impact degree of a sample to avoid information loss. We then propose an improved strategy that selects the samples with the maximal product of impact and uncertainty. We prove theoretically that the new strategy reduces the uncertainty of the whole sample pool as much as possible. Finally, experimental results on artificial and UCI datasets show that the improved strategy outperforms the original one with respect to the accuracy of the fuzzy decision tree trained on the selected samples.
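The selection rule described above (pick the samples whose impact times uncertainty is largest) can be illustrated with a minimal sketch. The abstract does not define the impact degree or the uncertainty measure, so both are assumptions here: impact is taken as the mean Gaussian similarity of a sample to the rest of the pool, and uncertainty as the Shannon entropy of predicted class memberships. The function names, the `bandwidth` parameter, and the toy data are hypothetical and are not taken from the thesis, which works with a fuzzy decision tree.

```python
import numpy as np

def entropy_uncertainty(probs):
    """Assumed uncertainty measure: Shannon entropy of each row of class memberships."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(p), axis=1)

def impact_degree(X, bandwidth=1.0):
    """Assumed impact measure: mean Gaussian similarity to the other pool samples."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    sim = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    np.fill_diagonal(sim, 0.0)  # a sample does not count toward its own impact
    return sim.sum(axis=1) / (len(X) - 1)

def select_samples(X, probs, k, bandwidth=1.0):
    """Return indices of the k samples with the largest impact * uncertainty."""
    score = impact_degree(X, bandwidth) * entropy_uncertainty(probs)
    return np.argsort(-score)[:k]

# Toy pool: three points in a dense region and one isolated point.
pool = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
# Hypothetical class memberships; the isolated point is the most uncertain one.
probs = np.array([[0.6, 0.4], [0.6, 0.4], [0.6, 0.4], [0.5, 0.5]])

plain_choice = int(np.argmax(entropy_uncertainty(probs)))   # uncertainty only
weighted_choice = int(select_samples(pool, probs, k=1)[0])  # impact * uncertainty
```

On this toy pool, uncertainty alone favors the isolated point, while the product favors a point from the dense region, which is the behavior the improved strategy aims for. Other density estimates or uncertainty measures could be substituted without changing the structure of the rule.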
Keywords/Search Tags: Sample selection, Active learning, Uncertainty, Ambiguity, Fuzzy decision tree