
The Study Of Application On Improved Genetic Algorithm For Sample Selection

Posted on: 2007-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: J J Zhao
Full Text: PDF
GTID: 2178360182999420
Subject: Computer software and theory
Abstract/Summary:
As machine learning aims to address larger, more complex tasks, the problem of focusing on the most relevant information in a potentially overwhelming quantity of data has become increasingly important. In many applications, such as genome projects, text mining, and business intelligence, data sets keep growing in both the number of features and the number of samples. This trend poses a severe challenge to machine learning algorithms.

Feature selection removes irrelevant features, increases the efficiency of learning tasks, improves learning performance, and enhances the comprehensibility of learned results. It has proven to be an effective means of dealing with high-dimensional data that contains many irrelevant features. Although numerous feature selection algorithms exist, new and challenging research issues continue to arise, ranging from handling large dimensionality to handling a huge number of samples.

When the number of samples is large, it is sensible to use only a portion of the data to achieve the original objective without performance deterioration. Random sampling is a common approach to this problem. However, random sampling is blind because it does not exploit any characteristic of the data. In this dissertation, we propose an improved genetic algorithm and use it to search the sample space for the most representative subset of the training set, which is then used for classification and evaluation. Given the same feature subset, the proposed genetic algorithm uses fewer samples and obtains greater accuracy than random sampling.

This dissertation mainly contributes to the following two aspects:

1. To find the most representative subset of the training set, containing only relevant samples, the genetic algorithm is improved in its gene representation, crossover scheme, and mutation scheme. The ratio of the number of correctly predicted positive samples to the number of wrongly predicted positive samples is used as the fitness of the improved genetic algorithm, because in practical binary classification the positive samples are very few and the class distribution is highly unbalanced.

2. A sample selection model based on the improved genetic algorithm is put forward. It is combined with the evolutionary local selection algorithm and applied to predicting potential customers. The proposed sample selection model is tested on a UCI dataset, and the results demonstrate the efficiency of the model.
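
The idea of searching the sample space with a genetic algorithm can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the improved gene representation, crossover, or mutation, so this sketch assumes a plain binary mask over the training samples, one-point crossover, bit-flip mutation, and a 1-nearest-neighbour classifier. The fitness follows the ratio described above, under the assumption that a "wrongly predicted positive sample" is a true positive predicted as negative; the +1 in the denominator is an addition to avoid division by zero. All function and parameter names are hypothetical.

```python
import random


def positive_ratio_fitness(y_true, y_pred):
    """Correctly predicted positives divided by mispredicted positives.

    Assumption: "wrong" means a true positive sample predicted as negative.
    The +1 in the denominator is not from the source; it avoids division by zero.
    """
    right = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return right / (wrong + 1)


def predict_1nn(train_x, train_y, mask, x):
    """Label x with the class of its nearest *selected* training sample.

    Returns 0 (negative) if the mask selects no sample at all.
    """
    best_label, best_dist = 0, float("inf")
    for xi, yi, keep in zip(train_x, train_y, mask):
        if keep:
            dist = sum((a - b) ** 2 for a, b in zip(xi, x))
            if dist < best_dist:
                best_label, best_dist = yi, dist
    return best_label


def evaluate(mask, train_x, train_y, val_x, val_y):
    """Fitness of a sample subset, measured on a held-out validation set."""
    preds = [predict_1nn(train_x, train_y, mask, x) for x in val_x]
    return positive_ratio_fitness(val_y, preds)


def select_samples(train_x, train_y, val_x, val_y,
                   pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Search for a binary mask over training samples with a simple GA."""
    rng = random.Random(seed)
    n = len(train_x)

    def score(mask):
        return evaluate(mask, train_x, train_y, val_x, val_y)

    # Each chromosome is a list of 0/1 genes: 1 keeps the sample, 0 drops it.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        next_gen = [ranked[0][:]]           # elitism: keep the best mask unchanged
        parents = ranked[: pop_size // 2]   # truncation selection
        while len(next_gen) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)       # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: toggle each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=score)
```

A call such as `select_samples(train_x, train_y, val_x, val_y)` returns a 0/1 list marking which training samples to keep. This sketch's fitness does not reward smaller subsets, so a size penalty would be one possible extension if fewer samples are the goal.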
Keywords/Search Tags:Sample selection, Sampling, Feature selection, Genetic algorithms