Font Size: a A A

Genetic Programming For Classification Problems

Posted on:2014-02-08Degree:DoctorType:Dissertation
Country:ChinaCandidate:P WangFull Text:PDF
GTID:1228330398964253Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Classification problem is the most fundamental research area in machine learning. A classification task is to assign items (instances) in a data-set to target categories (class-es) based on classifier(s) learnt by training instances. In binary classification there are only two classes or categories and all instances in the data set will be assigned one of them. The target of a classification problem is trying to design classifiers which make error-free assignments. Mostly, statistical knowledge and techniques are involved into traditional machine learning to design classifiers with high classification performance. In this thesis, we consider the classification as an optimization problem in optimiza-tion perspective. As the relative structure with decision tree, the individual in Genetic Programming is suitbale for classification problems. So, GP is researched in this disser-tation to solve classification problems from the components of Evolutionary Algorithms (EA) to get satisfied classifiers. Specifically, the main contents and contributions of this dissertation are as follows:1. How to encode and decode an individual and how to evalute its fitness value are the key questions of using GP for classification task. In this section, statistical decision tree is invent as the individual for GP, the novel structure helps us de-scribe the exact ditribution of train instances in different sub-spaces splitted by hyperplanes of decision tree. Besides, information gain-based fitness function is designed to evalute each individual in GP, the new fitness function is consistent with the target indicator AUC but has lower computation complexity.2. Designing special local search operator to GP for classification problem. Local search operator plays important role in EA for searching good solutions, however, it depends on the specific problem. In this section, the distribution information of train instances in each sub-spaces is going to help us to design local search operator. The information gain of a sub-space would be improved by splitting it into two or more subsub-space if it contains very confused sample distribution. This local search operator is called splitting operator which trys many times to get suitable splitting hyperplane to split the sub-space to get higher information gam. 3. Multi-objective genetic programming for uncertain classification problems. For uncertain classification problems such as the datasets contain skewed distribu-tions and different misclassfication costs, we still going to search a group of robust soultions. ROC convex hull(ROCCH) is a collection of these solutions which are the potential optimums for a uncertain environment. Moreover, ROC maxization is similar to multi-objective optimization problem. We are trying to maximize tpr and minimize fpr in ROC space, so evolutionary multi-objective algorithm (EMOA) frameworks are combined with genetic programming to get multi-objective genetic programming for the uncertain classficiation problems in this section.4. Though EMOAs work well in ROCCH maximization problem, ROCCH owns its special characters, which makes ROCC maixization problem beyond mutli-objective optimization problem. In this section, we propose two special ideas for multi-objective genetic programming to speed the converge progress and get more efficient classifiers:the first one is convex hull-based without redundancy sorting, the other is area-based selection scheme. Both of them are coming from the concept of ROCCH, which are suitable to contruct the new algorithm-covex hull based multi-objective genetic programming. The new algorithm show its advantages in ROCCH maximization problem.We take down-top research framework in this dissertation. The concept of GP is involved into solving classification, and how to encode an individual and evaluate it is the main research in the first topic. After that, we research on the search operators which are used to make the population converge fast and get good performance. Moreover, we not only consider a single classifier but also a group of classifiers for the robust clas-sification problem because of skewed class distribution and different misclassification costs. Multi-objective optimization techniques are simply involved into search a group of non-dominated solutions to maximize ROCCH, further, special ROCCH character is researched and adopted into design effective multi-objective GP for ROCCH max-imization problem. Real data sets are involved into experiments to evaluate our new ideas and algorithms for all above research topic.
Keywords/Search Tags:Machine Learning, Classification Tasks, Evolutionary Computation, Ge-netic Programming, Fitness Function, Global and Local Search Operators, Multi-objectiveOptimization, Convex Hull Optimization and Theory
PDF Full Text Request
Related items