
Cost-Sensitive Classification Algorithm Based on ProGEP

Posted on: 2016-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z Zhou
GTID: 2298330467977005
Subject: Management Science and Engineering
Abstract/Summary:
In recent years, data mining technology has been widely used in marketing, business management, corporate crisis management, product manufacturing, the Internet and other fields. The massive amount of unused data stored in the world's computers is growing fast, and data types and structures are becoming more complex, which poses serious challenges for reducing mining costs and improving algorithm effectiveness. It is therefore significant to improve the mining process and algorithm efficiency in order to obtain satisfactory mining results. This paper studies and improves gene expression programming (GEP), a new data mining algorithm derived from the genetic algorithm. It proposes and designs the ProGEP algorithm, applies it to cost-sensitive classification (CSC) problems, and designs and implements the CSC-ProGEP algorithm. The main work covers four aspects:

1. The paper reviews the state of domestic and international research on the GEP algorithm and cost-sensitive learning, gives an overview of the structure and basic process of the GEP algorithm, and outlines several commonly used CSC algorithms.

2. The basic GEP algorithm is improved and the ProGEP algorithm is proposed.
First, the paper points out that the basic GEP algorithm evaluates chromosomes inefficiently because it traverses the expression tree repeatedly. Building on the currently popular improvement known as the "gene read & compute machine", the paper proposes the RPE_SD algorithm, which converts the gene expression to reverse Polish notation and scans a linear stack structure to read and compute it, improving the efficiency of chromosome evaluation. Second, the basic GEP algorithm does not specify how constant arguments are chosen and initializes the population completely at random. The paper argues that reasonable constant parameters are necessary and that evolution can be accelerated by inserting good individuals into the initial population. It proposes the RMLR_AC algorithm, which uses multiple regression to obtain the coefficients of all variables as constant arguments, introduces these arguments into the gene expression structure of the chromosome, and corrects the constant coefficients through evolution. Third, during basic GEP evolution, individuals with identical genotypes appear in the population. The paper defines the concepts of repeat and hidden repeat chromosomes and studies the causes of this phenomenon and its impact on genetic diversity, evolutionary efficiency, and the malignant assimilation of other individuals in the population. It proposes an algorithm for eliminating (hidden) repeat individuals (DSC) and a reselection step that works on a copy of the population (CPCSC) to improve the GEP selection process. Finally, observing the population characteristics again, the paper identifies and defines the phenomena of same-family chromosomes and race fault. To keep these phenomena from stalling gene exchange across the whole race and driving evolution to converge to a local optimum, the paper proposes an improved evolutionary process based on diversified population differentiation through a periodic thread mechanism (TM_PDI). It also gives an algorithm for initializing the child thread's population by sorting the main thread population, cloning after sectioning it, and supplementing it with randomized individuals (SHS_RRI). Combining the basic GEP algorithm with the four improvements, the paper proposes and describes the ProGEP algorithm.

3. The paper applies ProGEP to cost-sensitive classification. Building a cost-sensitive matrix into the ProGEP fitness function yields the CSC-ProGEP algorithm. Along with a description of the algorithm process, the paper presents a method for judging the effect of rare-class classification.

4. An experimental environment is constructed and experiments are carried out. Because changes are made to the basic GEP evaluation algorithm, selection process, evolutionary process and other aspects, and in order to describe algorithmic details easily and compute statistics flexibly, the basic GEP model structure and the ProGEP improvements are implemented in an object-oriented way in C# with Microsoft Visual Studio 2012. The experiments verify the performance of the ProGEP algorithm and the application effect of CSC-ProGEP. To observe the gain from each improvement independently, the four algorithms are added to GEP step by step, with repeated runs and comparison of the results.
After verifying the validity of ProGEP, five groups of UCI data sets are selected and CSC tests are run with 10-fold cross-validation, comparing the classifier with other (cost-sensitive) classification models: C4.5, BN, BP and AdaCost. The experiments on cost-sensitive classification problems show that the CSC-ProGEP algorithm achieves higher recall and precision on the rare class while maintaining accuracy.

The significance of this research is twofold. On the one hand, it improves and extends the theory of the GEP algorithm: the improvements to chromosome evaluation efficiency, population structure and the evolutionary process enrich theoretical research on GEP. On the other hand, it promotes the practical application of GEP: the CSC-ProGEP mining experiments indicate that the ProGEP algorithm has some significance for disease prediction, fraudulent-customer prevention and other rare-class mining applications.
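
Two ideas from the abstract can be illustrated together: evaluating an expression in reverse Polish notation with a single scan over a linear stack, and scoring a classifier with a cost matrix. The sketch below is a minimal illustration and is not the RPE_SD or CSC-ProGEP code, whose details the abstract does not give. It is written in Python for brevity, although the implementation described above is in C#. The operator set, the names `eval_rpn`, `predict` and `total_cost`, the zero threshold and the cost values are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical operator set; a real GEP function set may differ.
OPS: Dict[str, Callable[[float, float], float]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def eval_rpn(tokens: Sequence[str], sample: Dict[str, float]) -> float:
    """Evaluate a reverse Polish expression in one left-to-right scan."""
    stack: List[float] = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        elif tok in sample:
            stack.append(sample[tok])   # variable: look up its value
        else:
            stack.append(float(tok))    # otherwise treat as a constant
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


def predict(tokens: Sequence[str], sample: Dict[str, float],
            threshold: float = 0.0) -> int:
    """Assumed decision rule: class 1 if the expression exceeds the threshold."""
    return 1 if eval_rpn(tokens, sample) > threshold else 0


def total_cost(tokens: Sequence[str], samples: Sequence[Dict[str, float]],
               labels: Sequence[int], cost: Sequence[Sequence[float]]) -> float:
    """Sum cost[actual][predicted] over all samples; lower is better."""
    return sum(cost[y][predict(tokens, x)] for x, y in zip(samples, labels))


# x + y - 2 in reverse Polish notation
expr = ["x", "y", "+", "2", "-"]
samples = [{"x": 1.0, "y": 2.0}, {"x": 0.0, "y": 1.0},
           {"x": 3.0, "y": 0.0}, {"x": 1.0, "y": 1.0}]
labels = [1, 0, 0, 1]
# Assumed costs: missing the rare class (1) costs 5, a false alarm costs 1.
cost = [[0.0, 1.0],
        [5.0, 0.0]]
print(total_cost(expr, samples, labels, cost))
```

Here the total misclassification cost plays the role of a fitness value to be minimized. Weighting a missed rare-class sample more heavily than a false alarm is what makes the measure cost-sensitive. How the thesis actually combines the cost matrix with the fitness function is not stated in the abstract.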
Keywords/Search Tags: data mining, cost-sensitive, rare class, gene expression programming