
Classification And Clustering Gene Expression Programming

Posted on: 2015-01-24  Degree: Master  Type: Thesis
Country: China  Candidate: Q Zhou  Full Text: PDF
GTID: 2268330425496033  Subject: Computer software and theory
Abstract/Summary:
As data mining methods, classification and clustering have long been a research hotspot, because they can serve as application models for many real-life problems. Recently, research on classification and clustering has gradually shifted from longitudinal study to cross-sectional study, meaning that the two are fused with other algorithms so that the advantages and features of those algorithms can be exploited. In the era of big data, classification and clustering problems will attract more and more attention. As the saying goes, birds of a feather flock together.
Gene Expression Programming (GEP) was proposed by Ferreira. It brings together the advantages of genetic programming and genetic algorithms, which gives GEP extensive search capability and unlimited variability. Applications of GEP include biology, mathematics, computer applications, physics, and so on.
Classification and clustering based on GEP combines GEP either with rule-based classification from data mining or with K-means. It uses the evolutionary character and global search ability of GEP to study rule classification and K-means; the combined methods compensate for each other and bring out new ideas. The main work of this thesis includes the following aspects:
1. GEP is used to mine rules, and the mined rules are then used for classification. First, for the rule classification problem, a new pattern of chromosome terminal symbols is designed, and the ratio of correct rule applications is taken as the measure in the fitness function. The rules are then sorted by fitness in descending order and an alternative set of rules is established. GEP is used to mine classification rules from the Monk and Acute Inflammations data sets, and these rules are used to classify the data. Experiments on the two data sets show that the new method achieves higher accuracy than traditional methods.
2. The auto-clustering algorithm based on gene expression programming (GEP-Cluster) is a new algorithm proposed in recent years. GEP-Cluster combines GEP with clustering and is widely applied. The K-means clustering algorithm based on gene expression programming (GEP-KC) is an improved version of GEP-Cluster. First, GEP-KC builds on GEP-Cluster and improves chromosome encoding and decoding to avoid invalid chromosomes. Second, an algorithm for selecting the optimal number of clusters is added to GEP-KC, and the iterative relocation step of K-means is adopted, so that the number of clusters is determined more accurately, intuitively, and effectively, which improves the clustering result. Finally, experiments on clustering 150 two-dimensional points show that GEP-KC has higher accuracy and faster convergence.
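Item 2 says that GEP-KC adopts iterative relocation from K-means. As a rough illustration of that borrowed step only, the Python sketch below runs plain K-means relocation on two-dimensional points with a fixed number of clusters. It is not the thesis's GEP-KC implementation: the function names `assign` and `relocate`, the sample points, and the starting centroids are all assumptions made for this example, and the GEP encoding and the selection of the optimal number of clusters are not covered.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
        for p in points
    ]


def relocate(points, centroids, max_iter=100):
    """Plain K-means iterative relocation (hypothetical helper, not GEP-KC).

    Alternates between assigning points to the nearest centroid and moving
    each centroid to the mean of its members, until nothing moves.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # Move the centroid to the coordinate-wise mean of its members.
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return assign(points, centroids), centroids


# Made-up sample data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = relocate(points, [(0, 0), (10, 10)])
print(labels)
print(centers)
```

In this sketch the number of clusters is fixed by the length of the starting centroid list; per the abstract, GEP-KC instead adds a separate algorithm to choose that number.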
Keywords/Search Tags: Rule-based classification, GEP, Mining rules, Alternative sets of rules, K-means, Optimal number of clusters, Iterative relocation