Classification And Clustering Gene Expression Programming

Posted on:2015-01-24

Degree:Master

Type:Thesis

Country:China

Candidate:Q Zhou

Full Text:PDF

GTID:2268330425496033

Subject:Computer software and theory

Abstract/Summary:

As data mining methods, classification and clustering have long been a research hotspot because they serve as application models for many real-life problems. Recently, research on classification and clustering has gradually shifted from longitudinal study to cross-sectional study; that is, the two techniques are being fused with other algorithms so that they can draw on those algorithms' advantages and features. In the era of big data, classification and clustering problems will attract more and more attention. As the saying goes, birds of a feather flock together.

Gene Expression Programming (GEP) was proposed by Ferreira. It brings together the advantages of genetic programming and genetic algorithms, which gives GEP extensive search capability and infinite variability. Applications of GEP span biology, mathematics, computer applications, physics, and other fields.

Classification and clustering based on GEP combine GEP with rule-based classification from data mining, or GEP with K-means. They use GEP's evolutionary character and global search ability to study rule-based classification and K-means; the combined methods compensate for each other's weaknesses and bring out new ideas. The main work of this paper includes the following aspects:

1. GEP is used to mine rules, and the mined rules are then used for classification. First, for the rule classification problem, a new pattern of chromosome terminal symbols is designed, and the ratio of correctly applied rules is taken as the fitness measure. Then the rules are sorted by fitness in descending order and an alternative set of rules is established. GEP is used to mine classification rules from the Monk and Acute Inflammations data sets, and these rules are then used to classify the data sets. The experiments on these two data sets show that the new method achieves higher accuracy than traditional methods.

2. The auto-clustering algorithm based on gene expression programming (GEP-Cluster) is a new algorithm proposed in recent years. GEP-Cluster combines GEP with clustering and is widely applied. The K-means clustering algorithm based on gene expression programming (GEP-KC) is an improved version of GEP-Cluster. First, GEP-KC builds on GEP-Cluster and improves chromosome encoding and decoding to avoid invalid chromosomes. Second, an algorithm for selecting the optimal number of clusters is added to GEP-KC, together with the iterative relocation step borrowed from K-means, so that the number of clusters is determined more accurately, intuitively, and effectively, which improves the clustering result. Finally, experiments on clustering 150 two-dimensional points show that GEP-KC has higher accuracy and faster convergence.
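
The iterative relocation step that the abstract attributes to K-means can be illustrated with a minimal sketch. This is the textbook K-means loop only, not the thesis's GEP-KC implementation: the function name `relocate`, the `max_iter` parameter, and the six sample points are assumptions made for illustration, and the initial centroids are passed in by hand where GEP-KC would presumably obtain them by decoding a chromosome.

```python
import math


def relocate(points, centroids, max_iter=100):
    """Textbook K-means iterative relocation (hypothetical helper, not GEP-KC itself)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its current members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return labels, centroids


# Made-up sample data: two well-separated groups of two-dimensional points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
labels, centroids = relocate(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

In this sketch the loop stops as soon as no centroid moves, which is one common convergence test; a tolerance on centroid movement would be an equally reasonable choice.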

Keywords/Search Tags:

Rule-based classification, GEP, Mining rules, Alternative sets of rules, K-means, the optimal number of clusters, iterative relocation

Related items

1	The Study For Mining Classification Rules Based On Genetic Algorithms
2	Research Of Medical Image Classification Approach Based On Rough Sets And Association Rule
3	The Study On Approaches Of Mining Classification Rules Based On Rough Sets Theory And Intelligent Computing
4	Association Rule Mining Algorithm
5	Classification Association Rule Induction Algorithm And Applied Research
6	Research And Application Of Time Series Association Rules Based On Fuzzy Set
7	Research And Application On Determining Optimal Number Of Clusters In Cluster Analysis
8	Research On Determining Optimal Number Of Clusters In Cluster Analysis
9	The Study Of Business Rules Mining Modeling And Application
10	Research On Generation Of Extended Fractal Patterns Based On Rules