Research And Implement On Genetic Programming-based Classification And Clustering Algorithms

Posted on: 2011-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: L K Yu
GTID: 2178360302964336
Subject: Computer software and theory
Abstract/Summary:
As our ability to produce and collect data improves, the need for efficient methods of extracting knowledge from existing databases and wide area networks has become more urgent. However, our current capability to analyze data and derive knowledge from it has not kept pace with the existing technology for organizing, storing, and processing data. Data mining methods and techniques based on databases and data warehouses have been proposed to solve this problem. Data mining can be regarded as the result of the natural evolution of information technology from data collection and database creation, to data processing, and on to advanced data analysis.

Classification and clustering are two important data mining tasks, for which researchers have proposed many algorithms, such as Bayesian classification, rule-based classification, associative classification, K-means clustering, and hierarchical clustering. These traditional methods require a certain amount of domain knowledge from the user, and the input parameters the user supplies greatly affect the results. To solve these problems and to ensure that data mining tasks can be completed automatically, researchers have put forward data mining methods based on evolutionary algorithms.

The origins of genetic programming are traced to 1954, and in 1980 Stephen F. Smith reported experimental results based on it. Nichael L. Cramer and Jurgen Schmidhuber published their work on evolving programs in 1985 and 1987, respectively. In 1992, John R. Koza extended the approach significantly and argued that genetic programming should be regarded as a branch in its own right rather than a particular case of the genetic algorithm. John R. Koza is regarded as the pioneer of genetic programming.
In 2002, Bandyopadhyay and Maulik proposed an improved GA clustering algorithm based on cluster centers rather than on a chromosome encoding scheme.

This thesis introduces several related classification and clustering algorithms in data mining, analyzes them systematically, and identifies their advantages and disadvantages. It also discusses genetic programming at length and elaborates its underlying theory, methods, and techniques. On this basis, the thesis proposes a new classification and clustering algorithm based on genetic programming.

In the classification algorithm, a cluster is expressed as a logical formula composed of predicates, and every genetic programming individual encodes this formula as a tree. The clustering algorithm is based on hierarchical clustering: the data set is first divided into several clusters, and those clusters are then merged. Hierarchical clustering has one disadvantage that cannot be neglected, however: once a decision has been made at a given step, it cannot be changed afterwards.

Experiments on data are also reported, and the comparison with a traditional clustering algorithm shows considerably improved clustering results. Finally, the thesis points out some disadvantages of the algorithm and proposes directions for future improvement.
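
The abstract describes each genetic programming individual as a tree that encodes a logical formula built from predicates. The sketch below illustrates one way such an encoding might look; it is not the thesis's implementation. The class names `Predicate` and `Connective`, the helper `members`, the set of comparison operators, and the attribute names `petal_length` and `sepal_width` are all assumptions made for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Record = Dict[str, float]

# Comparison operators allowed in leaf predicates (an assumed, illustrative set).
COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


@dataclass(frozen=True)
class Predicate:
    """Leaf node: tests one attribute of a record against a threshold."""
    attribute: str
    op: str
    threshold: float

    def evaluate(self, record: Record) -> bool:
        return COMPARATORS[self.op](record[self.attribute], self.threshold)


@dataclass(frozen=True)
class Connective:
    """Internal node: combines child formulas with "and", "or", or "not"."""
    kind: str
    children: Tuple["Formula", ...]

    def evaluate(self, record: Record) -> bool:
        results = [child.evaluate(record) for child in self.children]
        if self.kind == "and":
            return all(results)
        if self.kind == "or":
            return any(results)
        if self.kind == "not":
            return not results[0]
        raise ValueError(f"unknown connective: {self.kind}")


Formula = Union[Predicate, Connective]


def members(formula: Formula, records: List[Record]) -> List[Record]:
    """Return the records that satisfy the formula, i.e. the group it describes."""
    return [record for record in records if formula.evaluate(record)]


# A hypothetical individual: petal_length < 2.5 AND NOT (sepal_width < 3.0).
individual = Connective("and", (
    Predicate("petal_length", "<", 2.5),
    Connective("not", (Predicate("sepal_width", "<", 3.0),)),
))
```

Because the formula is an ordinary tree, the usual genetic programming operators (subtree crossover and mutation) could be applied to it directly; how the thesis defines those operators and its fitness function is not stated in the abstract.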
Keywords: Data mining, Classification, Clustering, Evolutionary computation, Genetic programming