A Research Of Genetic K-Means Algorithm Based On Variable Length Encoding

Posted on: 2008-04-09 | Degree: Master | Type: Thesis
Country: China | Candidate: G P Fan | Full Text: PDF
GTID: 2178360212485014 | Subject: Software engineering
Abstract/Summary:
Complex relationships among data conceal patterns that people can hardly discover unaided. One such pattern is the grouping of the objects that the data represent. In recent years, data mining has produced a wide range of clustering algorithms for analyzing data and uncovering these patterns. Each has advantages and disadvantages, and several have been put into practice. The K-Means algorithm, for example, is simple and effective, but it has several disadvantages, and researchers have proposed improved K-Means variants to avoid them.

The genetic algorithm (GA) has proved effective on problems such as combinatorial optimization and finding the extrema of a function, and its mechanism is also well suited to improving K-Means clustering. Many GA methods have been used either to improve the selection of initial cluster centers or to learn the best value of K, but none of them combines the two well. Motivated by this, the thesis proposes a new GA method based on variable-length encoding, which achieved good results.

The genetic K-Means algorithm (GKA) based on variable-length encoding not only optimizes the initial centers and dynamically learns the value of K, but also recognizes some isolated points, which reduces their impact on K-Means. All of this depends on a good mechanism for learning K and a well-designed fitness function. To learn K, the individual with the best fitness is taken as the reference, and every newly generated individual should have a length close to it. As a result, the user no longer needs to specify K.

To aid understanding, the thesis also explains the basic theory of data mining (DM) and GA. With practical use in mind, a prototype system was built and implemented in Java.
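The idea of a variable-length chromosome whose length tracks the best individual can be illustrated with a minimal sketch. This is not the thesis's implementation (the prototype there was written in Java, and its operators and fitness function are not given in the abstract). Everything below is an assumption made for illustration: a chromosome is a list of indices into the data set, each index naming one initial center, so its length is K; the fitness is a hypothetical "negative within-cluster squared error minus a penalty per center"; the names `fitness`, `offspring`, `evolve`, `penalty`, `k_min` and `k_max` are invented here; and isolated-point detection is omitted.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def fitness(points, individual, penalty=1.0):
    """Hypothetical fitness (an assumption, not the thesis's function):
    negative of (within-cluster squared error + penalty * K)."""
    centers = [points[i] for i in individual]
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return -(sse + penalty * len(centers))

def random_individual(n_points, k_min, k_max, rng):
    """Variable-length chromosome: K distinct point indices, K drawn at random."""
    k = rng.randint(k_min, k_max)
    return rng.sample(range(n_points), k)

def offspring(parent_a, parent_b, best_len, k_min, k_max, rng):
    """Pool the parents' genes and draw a child whose length stays
    within one gene of the best individual's length."""
    pool = list(dict.fromkeys(parent_a + parent_b))  # de-duplicate, keep order
    target = best_len + rng.choice([-1, 0, 1])
    target = max(k_min, min(k_max, len(pool), target))
    return rng.sample(pool, target)

def evolve(points, k_min=2, k_max=6, pop_size=20, generations=30,
           penalty=1.0, seed=0):
    """Return the best chromosome found; its length is the learned K.
    Assumes pop_size >= 2 and k_min <= k_max <= len(points)."""
    rng = random.Random(seed)
    pop = [random_individual(len(points), k_min, k_max, rng)
           for _ in range(pop_size)]

    def score(ind):
        return fitness(points, ind, penalty)

    best = max(pop, key=score)
    for _ in range(generations):
        children = []
        for _ in range(pop_size - 1):
            a, b = rng.sample(pop, 2)
            children.append(offspring(a, b, len(best), k_min, k_max, rng))
        pop = [best] + children  # elitism: the best individual always survives
        best = max(pop, key=score)
    return best

if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    best = evolve(data, k_min=1, k_max=4, pop_size=10, generations=10, seed=1)
    print("learned K:", len(best))
    print("initial centers:", [data[i] for i in best])
```

The centers selected this way would then serve as the starting centers for an ordinary K-Means run. The per-center penalty is only one possible way to keep K from growing without bound; a real fitness function would need to be tuned to the data.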
Keywords/Search Tags:Data Mining, KDD, GA, variable-length encoding, clustering, K-Means, Business Intelligence