
The Study Of Clustering Data With Categorical Attributes In Data Mining

Posted on: 2004-07-13 | Degree: Master | Type: Thesis
Country: China | Candidate: H Zhao
GTID: 2168360122960341 | Subject: Circuits and Systems
Abstract/Summary:
With the development of databases and the Internet, the volume of data collected and stored has grown explosively in recent decades. How to analyze and explore these data and discover useful knowledge from them rapidly and efficiently has become a focus of researchers. To meet this challenge, data mining technology has been studied and applied in many industries, and clustering analysis, which originates in statistics, has become an active area of data mining research. This thesis first introduces data mining in detail, including its aims and functionalities, its process, common tools and systems, and its main applications and trends. It then discusses clustering analysis in data mining, covering the clustering methods used in data mining, their characteristics, and methods for evaluating clustering results, with emphasis on clustering data with categorical attributes. The k-modes clustering algorithm and its variants are introduced together with their advantages and disadvantages. The clustering results of the fuzzy k-modes algorithm are compared with the class structure of the original data, and the existing definition of clustering accuracy and its computation method are amended: on the basis of partition similarity, a new definition of accuracy for the fuzzy k-modes algorithm is presented. Finally, because each attribute of a data set contributes differently to the clustering, an attribute-weighted fuzzy k-modes clustering algorithm is presented. An evolution strategy with a new fitness function optimizes the weight matrix, and the partition-similarity-based clustering accuracy is used to evaluate the clustering results. Experiments using the soybean disease data set as input samples give improved results.
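The abstract names the fuzzy k-modes algorithm and an attribute-weighted variant without giving their formulas. As an illustration only, the sketch below shows the standard fuzzy k-modes iteration: simple-matching dissimilarity, fuzzy memberships with an exponent `alpha > 1`, and a mode update that picks, per attribute, the category with the largest summed membership weight. It is not necessarily the exact formulation used in the thesis. Assumptions: a single per-attribute `weights` vector stands in for the thesis's weight matrix; the function names, the toy data, and the convergence test are hypothetical; and `clustering_accuracy` implements the conventional majority-class accuracy, not the partition-similarity accuracy the thesis proposes.

```python
import random
from collections import defaultdict


def dissimilarity(x, z, weights=None):
    """Simple-matching dissimilarity: sum the weights of attributes where x and z differ.

    weights is a hypothetical per-attribute weight vector; None means every weight is 1.
    """
    if weights is None:
        weights = [1.0] * len(x)
    return sum(w for a, b, w in zip(x, z, weights) if a != b)


def update_memberships(data, modes, alpha, weights=None):
    """Fuzzy membership of every record in every cluster (each row sums to 1)."""
    k = len(modes)
    memberships = []
    for x in data:
        d = [dissimilarity(x, z, weights) for z in modes]
        row = [0.0] * k
        zero = [l for l in range(k) if d[l] == 0]
        if zero:
            # The record coincides with a mode: assign it crisply to that cluster.
            row[zero[0]] = 1.0
        else:
            for l in range(k):
                row[l] = 1.0 / sum((d[l] / d[h]) ** (1.0 / (alpha - 1.0)) for h in range(k))
        memberships.append(row)
    return memberships


def update_modes(data, memberships, alpha, k):
    """For each cluster and attribute, pick the category with the largest summed w**alpha."""
    modes = []
    for l in range(k):
        mode = []
        for j in range(len(data[0])):
            score = defaultdict(float)
            for x, row in zip(data, memberships):
                score[x[j]] += row[l] ** alpha
            mode.append(max(score, key=score.get))
        modes.append(tuple(mode))
    return modes


def fuzzy_k_modes(data, k, alpha=1.1, weights=None, init_modes=None, max_iter=100, seed=0):
    """Alternate membership and mode updates until the objective stops changing."""
    if alpha <= 1.0:
        raise ValueError("alpha must be greater than 1")
    if init_modes is None:
        modes = [tuple(z) for z in random.Random(seed).sample(data, k)]
    else:
        modes = [tuple(z) for z in init_modes]
    prev_cost = None
    for _ in range(max_iter):
        memberships = update_memberships(data, modes, alpha, weights)
        modes = update_modes(data, memberships, alpha, k)
        # Objective: sum over records and clusters of w**alpha * dissimilarity.
        cost = sum(row[l] ** alpha * dissimilarity(x, modes[l], weights)
                   for x, row in zip(data, memberships) for l in range(k))
        if prev_cost is not None and abs(prev_cost - cost) < 1e-9:
            break
        prev_cost = cost
    return modes, update_memberships(data, modes, alpha, weights)


def clustering_accuracy(memberships, labels):
    """Conventional accuracy: harden each record to its highest-membership cluster,
    then count the records that belong to the majority class of their cluster."""
    counts = defaultdict(lambda: defaultdict(int))
    for row, y in zip(memberships, labels):
        counts[max(range(len(row)), key=row.__getitem__)][y] += 1
    return sum(max(c.values()) for c in counts.values()) / len(labels)


# Hypothetical toy data: two groups of categorical records with known classes.
data = [("a", "x", "p"), ("a", "x", "p"), ("a", "y", "p"),
        ("b", "z", "q"), ("b", "z", "q"), ("b", "w", "q")]
labels = [0, 0, 0, 1, 1, 1]
modes, W = fuzzy_k_modes(data, k=2, init_modes=[data[0], data[3]])
print(modes)                           # [('a', 'x', 'p'), ('b', 'z', 'q')]
print(clustering_accuracy(W, labels))  # 1.0
```

Passing a non-uniform `weights` vector makes mismatches on some attributes cost more than others. One could tune those values with an evolution strategy whose fitness is a clustering-accuracy measure, in the spirit of the abstract, but that tuning loop is not shown here.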
Keywords/Search Tags: Data Mining, Clustering Analysis, Categorical Attribute, Evolutionary Strategy