
The Research Of Partition Clustering Based On Comprehensive Measurement

Posted on: 2012-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Zhang
Full Text: PDF
GTID: 2218330338470611
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet and database systems, massive amounts of data are collected and stored in databases. Without powerful tools for uncovering the knowledge hidden in that data, however, the result is abundant data but poor information. Data mining technology was proposed to address this problem.

In data mining, cluster analysis, which originated as a branch of statistics, has been studied widely for many years. Most of this work concerns distance-based clustering and aims to find suitable methods for effective cluster analysis of large databases. Active research topics include the scalability of clustering methods, the effectiveness of clustering data of complex shapes and types, high-dimensional clustering techniques, and clustering methods for mixed numerical and categorical data in large databases.

This thesis introduces data mining technology in detail, including its definition, research content, tasks, and functions. On this basis, it analyzes cluster analysis in detail, covering the data structures and data types used in cluster analysis, the main clustering algorithms, and the commonly used partition-based clustering algorithms.

The core of the research concerns two clustering algorithms for categorical attribute data, K-Modes and K-Prototypes. For K-Modes, the thesis examines the distance-based dissimilarity measure between two objects: by adding a weight coefficient that represents the potential correlation between the two objects, it redefines the dissimilarity measure. For K-Prototypes, the thesis examines the problem of selecting the initial cluster values: by applying a frequency-based decomposition method and adding two control variables, it improves the original algorithm.

Experiments show that, compared with the original algorithms, the improved algorithms achieve a certain degree of improvement in cluster quality.
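
For orientation, the dissimilarity measures that the two algorithms start from can be sketched as follows. This is a minimal illustration, not the thesis's method. It assumes the standard simple-matching measure for K-Modes and the standard K-Prototypes measure (squared Euclidean distance on numeric attributes plus a weighted matching term on categorical attributes). The abstract does not give the thesis's redefined formula, so the per-attribute weights in `weighted_matching_dissimilarity` are a hypothetical stand-in, and all function and parameter names here are illustrative.

```python
from typing import Hashable, Sequence


def matching_dissimilarity(x: Sequence[Hashable], y: Sequence[Hashable]) -> int:
    """Simple matching measure used by standard K-Modes:
    the number of attributes on which two categorical objects differ."""
    return sum(1 for a, b in zip(x, y) if a != b)


def weighted_matching_dissimilarity(
    x: Sequence[Hashable], y: Sequence[Hashable], weights: Sequence[float]
) -> float:
    """Hypothetical weighted variant: each mismatching attribute contributes
    its own weight instead of 1. The weights are an assumption for
    illustration; the thesis's actual weight coefficient is not specified."""
    return sum(w for a, b, w in zip(x, y, weights) if a != b)


def mixed_dissimilarity(
    x_num: Sequence[float],
    y_num: Sequence[float],
    x_cat: Sequence[Hashable],
    y_cat: Sequence[Hashable],
    gamma: float,
) -> float:
    """Standard K-Prototypes measure: squared Euclidean distance on the
    numeric attributes plus gamma times the simple matching measure on
    the categorical attributes."""
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, y_num))
    return numeric_part + gamma * matching_dissimilarity(x_cat, y_cat)


# Two objects that differ only on the second categorical attribute.
p = ("red", "small", "round")
q = ("red", "large", "round")

print(matching_dissimilarity(p, q))                            # 1
print(weighted_matching_dissimilarity(p, q, [0.5, 2.0, 1.0]))  # 2.0
print(mixed_dissimilarity((1.0, 2.0), (2.0, 4.0), p, q, 0.5))  # 5.5
```

In the unweighted measure every mismatch counts equally. A weighted form lets some attributes matter more than others, which is the general kind of change the abstract describes for K-Modes.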
Keywords/Search Tags: data mining, clustering, categorical data, K-Modes, K-Prototypes