The Research Of Partition Clustering Based On Comprehensive Measurement

Posted on:2012-05-07

Degree:Master

Type:Thesis

Country:China

Candidate:Y L Zhang

Full Text:PDF

GTID:2218330338470611

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of the Internet and database systems, massive amounts of data are collected and stored in databases. Without powerful tools for uncovering the knowledge hidden in this data, the result is an abundance of data but a shortage of information. Data mining technology was proposed to address this problem.

As a branch of statistics, clustering analysis has been studied extensively for many years, with most work concentrating on distance-based clustering. Within data mining, research has focused on finding efficient and effective clustering methods for large databases. Active research topics include the scalability of clustering methods, the effectiveness of clustering data of complex shapes and types, high-dimensional clustering techniques, and methods for clustering mixed numerical and categorical data in large databases.

This thesis first introduces data mining in detail, including its definition, research content, tasks, and functions. On this basis, it examines clustering analysis within data mining, covering the data structures and data types used in clustering, the main clustering algorithms, and the commonly used partition-based clustering algorithms.

The core of the work concerns two clustering algorithms for categorical attribute data, K-Modes and K-Prototypes. For K-Modes, the thesis examines the distance-based dissimilarity measure between two objects and adds a weight coefficient that represents the potential correlation between the two objects; the dissimilarity measure is redefined on this basis. For K-Prototypes, the thesis addresses the problem of selecting initial values for clustering: using a frequency-based decomposition method and adding two control variables, it improves the original algorithm.

Experiments show that the improved algorithms achieve a certain degree of improvement in cluster quality over the original algorithms.
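To make the two dissimilarity measures concrete, the following is a minimal sketch of the standard formulations that K-Modes and K-Prototypes are usually built on: simple matching for categorical attributes, and squared Euclidean distance plus a weighted matching term for mixed data. The abstract does not give the thesis's redefined formula, so the per-attribute `weights` argument is only a hypothetical placeholder showing where a weight coefficient could enter; the function names and the `gamma` parameter name are likewise assumptions made for illustration.

```python
from typing import Optional, Sequence


def matching_dissimilarity(
    x: Sequence[str],
    y: Sequence[str],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Simple matching dissimilarity between two categorical objects.

    Each attribute where x and y differ contributes its weight.
    With weights=None every attribute has weight 1, which gives the
    unweighted measure used by the original K-Modes algorithm.
    The weights are a hypothetical stand-in, not the thesis's formula.
    """
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    if weights is None:
        weights = [1.0] * len(x)
    if len(weights) != len(x):
        raise ValueError("one weight is required per attribute")
    return sum(w for a, b, w in zip(x, y, weights) if a != b)


def mixed_dissimilarity(
    x_num: Sequence[float],
    y_num: Sequence[float],
    x_cat: Sequence[str],
    y_cat: Sequence[str],
    gamma: float = 1.0,
) -> float:
    """K-Prototypes style measure for mixed data.

    Squared Euclidean distance on the numeric attributes plus gamma
    times the simple matching dissimilarity on the categorical ones.
    """
    if len(x_num) != len(y_num):
        raise ValueError("numeric parts must have the same length")
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, y_num))
    return numeric + gamma * matching_dissimilarity(x_cat, y_cat)


if __name__ == "__main__":
    a = ["red", "small", "round"]
    b = ["red", "large", "square"]
    print(matching_dissimilarity(a, b))                   # unweighted
    print(matching_dissimilarity(a, b, [0.5, 2.0, 1.0]))  # hypothetical weights
    print(mixed_dissimilarity([1.0, 2.0], [2.0, 4.0], a, b, gamma=0.5))
```

In the unweighted case every mismatching attribute counts equally, which is why a weight coefficient is a natural place to encode how strongly two objects are related. The `gamma` parameter balances the numeric and categorical parts so that neither dominates the combined measure.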

Keywords/Search Tags:

data mining, clustering, categorical data, K-Modes, K-Prototypes

Related items

1	A Study Of The Clustering Algorithm For Mixed Data
2	Determination Of Optimal Clustering Number Of Mixed Data And Its Application
3	Research On Biomedical Data Clustering
4	Research On Partitional Clustering Algorithms For Mixed Data
5	Research And Application Of Clustering Analysis Algorithm Based On Mixed Attribute Data
6	Research On Ensemble Clustering Algorithms For Complex Data
7	Research On Partitioning Clustering Algorithms For Data With Mixed Numerical And Categorical Attributes
8	Recommendation System Of Commercial Site Research Based On Web Data Mining
9	Research And Design Of Parallel K-prototypes Clustering Algorithm Based On Hadoop
10	Research On Data Mining Algorithms Of Information Security Classified Protection Evaluation