Research On The Validity Of Clustering Analysis Methods Based On High-dimensional Data

Posted on:2017-01-07

Degree:Master

Type:Thesis

Country:China

Candidate:W G Lv

GTID:2348330518972321

Subject:Applied Mathematics

Abstract/Summary:

Cluster learning is a branch of machine learning with extensive applications in domains such as financial fraud detection, medical diagnosis, image processing, information retrieval and bioinformatics. Many types of clustering models and algorithms have been developed in the literature. With the rapid development of information and sampling technologies, data structures have become complex: attribute types are diverse, and data are often high-dimensional, large-scale, imbalanced in distribution and dynamic. Since cluster analysis is data-driven, different data characteristics often call for different clustering models and algorithms. Now that complex data make up the bulk of data sources in modern society, finding hidden cluster structures in complex data has become an important research topic in cluster learning and has attracted wide attention.

In this thesis, we study how to build cluster models and design efficient algorithms for complex data. The main contributions are summarized as follows:

1. Based on the characteristics of categorical data, a new attribute-weighted clustering algorithm is proposed, and a new optimization problem is formulated for the clustering process.
2. We propose a new attribute-weighting clustering algorithm for high-dimensional categorical data. The update formulas for the partition matrix, cluster centers and attribute weights in the iterative process are rigorously derived, which guarantees that the proposed algorithm converges to a local solution in a finite number of iterations. The convergence of the algorithm is also rigorously proved.
3. Simulations on high-dimensional categorical data sets from the UCI repository verify the effectiveness of the algorithm and evaluate its time complexity.
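Contribution 2 refers to three blocks of unknowns: a partition matrix, cluster centers and attribute weights. The thesis's own objective function is not given in this abstract, so the following is only an assumed reference point: a common form of attribute-weighted k-modes objective with a 0/1 partition matrix W, cluster modes Z and one global weight per attribute Λ. The notation is illustrative.

```latex
\min_{W,Z,\Lambda}\; P(W,Z,\Lambda) = \sum_{l=1}^{k}\sum_{i=1}^{n}\sum_{j=1}^{m} w_{li}\,\lambda_j^{\beta}\,\delta(x_{ij}, z_{lj}),
\qquad \delta(a,b) = \begin{cases} 0, & a = b,\\ 1, & a \neq b, \end{cases}

\text{s.t.}\quad w_{li}\in\{0,1\},\quad \sum_{l=1}^{k} w_{li} = 1,\quad \sum_{j=1}^{m}\lambda_j = 1,\quad 0\le\lambda_j\le 1.

D_j = \sum_{l=1}^{k}\sum_{i=1}^{n} w_{li}\,\delta(x_{ij}, z_{lj}), \qquad
\lambda_j = \frac{1}{\sum_{t=1}^{m} \left( D_j / D_t \right)^{1/(\beta-1)}} \quad (\beta > 1).
```

Here x_{ij} is attribute j of object i, z_{lj} is attribute j of mode l, and D_j is the within-cluster dispersion of attribute j. With W and Z fixed, the objective reduces to sum_j λ_j^β D_j, and minimizing it under sum_j λ_j = 1 gives the weight rule on the last line. That rule is undefined whenever some D_t = 0. This happens easily with categorical data, because every member of a cluster can share the same category on an attribute.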
The experiments show that the algorithm not only inherits the simplicity of the algorithm proposed by Chan et al., but also solves the problem of attribute weighting failing on categorical data. These contributions further enrich the cluster analysis of complex data and provide technical support for studies of bioinformatics data, Web data and customer transaction data.
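The alternating scheme named in the contributions updates the partition, then the cluster centers, then the attribute weights, and stops when the partition no longer changes. It can be sketched as a small attribute-weighted k-modes routine. This is an illustrative sketch under stated assumptions, not the thesis's algorithm. It assumes one global weight per attribute computed from inverse within-cluster dispersion, in the spirit of Chan et al.-style weighting. The smoothing constant `eps` is a hypothetical workaround added to every dispersion so that pure attributes do not cause a division by zero. The names `weighted_kmodes`, `beta` and `eps` are illustrative.

```python
import random
from collections import Counter


def weighted_kmodes(X, k, beta=2.0, eps=1e-3, max_iter=100, seed=0):
    """Illustrative attribute-weighted k-modes (not the thesis's exact method).

    X     : list of n rows, each a list of m categorical values
    k     : number of clusters
    beta  : weight exponent, must be > 1
    eps   : hypothetical smoothing term added to every dispersion D_j
    Returns (labels, modes, weights).
    """
    n, m = len(X), len(X[0])
    rng = random.Random(seed)
    modes = [list(X[i]) for i in rng.sample(range(n), k)]  # k distinct rows as initial modes
    weights = [1.0 / m] * m                                  # start with uniform attribute weights
    labels = None
    p = 1.0 / (beta - 1.0)

    for _ in range(max_iter):
        # Step 1 (partition update): assign each object to the mode with the smallest
        # weighted simple-matching dissimilarity sum_j lambda_j^beta * [x_j != z_j].
        new_labels = []
        for x in X:
            dists = [sum(weights[j] ** beta for j in range(m) if x[j] != z[j]) for z in modes]
            new_labels.append(min(range(k), key=dists.__getitem__))
        if new_labels == labels:
            break  # partition unchanged: the three updates have reached a fixed point
        labels = new_labels

        # Step 2 (center update): each mode takes the most frequent category per attribute.
        for l in range(k):
            members = [x for x, c in zip(X, labels) if c == l]
            if members:  # an empty cluster keeps its previous mode
                modes[l] = [Counter(col).most_common(1)[0][0] for col in zip(*members)]

        # Step 3 (weight update): D_j counts mismatches between objects and their cluster
        # mode on attribute j (plus eps); lambda_j = 1 / sum_t (D_j / D_t)^(1/(beta-1)).
        D = [eps + sum(1 for x, c in zip(X, labels) if x[j] != modes[c][j]) for j in range(m)]
        weights = [1.0 / sum((D[j] / D[t]) ** p for t in range(m)) for j in range(m)]

    return labels, modes, weights
```

Consider rows whose first attribute is constant. Calling `weighted_kmodes(rows, k=2)` on them gives that attribute a weight at least as large as any other, because its dispersion is only `eps`. Without `eps`, the same case would divide by zero. Adding a constant is just one simple workaround. The thesis's own remedy for the weighting failure is not specified in this abstract.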

Keywords/Search Tags:

Machine learning, Cluster analysis, Attribute weighting, Partition matrix, High-dimensional categorical data

Related items

1	Research On Categorical Data Clustering Algorithms
2	Design And Implementation Of Initial Cluster Center Selection Algorithm For Categorical Matrix-object Data
3	Research On Cluster Validity Indices For Categorical Data Clustering
4	Research Of Clustering Algorithms For Categorical Data
5	A Study On Clustering Algorithms For Categorical Data With Applications
6	The Study Of Clustering Data With Categorical Attributes In Data Mining
7	Studies On Clustering Algorithms For Categorical Data
8	Clustering Method And Application Of Stellar Spectrum Based On Attribute Weighting
9	The Research On Clustering Algorithm For Categorical Data Using Quantum Mechanics
10	The Research Of Naïve Bayes Classification Algorithm Based On Attribute Reduction And Attribute Weighting