Research On The Validity Of Clustering Analysis Methods Based On High-dimensional Data

Posted on: 2017-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: W G Lv
GTID: 2348330518972321
Subject: Applied Mathematics
Abstract/Summary:
Cluster learning is a branch of machine learning with extensive applications in many domains, including financial fraud detection, medical diagnosis, image processing, information retrieval, and bioinformatics. Many types of clustering models and algorithms have been developed in the literature. With the rapid development of information and sampling technologies, data structures have become complex: attribute types are diverse, dimensionality is high, data sets are large, distributions are imbalanced, and the data change dynamically. Since cluster analysis is data-driven, different data characteristics often lead to different clustering models and algorithms. Now that complex data has become the main source of data in modern society, how to find hidden cluster structures in complex data has become an important research topic in cluster learning and has attracted wide attention.

In this thesis, we study how to build cluster models and design efficient algorithms for complex data. The main contributions are summarized as follows:

1. Based on the characteristics of categorical data, a new attribute-weighted clustering model is formulated, and the resulting optimization problem is applied in the clustering process.

2. We propose a new attribute-weighting clustering algorithm for high-dimensional categorical data. The update formulas for the partition matrix, the cluster centers, and the attribute weights in the iterative process are rigorously derived, which guarantees that the proposed algorithm converges to a local solution in a finite number of iterations. The convergence of the algorithm is also rigorously proved.

3. Experiments on high-dimensional categorical data sets from UCI verify the effectiveness and the time complexity of the algorithm. The experiments show that the algorithm not only inherits the simplicity of the algorithm proposed by Chan et al., but also solves the failure problem of attribute weighting on categorical data.

These contributions further enrich the cluster analysis of complex data and provide technical support for studies of biological information data, Web data, and customer transaction data.
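The abstract does not state the update formulas themselves, so the following is only a minimal sketch of a generic attribute-weighted k-modes loop, showing how a partition step, a center step, and a weight step can alternate. It is not the thesis's algorithm. The function name `weighted_kmodes` and the parameters `beta`, `eps`, and `init_modes` are hypothetical. The weight update uses the inverse-dispersion form common in the weighted-dissimilarity literature, and the small constant `eps` added to each dispersion is an assumed way to avoid division by zero, not necessarily the remedy the thesis proposes.

```python
import random
from collections import Counter


def weighted_kmodes(data, k, beta=2.0, eps=1e-6, max_iter=50,
                    init_modes=None, seed=0):
    """Illustrative attribute-weighted k-modes for categorical rows.

    data: list of equal-length lists of hashable category values.
    Returns (labels, modes, weights); weights[l][j] is the weight of
    attribute j inside cluster l. All names here are hypothetical.
    """
    if beta <= 1.0:
        raise ValueError("beta must be greater than 1")
    n, m = len(data), len(data[0])
    if init_modes is None:
        # Assumption: start from k randomly chosen rows.
        init_modes = random.Random(seed).sample(data, k)
    modes = [list(mode) for mode in init_modes]
    weights = [[1.0 / m] * m for _ in range(k)]
    labels = [-1] * n
    exponent = 1.0 / (beta - 1.0)

    for _ in range(max_iter):
        # Partition step: assign each row to the cluster with the
        # smallest weighted simple-matching dissimilarity.
        new_labels = []
        for row in data:
            costs = [
                sum((weights[l][j] ** beta) * (row[j] != modes[l][j])
                    for j in range(m))
                for l in range(k)
            ]
            new_labels.append(costs.index(min(costs)))
        if new_labels == labels:
            break  # assignments are stable, so the loop stops
        labels = new_labels

        for l in range(k):
            members = [row for row, lab in zip(data, labels) if lab == l]
            if not members:
                continue  # keep the previous mode and weights
            # Center step: the most frequent category per attribute.
            for j in range(m):
                counts = Counter(row[j] for row in members)
                modes[l][j] = counts.most_common(1)[0][0]
            # Weight step: attributes with fewer mismatches against
            # the mode receive larger weights. eps keeps every
            # dispersion strictly positive (an assumed smoothing).
            disp = [sum(row[j] != modes[l][j] for row in members) + eps
                    for j in range(m)]
            for j in range(m):
                weights[l][j] = 1.0 / sum(
                    (disp[j] / d) ** exponent for d in disp)
    return labels, modes, weights
```

In this sketch the weights within each cluster sum to 1 by construction, and an attribute whose values are constant inside a cluster receives a large but finite weight because of `eps`. Passing `init_modes` makes a run deterministic; without it, the result depends on the random starting rows and may be a poor local solution.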
Keywords/Search Tags: Machine learning, Cluster analysis, Attribute weighting, Partition matrix, High-dimensional categorical data