
Research On Mixed Attribute Clustering Technology Based On Cluster Center Selection Strategy

Posted on: 2020-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: R L Zhang
Full Text: PDF
GTID: 2428330575451696
Subject: Software engineering
Abstract/Summary:
The explosive growth of data creates opportunities for applying data mining technology. Cluster analysis is one of the most active research directions in data mining. It aims to analyze the distribution of data, explore its characteristics, and discover its latent internal structure. It plays an important role in data exploration and machine learning and is widely used in recommendation systems, customer segmentation, business intelligence, bioinformatics, and other fields. Building on existing clustering algorithms and cluster center selection techniques, this thesis studies three problems in clustering high-dimensional mixed attribute data: low clustering accuracy, parameters that are too numerous and too sensitive, and bias in cluster center selection. The contributions of this thesis are as follows:

(1) Based on the difference in spatial distribution between core objects and non-core objects, we propose a clustering algorithm based on a filter model (CA-FM). The algorithm uses the proposed filter model to remove non-core objects that interfere with the clustering process. An adjacency matrix is generated from the neighbor relationships among the core objects, and the number of clusters is obtained by traversing the matrix. The objects are then sorted by density factor in descending order, and clustering prototypes are selected from them. Finally, the remaining objects are assigned to the corresponding clusters according to the partition principle. Experiments on synthetic datasets, UCI datasets, and the Olivetti face dataset demonstrate the effectiveness of the algorithm. Compared with similar algorithms, CA-FM achieves higher clustering accuracy.

(2) Based on the spatial distribution characteristics of cluster centers, we propose a clustering algorithm based on the center selection model (CSC). Using the proposed parameter-free local kernel density and boundary degree, the cluster center selection model (CSM) is established to determine the cluster centers automatically. Following the partition principle of the density peak clustering algorithm, the remaining objects are assigned to the corresponding clusters to complete the clustering task. Experimental results on synthetic and real datasets verify the effectiveness of CSC. Compared with similar algorithms, CSC shows higher clustering accuracy and robustness.

(3) For the task of clustering mixed attribute data, we propose a clustering algorithm for mixed data based on residual analysis (RA-Clust). The algorithm uses entropy weights to measure the similarity between objects with mixed attributes. Based on KNN and Parzen window techniques, we propose a method for calculating the local density of objects, and candidate cluster centers are pre-selected by linear regression and residual analysis. The true cluster centers are then selected according to the cluster center objective optimization model proposed in this thesis. Finally, each remaining object is assigned to the cluster of its nearest higher-density object to complete the clustering task. Experimental results on numeric, categorical, and mixed attribute datasets verify the effectiveness of RA-Clust. Compared with similar algorithms, RA-Clust has fewer parameters and higher clustering accuracy.

This thesis traces the evolution of cluster analysis from low-dimensional numerical data to high-dimensional mixed attribute data, and it proposes new ideas, mechanisms, and models for selecting cluster centers. Experiments and analysis in medical diagnosis, financial credit, bioinformatics, face recognition, and other fields further advance the transition of mixed attribute data clustering algorithms from theoretical research to practical application.
Keywords/Search Tags: mixed attribute, clustering center, clustering algorithm, density, high-dimensional data