
Research on a Hybrid Clustering Algorithm for Incomplete Data Based on Local Weighting

Posted on: 2018-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Sun
Full Text: PDF
GTID: 2348330512987354
Subject: Computer software and theory
Abstract/Summary:
Data acquisition failures, random noise, and other factors lead to incomplete data. Missing values are a common problem in cluster analysis and degrade clustering results. How to handle such contaminated data, so that the estimated missing values are closer to the real data and the fuzzy clustering results are more accurate, is a concern of scholars at home and abroad. Because the standard fuzzy C-means (FCM) clustering algorithm cannot handle incomplete data, this thesis proposes a locally weighted FCM clustering algorithm based on the nearest-neighbor principle. First, the nearest-neighbor samples of each incomplete sample are found with a similarity function, and the weights of those neighbors are then calculated with a Gaussian kernel function, so that the distance between samples directly determines the weighting coefficient. The neighborhood structure of the incomplete data has a positive effect on the clustering results. Each missing attribute is estimated from the corresponding attribute of the weighted nearest-neighbor samples, which makes full use of the attribute distribution of the data. Finally, fuzzy clustering is performed on the resulting locally weighted data set to obtain the clustering results.

The iterative optimization of the locally weighted, nearest-neighbor-based FCM algorithm for incomplete data is guided by a parameter search method that uses a genetic heuristic strategy. Building on the stochastic parallel search capability of an improved genetic algorithm, a hybrid locally weighted clustering algorithm for incomplete data is proposed in which the number of nearest neighbors, the fuzzy parameter, and the kernel function width parameter are encoded. A selection operator with autonomous expansion and reduction avoids premature convergence while also improving local search. Under the guidance of an autonomous-learning mutation operator, the correctness of the genetic optimization is guaranteed, higher-quality chromosomes are inherited by the offspring, and satisfactory clustering results are obtained.

The two algorithms are evaluated in simulation experiments on four data sets from the UCI machine learning repository: Iris, Bupa, Wine, and Breast. The experimental results show that when the local weighting method based on the nearest-neighbor principle is used to estimate the missing data, clustering results on the completed data set improve on those of the five comparison methods. With the global parallel search ability of the improved genetic algorithm, the estimates of the missing data are closer to the original values, which improves clustering efficiency and yields better clustering results.
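
The imputation step described in the abstract (find nearest neighbors, weight them with a Gaussian kernel, estimate each missing attribute as a weighted average) might be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not specify the similarity function, so a partial-distance measure over jointly observed attributes is assumed here, and the names `partial_distance`, `impute_local_weighted`, `k`, and `sigma` are hypothetical. Missing values are assumed to be marked as NaN.

```python
import numpy as np

def partial_distance(a, b):
    """Distance over attributes observed in both samples, rescaled to the
    full dimension. ASSUMPTION: stands in for the unspecified similarity function."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return np.inf  # no shared attributes, so the samples are not comparable
    diff = a[both] - b[both]
    return np.sqrt(a.size / both.sum() * np.sum(diff ** 2))

def impute_local_weighted(X, k=3, sigma=1.0):
    """Fill NaN entries with a Gaussian-kernel weighted average of the same
    attribute over the k nearest neighbors that have it observed.
    k (neighbor count) and sigma (kernel width) are hypothetical parameter names."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    n = X.shape[0]
    for i in range(n):
        missing = np.isnan(X[i])
        if not missing.any():
            continue
        # Distances from sample i to every other sample (self excluded).
        dists = np.array([np.inf if j == i else partial_distance(X[i], X[j])
                          for j in range(n)])
        for attr in np.flatnonzero(missing):
            # Candidate neighbors must be comparable and have this attribute.
            candidates = np.flatnonzero(~np.isnan(X[:, attr]) & np.isfinite(dists))
            if candidates.size == 0:
                continue  # nothing to estimate from; leave the value missing
            nearest = candidates[np.argsort(dists[candidates])[:k]]
            # Gaussian kernel: closer neighbors receive larger weights.
            w = np.exp(-dists[nearest] ** 2 / (2.0 * sigma ** 2))
            if w.sum() == 0.0:
                w = np.ones_like(w)  # all weights underflowed; use a plain mean
            filled[i, attr] = np.sum(w * X[nearest, attr]) / w.sum()
    return filled

if __name__ == "__main__":
    data = [[1.0, 2.0], [1.0, np.nan], [1.2, 2.2], [5.0, 9.0]]
    print(impute_local_weighted(data, k=2, sigma=1.0))
```

The completed matrix would then be passed to an ordinary fuzzy C-means routine. In the hybrid algorithm the abstract describes, values such as `k`, `sigma`, and the fuzzy parameter are the quantities searched by the genetic algorithm rather than fixed by hand.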
Keywords/Search Tags: Fuzzy C-means clustering, incomplete data, nearest neighbor principle, local weighting method, genetic algorithm