
Study On Incomplete Data Clustering Method Based On Correlation Of Sample Neighbors

Posted on: 2020-08-12
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Cao
Full Text: PDF
GTID: 2428330590997014
Subject: Detection Technology and Automation
Abstract/Summary:
Fuzzy C-means clustering has been widely used in pattern recognition and image processing. In practice, because of data omissions and restrictions on data acquisition, the data sets obtained often contain a large amount of incomplete data. Traditional clustering methods, however, cannot be applied directly to data sets with incomplete data, and the way missing attributes are treated directly affects the clustering results. Therefore, from the perspective of the correlation between a sample and its neighbors, this thesis proposes two clustering methods for incomplete data. The main research contents are as follows.

First, the basic fuzzy C-means clustering algorithm tends to divide a data set into equal parts. To address this defect, a spatial distance based on the class membership of neighboring samples is proposed, built on the mutual influence value between samples and the class proportions of each sample's neighbors. The class information of the neighbors around a sample point is introduced into the original Euclidean distance in a proportional manner, and the sample distribution information is used so that the distance measure adjusts as the data set changes. Based on the distance between sample points, a clustering effect value is constructed and introduced into the clustering objective function. The result is a fuzzy C-means clustering method for incomplete data based on sample space distance. The experimental results show that, by considering the spatial distribution characteristics of the samples in the distance calculation, the proposed algorithm obtains more accurate clustering results on incomplete data.

Second, based on the mutual influence value between samples, a clustering method for incomplete data based on sample neighbor membership weighting is proposed. The weighted membership degrees of a sample's neighbors are used to correct the membership degree of the sample itself, so that the sample's membership is adjusted by the weighted average of the memberships of its neighbor samples. To make full use of the distribution information of the samples, the weighting coefficient is the Gaussian kernel function from the similarity function, so that the sample distribution in the neighborhood of each sample point affects the similarity between sample points and improves the clustering of the incomplete data set.
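The neighbor membership weighting idea can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract gives no formulas, so everything concrete here is an assumption made for illustration. Missing attributes are marked with `None` and handled with a rescaled partial distance (a common strategy for incomplete data, not necessarily the one used in the thesis), neighbors are the `k` nearest samples, the Gaussian weight uses a hypothetical bandwidth `sigma`, and each sample keeps a self-weight of 1 in the average. The names `partial_distance_sq` and `smooth_memberships` are hypothetical.

```python
import math


def partial_distance_sq(a, b):
    """Squared Euclidean distance over attributes observed in both samples.

    Assumption: missing values are None, and the sum is rescaled by
    (total attributes / jointly observed attributes).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf  # no shared attributes: treat as infinitely far apart
    scale = len(a) / len(pairs)
    return scale * sum((x - y) ** 2 for x, y in pairs)


def smooth_memberships(samples, memberships, k=2, sigma=1.0):
    """Correct each membership row with a Gaussian-weighted neighbor average.

    memberships[i][c] is the fuzzy membership of sample i in cluster c.
    Assumption: the sample itself is included with weight exp(0) = 1.
    """
    smoothed = []
    for i, xi in enumerate(samples):
        # k nearest neighbors of sample i under the partial distance
        nearest = sorted(
            (partial_distance_sq(xi, xj), j)
            for j, xj in enumerate(samples)
            if j != i
        )[:k]
        weights = {i: 1.0}
        for d2, j in nearest:
            weights[j] = math.exp(-d2 / (2 * sigma ** 2))
        total = sum(weights.values())
        n_clusters = len(memberships[i])
        smoothed.append([
            sum(w * memberships[j][c] for j, w in weights.items()) / total
            for c in range(n_clusters)
        ])
    return smoothed


# Toy data: sample 1 has a missing second attribute and an ambiguous membership.
samples = [[0.0, 0.0], [0.1, None], [5.0, 5.0], [5.1, 4.9]]
memberships = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.2, 0.8]]
corrected = smooth_memberships(samples, memberships, k=1, sigma=1.0)
```

In this toy run the ambiguous sample lies close to sample 0 on its one observed attribute, so its corrected membership is pulled toward sample 0's cluster. Because the weights are normalized, each corrected row still sums to 1 when the input rows do.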
Keywords/Search Tags: Neighbors Correlation, Incomplete Data, Fuzzy C-means, Clustering