Study On Clustering For Incomplete Data Based On Sample Weighting And Cluster Dispersion

Posted on: 2017-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: H W Liu
Full Text: PDF
GTID: 2348330488459831
Subject: Control engineering
Abstract/Summary:
The fuzzy c-means (FCM) clustering algorithm is widely used in data mining and pattern classification. However, traditional FCM cannot be applied directly to incomplete data sets, and it does not account for the impact of outliers, the dispersion of clusters, or the uncertainty of missing values, any of which may lead to misclassification.

To address the impact of outliers, this thesis proposes a sample-weighted fuzzy c-means clustering algorithm for incomplete data. First, a weight is computed for each sample from the distance between the sample and the cluster prototype. This weight then determines how strongly the sample influences the clustering result: the greater the weight, the greater the influence, which weakens the impact of outliers.

To address cluster dispersion and the uncertainty of missing values, the thesis also proposes an interval-valued fuzzy c-means clustering algorithm based on cluster dispersion. First, the nearest neighbors of each incomplete sample are found, and its missing values are represented by nearest-neighbor intervals. Then, the dispersion of each cluster is calculated from the mean square deviation between the samples and the cluster prototype, and is taken into account during clustering. Using interval-valued data together with cluster dispersion reduces the misclassification of marginal data objects and expresses the uncertainty of the missing values.

Results on several UCI data sets show that the first approach weakens the impact of outliers and brings the estimated cluster prototype closer to the real cluster prototype, while the second approach reduces the misclassification of marginal data objects and makes full use of the information in the data sets. Compared with other approaches, the proposed approaches are more effective on incomplete data sets.
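
The nearest-neighbor interval step mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact construction, so everything concrete here is an assumption made for illustration: missing values are encoded as NaN, neighbors are ranked by a partial distance over jointly observed attributes, and each missing value is replaced by the [min, max] range of that attribute over the `q` nearest neighbors that observe it. The names `partial_distance`, `nearest_neighbor_intervals`, and `q` are hypothetical and do not come from the thesis.

```python
import numpy as np


def partial_distance(a, b):
    """Distance over attributes observed in both samples (assumed metric).

    The squared differences are rescaled by (total attributes / shared
    attributes) so samples with few shared attributes are not favored.
    """
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return np.inf  # no shared attributes: treat as infinitely far
    return np.sqrt(a.size / both.sum() * np.sum((a[both] - b[both]) ** 2))


def nearest_neighbor_intervals(X, q=2):
    """Return (lower, upper) bound arrays for a data set with NaN gaps.

    Observed entries get a degenerate interval [x, x]. A missing entry
    gets [min, max] of that attribute over the q nearest neighbors that
    have it observed (an illustrative choice, not the thesis's formula).
    """
    X = np.asarray(X, dtype=float)
    lower, upper = X.copy(), X.copy()
    n = X.shape[0]
    for i in range(n):
        missing = np.flatnonzero(np.isnan(X[i]))
        if missing.size == 0:
            continue
        # Distance from sample i to every other sample; itself is excluded.
        dists = np.array([
            np.inf if k == i else partial_distance(X[i], X[k])
            for k in range(n)
        ])
        order = np.argsort(dists)
        for j in missing:
            # Keep only reachable neighbors that observe attribute j.
            candidates = [
                k for k in order
                if np.isfinite(dists[k]) and not np.isnan(X[k, j])
            ][:q]
            if candidates:
                values = X[candidates, j]
                lower[i, j], upper[i, j] = values.min(), values.max()
    return lower, upper
```

For example, with the four samples `[1.0, 2.0]`, `[1.1, 2.2]`, `[1.2, NaN]`, and `[5.0, 9.0]` and `q=2`, the two nearest neighbors of the third sample are the first two, so its missing second attribute becomes the interval [2.0, 2.2] and the distant fourth sample does not widen it. An interval-valued clustering step would then operate on the `lower` and `upper` arrays instead of single imputed values.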
Keywords/Search Tags: Fuzzy Clustering, Sample Weighting, Cluster Dispersion, Interval