
Research Of Fuzzy Clustering Algorithm For Incomplete Data Based On Improved BP Imputation

Posted on: 2016-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: B L Wang
GTID: 2308330464956787
Subject: Computer software and theory
Abstract/Summary:
In practice, incomplete data are a common problem in cluster analysis: because of measurement errors, misinterpreted data, and missing observations, many data sets are incomplete. How the missing attributes are handled is a key factor in clustering performance, so clustering of incomplete data is an important research topic that has attracted wide attention from researchers in China and abroad.

Because the fuzzy c-means (FCM) algorithm cannot be applied directly to incomplete data, this thesis proposes an FCM clustering algorithm for incomplete data sets that estimates the missing attributes with an improved back-propagation (BP) neural network. For each missing attribute, training samples are first selected with the nearest-neighbor rule. The selected samples are then processed according to the position of the missing attribute, so that the corresponding position is also deleted in each training sample. As a result, the training set contains both complete and incomplete samples, which a standard BP network cannot accept. The thesis therefore proposes a BP neural network for missing data (MBP) and trains a separate MBP network for each missing attribute on the processed training set. In the testing phase, these networks estimate the missing attributes, which recovers the incomplete data set. Finally, FCM clusters the recovered data set to obtain the clustering result.

The MBP networks produce numerical estimates of the missing attributes. A single number, however, cannot describe the uncertainty of a missing attribute and may deviate considerably from the actual value. When the MBP network estimates an incomplete sample, it also produces estimates for that sample's complete attributes.
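The nearest-neighbor selection of training samples described above can be sketched in Python. This is a minimal sketch under assumptions the abstract does not state: missing values are stored as `None`; neighbors are ranked with the partial distance strategy (Euclidean distance over the attributes both samples observe, rescaled by the number of shared attributes); and "deleting the corresponding position" is read as removing the missing attribute from each neighbor's inputs and using the neighbor's value there as the training target. The function names, the toy data, and the neighbor count `q` are hypothetical. The MBP network itself is not sketched, because the abstract does not say how it handles incomplete inputs.

```python
import math

# Hypothetical helpers illustrating training-sample selection; missing values are None.

def partial_distance(a, b):
    """Euclidean distance over attributes observed in both samples, rescaled by
    d / (number of shared attributes) so distances over different numbers of
    attributes stay comparable (partial distance strategy, an assumption here)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # nothing to compare: rank this candidate last
    sq = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(len(a) / len(shared) * sq)


def select_training_samples(data, target_idx, attr, q):
    """Indices of the q samples nearest to data[target_idx] that observe attribute attr."""
    x = data[target_idx]
    candidates = [k for k, s in enumerate(data) if k != target_idx and s[attr] is not None]
    candidates.sort(key=lambda k: partial_distance(x, data[k]))
    return candidates[:q]


def build_training_pairs(data, neighbor_ids, attr):
    """Remove position attr from each neighbor's inputs and use its value as the target.
    Other missing entries stay None, so some inputs remain incomplete; this is why a
    network that accepts incomplete inputs (MBP in the thesis) is needed."""
    pairs = []
    for k in neighbor_ids:
        s = data[k]
        pairs.append((s[:attr] + s[attr + 1:], s[attr]))
    return pairs


# Toy data set (hypothetical): sample 0 is missing attribute 2.
data = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [5.0, 5.0, 9.0],
    [0.5, None, 2.0],
    [1.0, 1.9, 3.1],
]
ids = select_training_samples(data, target_idx=0, attr=2, q=3)
pairs = build_training_pairs(data, ids, attr=2)
print(ids)  # [4, 1, 3]
print(pairs)
```

Sample 3 is selected even though it is itself incomplete, so its training input `[0.5, None]` still contains a gap; this is the kind of incomplete training sample the abstract says the BP network must be adapted to handle.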
In this thesis, the average estimation error over the complete attributes is used to convert each numerical estimate into an interval estimate. The complete attributes are also represented as interval data, so the numerical data set is transformed into an interval data set, and interval FCM is then applied to it to obtain the clustering result.

Finally, simulation experiments were run on the MATLAB platform with artificial data sets and the UCI machine learning data sets Wine, Bupa, and Breast. The results show that the numerical-estimate method yields more accurate clustering results than the comparison methods, and that the interval-estimate method outperforms the numerical-estimate method and is more robust.
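The conversion to interval data and the interval FCM step described above can be sketched as follows. The abstract does not give the formulas, so this is a sketch under stated assumptions: the network is assumed to output an estimate for every attribute of an incomplete sample; the half-width of each missing attribute's interval is taken as the mean absolute error on that sample's complete attributes; complete attributes become zero-width intervals [x, x]; and interval FCM uses the Euclidean distance over both lower and upper bounds, with prototype bounds updated as membership-weighted means. `to_intervals`, `interval_fcm`, and all numbers in the usage lines are hypothetical.

```python
import math
import random


def to_intervals(sample, estimates):
    """Turn one sample into (lower, upper) bounds (assumed reading of the abstract).

    sample:    observed attributes, None where missing
    estimates: network estimates for every attribute of this sample (assumption)
    """
    # Mean absolute estimation error on the complete (observed) attributes.
    errors = [abs(e - x) for x, e in zip(sample, estimates) if x is not None]
    delta = sum(errors) / len(errors) if errors else 0.0
    lower, upper = [], []
    for x, e in zip(sample, estimates):
        if x is None:  # missing attribute: estimate +/- mean error
            lower.append(e - delta)
            upper.append(e + delta)
        else:          # complete attribute: zero-width interval [x, x]
            lower.append(x)
            upper.append(x)
    return lower, upper


def interval_fcm(intervals, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means on interval vectors given as (lower, upper) pairs of lists."""
    # Stack lower and upper bounds into one vector, so the squared distance to a
    # prototype with bounds (L, U) is sum_j (l_j - L_j)^2 + (u_j - U_j)^2.
    points = [list(lo) + list(up) for lo, up in intervals]
    n, dim = len(points), len(points[0])
    rng = random.Random(seed)
    u = []
    for _ in range(n):  # random memberships, each row normalized to sum to 1
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    for _ in range(max_iter):
        # Prototype bounds: membership-weighted means of the sample bounds.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            tw = sum(w)
            centers.append([sum(w[k] * points[k][j] for k in range(n)) / tw for j in range(dim)])
        # Standard FCM membership update on the stacked distances.
        new_u = []
        for k in range(n):
            dist = [math.dist(points[k], v) for v in centers]
            if min(dist) == 0.0:  # sample sits on a prototype: crisp membership
                hits = [1.0 if d == 0.0 else 0.0 for d in dist]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dist[i] / dist[l]) ** (2.0 / (m - 1.0)) for l in range(c))
                              for i in range(c)])
        shift = max(abs(a - b) for r0, r1 in zip(u, new_u) for a, b in zip(r0, r1))
        u = new_u
        if shift < tol:
            break
    half = dim // 2
    prototypes = [(v[:half], v[half:]) for v in centers]
    return prototypes, u


# Usage (hypothetical numbers): attribute 1 of the first sample was missing.
lo, up = to_intervals([1.0, None, 2.0], [1.2, 4.0, 1.8])
data = [
    (lo, up),
    ([1.1, 3.9, 2.1], [1.1, 3.9, 2.1]),
    ([6.0, 0.5, 7.0], [6.0, 0.5, 7.0]),
    ([5.8, 0.2, 7.1], [6.2, 0.8, 7.3]),
]
prototypes, u = interval_fcm(data, c=2)
labels = [max(range(2), key=lambda i: row[i]) for row in u]
print(lo, up)
print(labels)
```

If every interval has zero width, each stacked vector is [x, x] and every prototype keeps equal lower and upper bounds, so all distances are scaled by the same factor sqrt(2) and the membership ratios are unchanged; under these assumptions the procedure then reduces to ordinary FCM on the recovered numerical data.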
Keywords/Search Tags: incomplete data sets, fuzzy clustering, fuzzy c-means, MBP estimation, interval estimates