Research Of Fuzzy Clustering Algorithm For Incomplete Data Based On Interval Analysis

Posted on: 2017-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: B X Li
GTID: 2308330482999730
Subject: Software engineering
Abstract/Summary:
With the rapid development of the information society, the volume of data is growing quickly. Human, environmental, and other factors frequently cause values to be lost, which produces incomplete data. Analyzing and handling missing data effectively is crucial for further analysis and mining of the entire data set, so the study of clustering for incomplete data has great practical significance and value.

Because the attributes of missing data are uncertain and the standard fuzzy c-means (FCM) algorithm cannot process incomplete data directly, we propose a fuzzy clustering algorithm for incomplete data based on missing-attribute interval size (MIS-FCM). First, the neighbors of each incomplete sample are found according to the nearest-neighbor rule, and each missing attribute is imputed as an interval determined by the value range of those neighbors. Interval estimation significantly improves the plausibility of the imputed values. Then the entire incomplete data set undergoes a structural transformation in which each missing attribute is replaced by the interval median, with the interval size serving as a control attribute, and the influence of the interval-size control parameter on clustering is discussed. Finally, the standard fuzzy c-means algorithm performs the cluster analysis of the processed incomplete data.

The uncertainty of missing attributes also makes attribute weights difficult to analyze, and existing fuzzy clustering algorithms for incomplete data do not effectively account for the effect of attribute weights on clustering. This thesis presents an interval fuzzy clustering algorithm for incomplete data based on coefficient-of-variation weighting. Through interval estimation of the missing attribute values, the entire incomplete data set is transformed into an interval data set.
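
As a rough illustration of the interval imputation and structural transformation described for MIS-FCM, the following Python sketch estimates an interval for each missing value from its nearest neighbors and then derives the interval midpoint and size. It is not the thesis's implementation: the partial-distance measure, the neighbor count q, the fallback to the full column range, the use of NaN to mark missing values, and the reading of "interval median" as the interval midpoint are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def partial_distance(a, b):
    """Squared Euclidean distance over attributes observed in both samples,
    rescaled by the share of usable attributes (an assumed distance measure)."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return np.inf  # no shared observed attribute
    diff = a[mask] - b[mask]
    return float(diff @ diff) * a.size / mask.sum()


def interval_impute(X, q=3):
    """Return (lower, upper) endpoint arrays; NaN entries of X become intervals."""
    X = np.asarray(X, dtype=float)
    lower, upper = X.copy(), X.copy()
    n = X.shape[0]
    for i in range(n):
        missing = np.flatnonzero(np.isnan(X[i]))
        if missing.size == 0:
            continue  # complete sample: endpoints stay equal to the observed values
        dists = np.array([np.inf if j == i else partial_distance(X[i], X[j])
                          for j in range(n)])
        order = np.argsort(dists)
        for k in missing:
            # q nearest neighbors that actually observe attribute k
            candidates = [j for j in order
                          if np.isfinite(dists[j]) and not np.isnan(X[j, k])][:q]
            if not candidates:
                # assumed fallback: use every observed value in the column
                candidates = np.flatnonzero(~np.isnan(X[:, k]))
            values = X[candidates, k]
            lower[i, k], upper[i, k] = values.min(), values.max()
    return lower, upper


def midpoint_and_size(lower, upper):
    """Interval midpoint and interval size for every entry."""
    return (lower + upper) / 2.0, upper - lower
```

The arrays lower and upper together form the interval data set. For attributes that were observed, both endpoints equal the observed value, so the interval size is zero.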
Then the coefficient of variation from mathematical statistics is used to compute weights for the interval endpoints, which resolves the problem of analyzing attribute weights in incomplete data. Finally, the resulting interval weights are combined with the interval-valued fuzzy c-means clustering algorithm to complete the cluster analysis of the processed incomplete data, yielding better clustering results.

Both algorithms are validated on four standard data sets from the UCI machine learning library (Iris, Bupa, Wine, and Breast) in the MATLAB simulation environment. The simulations examine how the interval size of missing attributes affects clustering accuracy and how attribute weights affect clustering on incomplete data sets with different characteristics. The results show that both algorithms, which build on interval analysis of missing attributes, achieve better clustering results.
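
The coefficient-of-variation weighting of interval endpoints described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the thesis's formulation: the weights are taken to be each attribute's coefficient of variation (population standard deviation divided by the absolute mean) normalized to sum to one, attribute means are assumed to be nonzero, and the weighted squared distance to a cluster prototype is one plausible way to combine the weights with an interval-valued fuzzy c-means step. All function names are hypothetical.

```python
import numpy as np


def cv_weights(values):
    """Normalized coefficient-of-variation weights, one per attribute (column).
    Assumes every column mean is nonzero and at least one column varies."""
    mean = values.mean(axis=0)
    std = values.std(axis=0)  # population standard deviation (ddof=0)
    cv = std / np.abs(mean)
    return cv / cv.sum()


def endpoint_weights(lower, upper):
    """Separate weight vectors for the lower and upper interval endpoints."""
    return cv_weights(lower), cv_weights(upper)


def weighted_interval_sq_distance(lower, upper, v_lower, v_upper, w_lower, w_upper):
    """Weighted squared distance from each interval sample to one interval
    prototype (v_lower, v_upper); an assumed distance form."""
    return ((lower - v_lower) ** 2 * w_lower
            + (upper - v_upper) ** 2 * w_upper).sum(axis=1)
```

Under this sketch, attributes whose endpoints vary more relative to their mean receive larger weights and therefore contribute more to the distance used during clustering.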
Keywords/Search Tags: incomplete data, fuzzy clustering, fuzzy C-means, interval size, weight