
Research On Attribute Weighted And Incomplete Data Fuzzy Clustering Approaches

Posted on: 2012-08-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D Li
GTID: 1118330368485908
Subject: Control theory and control engineering
Abstract/Summary:
Fuzzy clustering is one of the research focuses in the field of pattern recognition, where it is mainly used to identify the internal structure of data. The similarity metric is a key problem in fuzzy clustering. However, existing similarity metrics, such as Euclidean distance and Hamming distance, have certain limitations because they implicitly assume that every attribute of a sample contributes equally to the clustering performance. Moreover, in most cases attribute values of samples may be missing because of limitations in data collection, random noise, or other reasons, and most existing clustering algorithms cannot be applied directly to such incomplete samples. To address these problems, this dissertation concentrates on attribute weighted and incomplete data fuzzy clustering approaches. The main contributions of the research can be summarized as follows:

1. For attribute weighted clustering, a fuzzy clustering algorithm with interval-supervised attribute weights is presented, which can enhance the rationality of attribute weights and improve the clustering performance. Firstly, from the viewpoint of cognition and the information complexity of datasets, attribute weights are represented as intervals in the clustering analysis. These intervals can be obtained by the interval analytic hierarchy process and describe the different contributions of the attributes; compared with numerical attribute weights, this improves the robustness of the attribute weight representation. Secondly, attribute weights, memberships and cluster prototypes are obtained by iterative optimization. If a calculated weight falls outside its interval-constrained range in some iteration, it is forced to the corresponding interval center for the following iterations. A maximum number of iterations is also set to ensure that the algorithm terminates.
Experimental results show that the proposed algorithm can avoid local minima and achieve better clustering performance than the existing algorithms.

2. For incomplete data fuzzy clustering, two algorithms based on nearest-neighbor intervals are presented. To account for the uncertainty of missing attributes, each missing attribute is represented by a nearest-neighbor interval derived from the nearest-neighbor information of the incomplete sample. Two algorithms are then built on this interval representation. The first approach transforms the incomplete dataset into an interval-valued one and then performs clustering analysis with existing clustering algorithms for interval-valued datasets. Since the resulting cluster prototypes are convex hyperpolyhedrons in the attribute space, which can represent the shape of the clusters to some degree, more accurate clustering results can be achieved. The second approach exploits the fact that the interval representation limits the missing attributes to appropriate ranges, and hybridizes fuzzy c-means with a genetic algorithm to solve the incomplete data clustering problem. The genetic algorithm searches for optimal imputations of the missing attributes within the corresponding nearest-neighbor intervals, and fuzzy c-means is then used to obtain compact clusters on the "completed" dataset. More satisfactory clustering results can therefore be obtained on the basis of appropriate imputations of the missing attributes.

3. Existing algorithms seldom consider that different attributes may contribute differently to the clustering. To address this shortcoming, an attribute weighted fuzzy clustering algorithm for incomplete data is proposed.
Firstly, comparatively accurate imputations of the missing attributes and classification labels are obtained by an existing algorithm. Secondly, each attribute of the "completed" dataset is evaluated by the ReliefF algorithm. Finally, the attribute weights are incorporated into fuzzy clustering through a weighted Euclidean distance, so that the missing attributes and the clustering results can be obtained simultaneously. Simulation results show that the algorithm can achieve better clustering performance on incomplete datasets by emphasizing the contribution of important attributes.
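
The weighted Euclidean distance mentioned in the third contribution can be illustrated with a short Python sketch. This is not the dissertation's implementation. It only shows the standard fuzzy c-means membership update with a per-attribute weight inside the squared distance. The function names, the fuzzifier m = 2, the small eps guard, and the hand-picked weights (standing in for ReliefF scores) are all assumptions made for illustration.

```python
def weighted_sq_dist(x, v, w):
    """Squared weighted Euclidean distance: sum_j w_j * (x_j - v_j)^2."""
    return sum(wj * (xj - vj) ** 2 for xj, vj, wj in zip(x, v, w))


def fcm_memberships(data, prototypes, weights, m=2.0):
    """Standard fuzzy c-means membership update, using the weighted distance.

    data:       list of samples (each a list of attribute values, no gaps)
    prototypes: list of cluster prototypes, same dimension as the samples
    weights:    one non-negative weight per attribute (assumed given)
    m:          fuzzifier, m > 1 (assumed value 2.0)
    """
    eps = 1e-12  # avoids division by zero when a sample coincides with a prototype
    memberships = []
    for x in data:
        d = [max(weighted_sq_dist(x, v, weights), eps) for v in prototypes]
        # u_k is proportional to d_k^(-1/(m-1)) for squared distances d_k
        inv = [dk ** (-1.0 / (m - 1.0)) for dk in d]
        total = sum(inv)
        memberships.append([t / total for t in inv])
    return memberships


if __name__ == "__main__":
    sample = [[0.0, 5.0]]
    protos = [[0.0, 0.0], [1.0, 5.0]]
    # Emphasizing attribute 0 pulls the sample toward the first prototype,
    # emphasizing attribute 1 pulls it toward the second one.
    print(fcm_memberships(sample, protos, [1.0, 0.0]))
    print(fcm_memberships(sample, protos, [0.0, 1.0]))
```

In this toy setting the same sample switches clusters depending on which attribute is weighted, which is the effect that attribute weighting is meant to exploit.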
Keywords: Fuzzy Clustering, Attribute Weighting, Incomplete Data, Interval