Font Size: a A A

Research On New Clustering Analysis Method Under The Framework Of Belief Function Theory

Posted on:2021-05-23Degree:DoctorType:Dissertation
Country:ChinaCandidate:F LiFull Text:PDF
GTID:1488306470970799Subject:Statistics
Abstract/Summary:PDF Full Text Request
Clustering analysis is a traditional statistical method,which is also an impor-tant task in Machine Learning and Pattern Recognition.Based on dissimilarities between a collection of objects,the goal of clustering analysis is to segment ob-jects into clusters,in such a way that objects are similar with each cluster,and dissimilar across different clusters.So far,clustering methods have proved useful in many real-world application domains,such as data mining,image segmentation and so on.According to the clustering output,we can distinguish between hard and soft partitional clusterings;the latter includes fuzzy and evidentail clustering.In par-ticular,evidential clustering,based on Dempster-Shafer(DS)theory(also called Belief function theory),has recently attracted the attention of many researchers Thanks to the generality of Dempster-Shafer theory,evidential clustering can be shown to extend all other soft clustering paradigms.More importantly,unlike the classical probability theory,the core concept of evidence theory is the belief function,which is a non additive measure and has its own advantages when deal-ing with the nonlinear and uncertain problems.For example,in the clustering analysis,some objects belong to different clusters in a certain degree of possibil-ity or trust,instead of undoubtfully belonging to one cluster.In this case,the uncertainty exists in the membership of each object to clusters,which can be ex-pressed by belief functions.Namely,the belief function theory is a powerful tool for uncertainty representation.This research mainly focuses on the new cluster-ing analysis methods under the framework of evidence theory.Specifically,the research contents of this dissertation have four parts as follows(1)Based on the evidential clustering methods,we propose a new constrained evidential clustering method(called k-CEVCLUS method).The EVCLUS method and k-EVCLUS method are evidential procedures for dissimilarity data,based on the assumption that similar objects should be assigned mass functions with low degree of conflict.The constrained evidential clustering(CEVCLUS)method is a version of EVCLUS allowing one to use prior information on cluster membership,in form of pairwise constraints.The original CEVCLUS algorithm is shown to have very good performances,but it was quite slow and limited to small datasets.Based on these,we introduce a new constrained evidential clustering method,called k-CEVCLUS method.The new method improves the objective function in the original CEVCLUS method,which can be decomposed into a quadratic function of the mass function corresponding to each object and be minimized using iterative row-wise quadratic programming.The new method is both several orders of magnitude faster than CEVCLUS and has storage and computational complexity linear in the number of objects,making it applicable to large datasets(around 104 objects).We find that the performances of clustering algorithms(in terms of proximity to the true partition)get better when the number of constraints increases.Thus,we propose a new constraint expansion strategy,yielding drastic improvements in clustering results when only a few constraints are given.(2)Based on Evidence Accumulation Clustering(EAC)method,we study the clustering ensemble in the Dempster-Shafer theory.In general,ensemble methods consist of two principle steps:generating base partitions and combining them into a single one.Just as the classic EAC method,we obtain a group of hard partitions in the first step,then we convert them into an intermediate interpretation,which can be named as relational representation.We believe that the evidence source from the relational representations may be doubtful,which can be fixed by using the discounting process in belief function theory.After discounting the relational representations,we can combine them in evidential level by different combination rules.Then we can obtain the implicability matrix or plausibility matrix from the fused relational representation,which can be seen as a co-association matrix between objects.To make full use of the transitive property between objects,we treat this co-association matrix as a fuzzy relation,and make it transitive to yield a fuzzy equivalence relation.The final partition is obtained by applying some clustering algorithm to the new co-association matrix.Experimental results show the stability and efficiency of our method(3)Starting from fuzzy partitions,we introduce a fuzzy clustering ensemble method based on Dempster-Shafer theory.Most of the existing ensemble methods take hard partitions(obtained by hard clustering methods)as input in the first step.Even a group of partitions is obtained by fuzzy clustering methods(such as fuzzy c-means),they are forced into hard partitions as input,which may lose lots of information.To make full use of the information in fuzzy partitions,we introduce a new fuzzy clustering ensemble method.We directly use the fuzzy partitions from fuzzy clustering methods(such as fuzzy c-means method)as input in the first step,namely fuzzy partitions as base partitions.For each base partition,we represent the "similarities" between objects by mass functions.To fully utilize the relationship information between objects,we introduce two types of relational representations.Based on the relational representation to be considered,we use the combination rules to obtain the fused relational representation.In the second step of our method,we study two ensemble method:(1)the co-matrix between objects is extracted from the fused relational representation,which can be treat as the input of fuzzy c-means method,the obtained fuzzy partition is the final result;(2)we can construct an objective function based on the credal Rand index,which can be optimized by iterative row-wise quadratic programming,the final result is a fuzzy partition.Experiments on datasets show the advantages of our method.(4)Starting from evidential clustering methods,we propose a clustering en-semble method based on Dempster-Shafer Theory.Different from the other eviden-tial clustering ensemble method,the base partitions in the first step of our method are generated by an evidential clustering algorithm(such as Evidential c-means).In Dempster-Shafer Theory,each base partition is expressed by a credal partition,in which the clustering membership of each object is allowed to be uncertain.A credal partition constitutes a richer description of the clustering structure of a dataset,as compared to hard or fuzzy partitions.Instead of directly combining these partitions,we combine them in the credal level,where each credal partition can be expressed by an intermediate representational:relational representation.The one that combines all the relational representations is called fused relational representation.To make full use of the information between objects,an additional operation is taken to make the fused relational representation transitive based on the theory of intuitionistic fuzzy relation,where we can obtain a new relational representation.Finally,the consensus solution is obtained from the new relational representation by minimizing an objective function.Experiments on simulated and real datasets show the advantages of our method.
Keywords/Search Tags:Belief Function, Evidence Theory, Evidential Clustering, Evidential Ensemble Clustering, Relational Representation, Combination Rule
PDF Full Text Request
Related items