
Study On The Clustering Ensemble Algorithm Based On Granular Computing

Posted on: 2019-03-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Xu
Full Text: PDF
GTID: 1368330596456067
Subject: Computer application technology
Abstract/Summary:
Clustering ensemble has been one of the research hotspots in data mining in recent years. Building on ensemble learning theory, it integrates multiple clustering results, obtained either from the same clustering algorithm with different initial parameters or from different clustering algorithms, to generate a final result with better accuracy and robustness than any single clustering result. With the rapid development of information technology, access to data has become easier and easier. However, because of the roughness, ambiguity, and uncertainty of the data itself and the differences in human cognition levels, mining useful knowledge from massive, complex, high-dimensional data has become more difficult. Granular Computing (GrC), as a new concept and computational paradigm for information processing, provides a set of theories, methods, techniques, and tools for dealing with inaccurate and uncertain information. It simulates the human ability to analyze and process complex problems from different views and at different layers. Moving from coarse layers to fine layers, it chooses an appropriate granularity space and obtains a satisfactory solution by gradual trial. In this dissertation, based on the idea of granular computing, the clustering ensemble algorithm is optimized and improved at multiple levels and from multiple angles, in combination with the main theoretical models, such as rough sets and fuzzy sets, and with semi-supervised learning methods. The main research contents include the following aspects:

(1) In the clustering ensemble algorithm, the generated base clustering results exhibit uncertainty, fuzziness, and overlap, and the accuracy of the final clustering result is easily disturbed by low-quality base cluster members. From the perspective of granular computing, a clustering ensemble selection algorithm based on knowledge granularity is proposed. The similarity among clustering results is measured by the concept of Granular Distance. The clustering result that shares the most information with the other base cluster members is added to the selection set as a reference member. The remaining candidate base clustering results are then chosen incrementally and added to the selection set to participate in the final integration, according to the principle that the chosen member has the maximum difference from the members already in the selection set and the greatest similarity to the members in the candidate set. In this way, the quality of the base clustering results is ensured while the difference among them is enlarged, which helps improve the accuracy of the final result. In the design of the consensus function, the way the elements of the co-association matrix are generated is optimized and improved according to the theory of the dividing ability of knowledge granularity, so that the obtained sample similarity measure is more consistent with the real data structure.

(2) Traditional clustering ensemble selection algorithms usually regard each base clustering result as a whole, using an evaluation index or weighting strategy to select base cluster members, and ignore the differences among clusters within the same clustering result. To address this problem, and in combination with the uncertainty measure method in rough set theory, a double granularity weighted clustering algorithm based on rough fuzzy degree is proposed. Each base clustering result is regarded as a partition of the data set. The uncertainty of each cluster in the partition is measured by the concept of rough set ambiguity, and the evaluation of cluster reliability is transformed into an uncertainty measurement problem in rough sets. The reliability of each cluster across the entire ensemble member set is obtained at the cluster level. Then, at a finer level, a local sample similarity measure is designed that considers the degree of similarity between sample pairs in the same cluster. Finally, a weighted co-association matrix element generation method based on global cluster reliability and local sample-pair similarity is proposed, operating at both the cluster level and the sample level, so that the useful information hidden in the data is mined further.

(3) To solve the problem that the data in some fuzzy clustering results are not ideally assigned and that there is a large difference between the clustering result and the real distribution of the data set, a novel fuzzy clustering ensemble algorithm based on active full link similarity is proposed. Taking into account the compactness, separation, and overlap used in cluster validity evaluation indices, an improved fuzzy clustering validity index is designed to select the fuzzy base cluster members that are closer to the true structure of the data. By determining whether each data point belongs to a clear cluster in the different fuzzy clustering results, the data are divided into clearly belonging samples and fuzzily belonging samples. The link-based clustering ensemble algorithm is extended based on the relationships among the clusters to which different samples belong. Three different full link similarity measures are designed for samples with different degrees of belonging. Finally, the agglomerative hierarchical clustering algorithm and the density peaks clustering algorithm are applied to the new similarity matrix to obtain the final clustering results.

(4) Building on the clustering ensemble algorithm based on random subspaces, a weighted random subspace clustering ensemble algorithm based on constraint selection is proposed, which uses pairwise supervised information. Using the Dimension Related Distance (DRD), the similarity of the pairwise constraint information in the whole space and in each subspace is measured, and the appropriate constraint information is chosen for each random subspace to guide the clustering process. This approach not only generates high-quality base clusterings but also overcomes the over-fitting problem caused by using constraint information in the base cluster generation stage. Each base clustering is then weighted according to both the weight information of the different constraint pairs and the quality of the base clustering result. Finally, the consensus function is adopted to obtain the final clustering result.
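
The co-association matrix that several of the contributions above modify can be illustrated with a minimal sketch. It shows only the plain, unweighted form (the fraction of base clusterings that place two samples in the same cluster), followed by a simple single-link consensus step. It does not reproduce the knowledge-granularity or rough-fuzzy weighting proposed in the dissertation, whose formulas are not given in this abstract. The function names `co_association` and `consensus_labels` and the `threshold` parameter are hypothetical, chosen for illustration.

```python
from itertools import combinations


def co_association(base_clusterings):
    """Return the n x n matrix of co-clustering frequencies.

    base_clusterings is a list of label lists, one per base clustering,
    each of length n. Entry [i][j] is the fraction of base clusterings
    that put samples i and j in the same cluster (unweighted form).
    """
    n = len(base_clusterings[0])
    m = len(base_clusterings)
    counts = [[0] * n for _ in range(n)]
    for labels in base_clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_labels(matrix, threshold=0.5):
    """Assumed consensus step: join samples whose co-association exceeds
    the threshold and return connected components (a single-link cut)."""
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Three base clusterings of six samples; label names differ between runs.
base = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],  # sample 2 placed in the other cluster
    [1, 1, 1, 0, 0, 2],  # sample 5 split off on its own
]
matrix = co_association(base)
labels = consensus_labels(matrix)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this toy example no single base clustering is trusted outright: samples 0 and 2 co-occur in two of the three clusterings, so their co-association is 2/3 and they stay together in the consensus. A weighted variant would replace the unit increment in `co_association` with a per-cluster or per-pair weight.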
Keywords/Search Tags: Granular computing, clustering ensemble algorithm, rough set theory, fuzzy clustering ensemble, semi-supervised learning