Research On Co-Clustering Algorithm Based On Maximizing Modularity

Posted on: 2021-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: J H Wei
Full Text: PDF
GTID: 2428330623982034
Subject: Computer Science and Technology
Abstract/Summary:
The goal of co-clustering is to produce a meaningful partition of a two-dimensional contingency table by grouping its rows and columns simultaneously, exploiting the duality between them. Compared with traditional one-way clustering, co-clustering can effectively identify subspaces and reveal implicit relationships between rows and columns. With the rapid development of data science, datasets are becoming increasingly rich and varied, and traditional co-clustering methods are limited when processing either overlapping data or high-order heterogeneous data. How to co-cluster such data more effectively has therefore become a meaningful research topic. Modularity is a commonly used measure of the quality of a community division, and it is also a common quality criterion in graph clustering. Based on a survey and analysis of existing work on co-clustering and modularity, this thesis studies two problems in depth, overlapping co-clustering and hierarchical high-order co-clustering, and obtains the following results.

Firstly, because traditional co-clustering algorithms cannot handle overlapping data and outliers, an Overlapping Co-Clustering algorithm by Maximizing Modularity (OMMCC) is proposed. Both row clusters and column clusters are allowed to overlap, and the row and column outliers of the data matrix are not assigned to any cluster. Specifically, a unified framework is designed that adds non-exhaustive and overlapping constraints to the objective function. By directly maximizing the modularity with an iterative alternating optimization process, an overlapping and non-exhaustive co-clustering can be obtained efficiently. In addition, the parameters that control overlap and non-exhaustiveness are easy to understand.

Secondly, traditional co-clustering methods are limited when clustering high-order heterogeneous data that contain multiple feature spaces and multiple types of data objects. Moreover, most existing co-clustering methods generate flat partitions of the data with a predetermined number of clusters. To this end, a Hierarchical High-Order Co-clustering Algorithm by Maximizing Modularity (MHHCC) is proposed, which iteratively optimizes a modularity-based objective function and finally converges to a unique clustering result. MHHCC merges the information from the multiple feature spaces of high-order heterogeneous data. Furthermore, MHHCC takes a top-down strategy, performing a greedy divisive procedure that generates a tree-like hierarchical clustering result and reveals the relationships between clusters.

Finally, experiments are conducted on various synthetic and real datasets. The results show that the proposed methods outperform the existing methods.
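
As a rough illustration of the kind of quantity being maximized, the following Python sketch scores a hard (non-overlapping, exhaustive) co-clustering of a contingency table with a commonly used bipartite form of modularity: the observed mass falling where a row and a column share a cluster, minus the mass expected under row/column independence, normalized by the table total. This form is an assumption, since the abstract does not state the exact objective used in the thesis. The function name `co_cluster_modularity` and the toy table are hypothetical, and the sketch implements neither OMMCC nor MHHCC.

```python
def co_cluster_modularity(table, row_labels, col_labels):
    """Bipartite modularity of a hard co-clustering (illustrative assumption).

    table      -- non-negative contingency table as a list of equal-length rows
    row_labels -- cluster index for each row
    col_labels -- cluster index for each column
    """
    n_rows, n_cols = len(table), len(table[0])
    row_sums = [sum(row) for row in table]
    col_sums = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    total = sum(row_sums)
    if total <= 0:
        raise ValueError("the contingency table must have a positive total")

    score = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Only cells whose row and column share a cluster contribute.
            if row_labels[i] == col_labels[j]:
                expected = row_sums[i] * col_sums[j] / total
                score += table[i][j] - expected
    return score / total


# Toy block-diagonal table: rows 0-1 pair with columns 0-1, rows 2-3 with columns 2-3.
toy = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
aligned = co_cluster_modularity(toy, [0, 0, 1, 1], [0, 0, 1, 1])
single = co_cluster_modularity(toy, [0, 0, 0, 0], [0, 0, 0, 0])
swapped = co_cluster_modularity(toy, [0, 0, 1, 1], [1, 1, 0, 0])
print(aligned, single, swapped)
```

On this toy table the co-clustering aligned with the two blocks scores higher than placing everything in one cluster, which in turn scores higher than the deliberately mismatched labeling. An alternating optimization scheme would repeatedly update row labels with column labels fixed, and then the reverse, to increase this score.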
Keywords/Search Tags:Co-clustering, Modularity, Overlapping, Higher order heterogeneous data, Hierarchical structure