Research On Co-Clustering Algorithm Based On Maximizing Modularity

Posted on:2021-04-08

Degree:Master

Type:Thesis

Country:China

Candidate:J H Wei

Full Text:PDF

GTID:2428330623982034

Subject:Computer Science and Technology

Abstract/Summary:

The goal of co-clustering is to produce a meaningful partition of a two-dimensional contingency table by grouping its rows and columns simultaneously, exploiting the duality between them. Compared with traditional one-way clustering, co-clustering can effectively identify subspaces and reveal implicit relationships between rows and columns. With the rapid development of data science, datasets are becoming increasingly rich, and traditional co-clustering methods are limited when processing either overlapping data or high-order heterogeneous data. How to co-cluster such data more effectively is therefore a meaningful research topic. Modularity is a commonly used measure of the quality of a community division and a common quality criterion in graph clustering. Based on a review and analysis of existing work on co-clustering and modularity, this thesis studies two problems in depth, overlapping co-clustering and hierarchical high-order co-clustering, and obtains the following results.

First, because traditional co-clustering algorithms cannot handle overlapping data and outliers, an Overlapping Co-Clustering algorithm by Maximizing Modularity (OMMCC) is proposed. Both row clusters and column clusters are allowed to overlap, and row and column outliers of the data matrix are not assigned to any cluster. Specifically, a unified framework is designed that adds non-exhaustive and overlapping constraints to the objective function. An iterative alternating optimization process directly maximizes the modularity, so an overlapping and non-exhaustive co-clustering can be obtained efficiently. In addition, the parameters that control overlap and non-exhaustiveness are easy to interpret.

Second, traditional co-clustering methods have limitations when clustering high-order heterogeneous data that contain multiple feature spaces and multiple types of data objects. Moreover, most existing co-clustering methods generate flat partitions of the data with a predetermined number of clusters. To address this, a Hierarchical High-Order Co-clustering Algorithm by Maximizing Modularity (MHHCC) is proposed, which iteratively optimizes a modularity-based objective function and finally converges to a unique clustering result. MHHCC merges the information from the multiple feature spaces of high-order heterogeneous data. It follows a top-down strategy, performing a greedy divisive procedure that generates a tree-like hierarchical clustering result and reveals the relationships between clusters.

Finally, experiments are conducted on various synthetic and real data sets. The results show that the proposed methods perform better than the existing methods.
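
The abstract does not state the exact objective function, so the following is only a minimal sketch of how modularity can score a co-clustering. It assumes the standard bipartite form of modularity, Q = trace(R^T (A - r c^T / m) C) / m, where A is the contingency table, r and c are its row and column sums, m is its total, and R and C are binary row and column membership matrices. The function names `co_cluster_modularity` and `one_hot` are hypothetical, and the code is not the OMMCC or MHHCC procedure; it only evaluates a given assignment, including one in which a row is left unassigned.

```python
import numpy as np


def co_cluster_modularity(A, R, C):
    """Bipartite modularity of a co-clustering (assumed standard form).

    A: (n, d) non-negative contingency table.
    R: (n, k) binary row memberships; a row may have several ones
       (overlap) or none (treated as an outlier).
    C: (d, k) binary column memberships, same convention.
    """
    A = np.asarray(A, dtype=float)
    R = np.asarray(R, dtype=float)
    C = np.asarray(C, dtype=float)
    m = A.sum()
    # Observed weights minus the weights expected if rows and
    # columns were independent given their marginal sums.
    B = A - np.outer(A.sum(axis=1), A.sum(axis=0)) / m
    return float(np.trace(R.T @ B @ C) / m)


def one_hot(labels, k):
    """Turn hard labels into a binary membership matrix (hypothetical helper)."""
    return np.eye(k)[np.asarray(labels)]


# Toy table with two clear diagonal blocks.
A = np.array([[5, 4, 0, 0],
              [4, 5, 0, 0],
              [0, 0, 3, 6],
              [0, 0, 6, 3]])

R = one_hot([0, 0, 1, 1], 2)
C = one_hot([0, 0, 1, 1], 2)
q_blocks = co_cluster_modularity(A, R, C)      # 0.5

# A partition that cuts across the blocks scores lower.
R_bad = one_hot([0, 1, 0, 1], 2)
C_bad = one_hot([0, 1, 0, 1], 2)
q_bad = co_cluster_modularity(A, R_bad, C_bad)

# Non-exhaustive variant: the last row is assigned to no cluster.
R_partial = R.copy()
R_partial[3] = 0
q_partial = co_cluster_modularity(A, R_partial, C)  # 0.375
```

Writing the score in terms of membership matrices rather than label vectors is a design choice made here so that overlapping and unassigned rows or columns fit the same formula; an optimizer would then search over R and C, for example by alternating between row and column updates.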

Keywords/Search Tags:

Co-clustering, Modularity, Overlapping, High-order heterogeneous data, Hierarchical structure

Related items

1	Research On The Fast Algorithms Based On The Higher Order Hierarchical Vector Basis Functions
2	A New Hierarchical Clustering Algorithm Study And Application
3	Research On Community Detection Algorithm Based On Hierarchical Clustering
4	Research On Heterogeneous Data Clustering Algorithm
5	Multi-Information Model And Recommendation Technology Research
6	Research On Fuzzy Clustering Algorithm Based On Modularity
7	Research On OLAP Storage Structure Based On Dimensional Hierarchical Clustering
8	Research On Integral Equation Domain Decomposition Method For Solving Electromagnetic Scattering From Electrically Extra Large Objects
9	The Research Of Stable Multi-layer Hierarchical Structure And Its Key Technologies In Ad Hoc Networks
10	Hybrid Clustering Algorithm Based On Hierarchy