Stochastic and iterative techniques for relational data clustering

Posted on:2010-09-13

Degree:Ph.D

Type:Dissertation

University:University of Maryland, Baltimore County

Candidate:Anthony, Adam P

Full Text:PDF

GTID:1448390002477147

Subject:Computer Science

Abstract/Summary:

This dissertation focuses on the topic of relational data clustering, which is the task of organizing objects into logical groups, or clusters, taking into account the relational links between objects. As a research area, relational clustering has received a great deal of attention recently, because of the large variety of social media applications and other modern relational data sources that have become popular, such as weblogs, protein interaction networks, social networks, and citation graphs. The contributions of this dissertation are in three areas: probabilistic clustering algorithms, iterative clustering algorithms, and multi-relational clustering algorithms.;The probabilistic clustering algorithms are presented as a general framework and allow for the highest level of expression in developing models for relational clustering. Specifically, I show that one particular part of the clustering model---the relation existence distribution---can be modified in a variety of ways. I give two example instantiations of the framework: one that uses objects' distance from their cluster mean to adjust link probabilities, and another that measures the correlation of paired covariate values to determine link probabilities. I demonstrate the properties of each model on both artificial data and two social networks. The results show the potential of the relational probability function in developing useful clustering models.;The iterative algorithm presented in this dissertation, BMOD, trades off model expressiveness for speed and can be applied to much larger data sets, scaling up to several thousand objects. I present empirical results showing that BMOD performs similarly to two related probabilistic models, while also being much faster. The model is applied to a variety of relational data sets including a protein interaction network, a citation graph, and a social network.;The results for the BMOD algorithm led to a multi-relational approach: M-BMOD, which generalizes the clustering model for multiple relation types. The M-BMOD algorithm is shown to perform poorly when low-quality data is present, so an approach for relation selection, or the process of identifying the most relevant relational information out of a larger set of different relation types, is presented. I show that the approach effectively identifies relations that mutually agree on an optimal clustering while also filtering out any relations that will mislead the algorithm.

Keywords/Search Tags:

Clustering, Relational, Iterative

Related items

1	Research On Iterative Clustering Based Community Detection Algorithm And Application
2	Research On Multi-relational Clustering Analysis Approaches
3	Research On Multi-relational Data Clustering Algorithm And Its Application
4	Spectral Analysis And Clustering Of Relational Graphs
5	The Theory And Application Of Multi-Relational Data Mining
6	Research On Results Ranking And Clustering Of Keyword Retrival Over Relational Database
7	Diagnosis Of Learning Adaptability Based On Multi-relational Classification And Clustering
8	A Relational Framework for Clustering and Cluster Validity and the Generalization of the Silhouette Measure
9	Relational data clustering algorithms with biomedical application
10	Research On Clustering-based Relational Similarity Measurement