
Unsupervised Graph-Based Similarity Learning Using Heterogeneous Features

Posted on: 2012-10-13
Degree: Ph.D.
Type: Thesis
University: University of Michigan
Candidate: Muthukrishnan, Pradeep
Full Text: PDF
GTID: 2458390008497786
Subject: Computer Science
Abstract/Summary:
Relational data are data that contain explicit relations among objects. Such data are now ubiquitous and arise in many different application domains. Estimating similarity between objects is a core requirement for many standard Machine Learning (ML), Natural Language Processing (NLP) and Information Retrieval (IR) tasks such as clustering, classification and word sense disambiguation. Traditional machine learning approaches represent the data using simple, concise representations such as feature vectors. While this works very well for homogeneous data, i.e., data with a single feature type such as text, it does not fully exploit the availability of different feature types. For example, scientific publications have text, citations, authorship information and venue information, and each of these features can be used for estimating similarity. Representing such objects has been a key issue in efficient mining (Getoor and Taskar, 2007). In this thesis, we propose natural representations for relational data using multiple, connected layers of graphs, one for each feature type. We also propose novel algorithms for estimating similarity using multiple heterogeneous features, and novel algorithms that use the estimated similarity measure for tasks such as topic detection and music recommendation. We demonstrate superior performance of the proposed algorithms (root mean squared error of 24.81 on the Yahoo! KDD Music recommendation data set and classification accuracy of 88% on the ACL Anthology Network data set) over baselines and many state-of-the-art algorithms, such as Latent Semantic Analysis (LSA), Multiple Kernel Learning (MKL) and spectral clustering, on large, standard data sets.
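The idea of keeping one graph layer per feature type can be illustrated with a minimal sketch. It is not the thesis's algorithm: the toy data, the use of Jaccard overlap within each layer, the uniform weighting across layers, and the names `layers`, `jaccard` and `combined_similarity` are all assumptions made here for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical toy data: one layer per feature type, each mapping a
# publication id to the set of feature values it is linked to.
layers: Dict[str, Dict[str, Set[str]]] = {
    "text": {
        "p1": {"graph", "similarity"},
        "p2": {"graph", "clustering"},
        "p3": {"music"},
    },
    "authors": {"p1": {"a1", "a2"}, "p2": {"a2"}, "p3": {"a3"}},
    "citations": {"p1": {"c1"}, "p2": {"c1"}, "p3": {"c2"}},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(
    x: str,
    y: str,
    layers: Dict[str, Dict[str, Set[str]]],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted average of per-layer Jaccard similarities.

    Uniform weights are an assumption of this sketch, not a choice
    taken from the thesis.
    """
    if weights is None:
        weights = {name: 1.0 / len(layers) for name in layers}
    return sum(
        weights[name] * jaccard(layer.get(x, set()), layer.get(y, set()))
        for name, layer in layers.items()
    )


print(combined_similarity("p1", "p2", layers))
print(combined_similarity("p1", "p3", layers))
```

In this toy setting, `p1` and `p2` share a word, an author and a citation, so they score higher than `p1` and `p3`, which share nothing in any layer. A single-feature representation would see only one of those three signals.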
Keywords/Search Tags: Data, Similarity, Using, Feature