Font Size: a A A

Pattern extraction and clustering for high-dimensional discrete data

Posted on:2014-03-14Degree:Ph.DType:Dissertation
University:University of Illinois at Urbana-ChampaignCandidate:Jiang, PengFull Text:PDF
GTID:1458390005986328Subject:Computer Science
Abstract/Summary:
We explore connections of low-rank matrix factorizations with interesting problems in data mining and machine learning. We propose a framework for solving several low-rank matrix factorization problems, including binary matrix factorization, constrained binary matrix factorization, weighted constrained binary matrix factorization, densest k-subgraph, and orthogonal nonnegative matrix factorization. These combinatorial problems are NP-hard. Our goal is to develop effective approximation algorithms with good theoretical properties and apply them to solve various real application problems. We reformulate each of the problems as a special clustering problem that has the same optimal solution as the corresponding original problem. Making use of this property, we develop clustering algorithms to solve corresponding low-rank matrix factorization problems. We prove that most of our clustering algorithms have constant approximation ratios, which is a highly desirable property for NP-hard problems. We apply the proposed algorithms and compare them with existing methods for real applications in pattern extraction, document clustering, transaction data mining, recommender systems, bicluster discovery in gene expression data, social network mining, and image representation.
Keywords/Search Tags:Data, Clustering, Matrix factorization, Mining
Related items