Matrix Factorization In The Application Of Data Mining

Posted on: 2015-11-29 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Y M Li
GTID: 1228330467479395 | Subject: Signal and Information Processing
Abstract/Summary:
Matrix factorization has become increasingly popular in applications that require data mining techniques, such as information retrieval, computer vision, and pattern recognition. Matrix factorization approximates an original data matrix in a high-dimensional space by the product of two or more matrices in a lower-dimensional space. In many applications, data have different characteristics: the data may have a global geometric structure, the data may be very sparse, or the training data may be very limited. Different matrix factorization methods should therefore be designed according to these characteristics. This dissertation investigates several active research problems that arise when matrix factorization is applied in practice. Building on the existing literature, we propose several matrix factorization methods that exploit different structures to solve different application problems. The major contributions are as follows.
1. For the problem of data representation, this dissertation proposes a novel matrix factorization algorithm called Coordinate Ranking regularized Nonnegative Matrix Factorization, which takes advantage of the global geometric structure of the data. The idea of the proposed algorithm is to combine Nonnegative Matrix Factorization with manifold ranking so as to encode both the local and the global geometric structure of the data. Experimental results on three real-world datasets demonstrate the superiority of this algorithm.
2. For the problem of scientific article recommendation, this dissertation presents a novel matrix factorization model, topic regression Matrix Factorization (tr-MF). The main idea of tr-MF is to extend matrix factorization with probabilistic topic modeling. tr-MF introduces a regression model that regularizes the user factors through probabilistic topic modeling, under the basic hypothesis that users have similar preferences if they rate similar sets of items. Consequently, tr-MF provides interpretable latent factors for users and items and makes accurate predictions for community users. We demonstrate the efficacy of tr-MF on a large subset of data from CiteULike, a bibliography-sharing service. The proposed model outperforms state-of-the-art matrix factorization models by a significant margin.
3. For the problem of modeling the relational structure between scientific articles, this dissertation proposes the topic regression Collective Matrix Factorization (tr-CMF) model, which combines tr-MF with relational matrix factorization. In addition, we present the Collaborative Topic Regression with Relational Matrix Factorization (CTR-RMF) model, which combines the existing CTR model with relational matrix factorization; CTR-RMF therefore serves as an appropriate baseline for tr-CMF. We demonstrate the efficacy of the proposed models on a large subset of data from CiteULike. The proposed models outperform state-of-the-art matrix factorization models by a significant margin.
4. For the problem of noisy tagging with limited training samples, this dissertation proposes a discriminative model called Semi-parametric regularized Support Vector Machine with Multi-label Constraint (SpSVM-MC), which exploits both labeled and unlabeled data through semi-parametric regularization and incorporates multi-label constraints into the optimization. The main idea of the semi-parametric regularization is to capture the geometric structure of the data through covariance matrix factorization in a high-dimensional space. Although SpSVM-MC is a general method for learning with limited and noisy tags, the evaluation focuses on the specific application of noisy image tagging with limited labeled training samples on a benchmark dataset. Theoretical analysis and extensive comparisons with state-of-the-art methods demonstrate the superior performance of SpSVM-MC.
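Contribution 1 builds on regularized nonnegative matrix factorization. The abstract does not give the exact Coordinate Ranking objective, so the following is only a minimal NumPy sketch of a related, well-known technique: graph-regularized NMF with multiplicative updates, which encodes the local geometric structure through a graph Laplacian. It does not implement the manifold-ranking (global) term. The function names `graph_regularized_nmf` and `objective`, the sample-affinity matrix `W`, and the weight `lam` are illustrative assumptions, not the dissertation's notation.

```python
import numpy as np


def graph_regularized_nmf(X, W, k, lam=1.0, n_iter=200, eps=1e-10, seed=0):
    """Illustrative graph-regularized NMF (GNMF-style), NOT the dissertation's CR-NMF.

    Seeks a local minimum of  ||X - U V^T||_F^2 + lam * tr(V^T L V),  L = D - W,
    using multiplicative updates that keep U and V nonnegative.
      X: (m, n) nonnegative data, one sample per column.
      W: (n, n) symmetric nonnegative affinity between samples (assumed given).
    Returns U (m, k) basis vectors and V (n, k) low-dimensional sample codes.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((n, k))
    D = np.diag(W.sum(axis=1))  # degree matrix of the affinity graph
    for _ in range(n_iter):
        # Standard NMF update for the basis; eps avoids division by zero.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        # Code update: the W and D terms pull the codes of neighbouring samples together.
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V


def objective(X, W, U, V, lam):
    """Value of the regularized objective, for monitoring convergence."""
    L = np.diag(W.sum(axis=1)) - W  # graph Laplacian
    return np.linalg.norm(X - U @ V.T, "fro") ** 2 + lam * np.trace(V.T @ L @ V)
```

In practice `W` would typically be built from a k-nearest-neighbour graph over the columns of `X`. With `lam=0` the two updates reduce to the standard Lee-Seung multiplicative updates for Frobenius-norm NMF, so the regularizer is the only part that uses the data's geometry.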
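Contribution 2 regularizes user factors with a regression model driven by topic modeling. The abstract does not state tr-MF's exact formulation, so the following is only a minimal sketch under explicit assumptions: each user factor u_i is pulled toward a linear prediction B^T theta_i, where theta_i is a per-user topic-proportion vector assumed to come from a separately trained topic model (for example, averaged over the articles the user has saved), and all parameters are fitted by alternating least squares on the observed ratings. The names `regression_regularized_mf`, `loss`, `theta`, and the weights `lam_u`, `lam_v`, `lam_b` are hypothetical.

```python
import numpy as np


def regression_regularized_mf(R, mask, theta, k, lam_u=1.0, lam_v=1.0, lam_b=1.0,
                              n_iter=20, seed=0):
    """Illustrative sketch, NOT the dissertation's exact tr-MF model.

    Alternating least squares for
        sum_{(i,j) observed} (R_ij - u_i . v_j)^2
        + lam_u * sum_i ||u_i - B^T theta_i||^2    (regression prior on user factors)
        + lam_v * sum_j ||v_j||^2 + lam_b * ||B||_F^2
      R, mask: (n_users, n_items); mask is 1 where a rating is observed.
      theta:   (n_users, n_topics) user topic proportions, assumed precomputed.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    n_topics = theta.shape[1]
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_items, k))
    B = np.zeros((n_topics, k))
    I_k = np.eye(k)
    for _ in range(n_iter):
        prior = theta @ B  # row i is the regression prediction B^T theta_i
        for i in range(n_users):
            obs = mask[i] > 0
            Vi = V[obs]
            # Ridge solve pulled toward prior[i]; a user with no ratings gets prior[i].
            U[i] = np.linalg.solve(Vi.T @ Vi + lam_u * I_k,
                                   Vi.T @ R[i, obs] + lam_u * prior[i])
        for j in range(n_items):
            obs = mask[:, j] > 0
            Uj = U[obs]
            V[j] = np.linalg.solve(Uj.T @ Uj + lam_v * I_k, Uj.T @ R[obs, j])
        # Ridge regression of the user factors on the topic proportions.
        B = np.linalg.solve(lam_u * theta.T @ theta + lam_b * np.eye(n_topics),
                            lam_u * theta.T @ U)
    return U, V, B


def loss(R, mask, theta, U, V, B, lam_u=1.0, lam_v=1.0, lam_b=1.0):
    """Value of the objective above, for monitoring convergence."""
    err = mask * (R - U @ V.T)
    return ((err ** 2).sum() + lam_u * ((U - theta @ B) ** 2).sum()
            + lam_v * (V ** 2).sum() + lam_b * (B ** 2).sum())
```

Each block update minimizes the objective exactly given the other blocks, so the loss never increases across iterations. A user with no observed ratings is simply set to the current regression prediction, which illustrates how a topic-driven prior can stand in for missing preference data; whether tr-MF behaves this way is not stated in the abstract.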
Keywords/Search Tags: Coordinate Ranking, Non-negative Matrix Factorization, Matrix Factorization, Probabilistic Topic Modeling, Recommender System