
A Study Of Probability Generation Model Based Similarity Modeling Techniques And Its Applications

Posted on: 2014-02-04 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: H P Ma
GTID: 1228330398964269 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, huge amounts of resources can be used to improve people's lives. However, this abundance also causes information overload, and dealing with it has become one of the focal topics in academia and industry. To address the problem, a variety of data mining techniques have been widely studied and applied in recent years. Along this line, two of the most important kinds of methods are based on similarity modeling: (1) classification and management of resources based on content similarity; and (2) modeling of user similarity from online behaviors in order to provide personalized services. However, several important challenges must still be addressed to improve existing work: how to model the inner structure among variables, and how to eliminate the negative effect of sparse data in a high-dimensional space. We therefore study probability generation model based similarity modeling techniques and their applications. The contributions can be summarized as follows.
Firstly, we propose a generative model that captures correlations among multiple labels, and design a multi-label document classification algorithm based on it. Recent years have witnessed a considerable surge of interest in the multi-label learning problem. It has been shown that a key factor for a successful multi-label learning algorithm is effectively exploiting the relations between labels. However, most previous work on label relations focuses on pairwise relations. To handle situations where there are intrinsic correlations among multiple labels, we apply the proposed model, L-F-L-PAM, for inference on the training data and the standard Four-Level Pachinko Allocation Model for inference on the test data. Furthermore, we propose a pruned Gibbs sampling algorithm for the test stage to reduce inference time.
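The abstract does not describe the four-level PAM structure or the pruning rule, so the following is only a simplified stand-in that shows the general idea of a pruned sampler at test time. It makes two assumptions. First, labels are treated as flat topics with known word distributions `phi`, as in L-LDA-style inference. Second, the hypothetical pruning rule keeps only the top `k` labels by document likelihood, and Gibbs sampling then runs over those candidates alone. The names `prune_labels`, `rank_labels`, `FLOOR`, and the toy data are illustrative, not taken from the dissertation.

```python
import math
import random
from collections import Counter

FLOOR = 1e-9  # assumed probability for a word a label never generates


def prune_labels(doc, phi, k):
    """Hypothetical pruning rule: keep the k labels whose word
    distributions give the document the highest log-likelihood."""
    scores = {
        label: sum(math.log(dist.get(w, FLOOR)) for w in doc)
        for label, dist in phi.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def rank_labels(doc, phi, k=2, alpha=0.1, iters=200, seed=0):
    """Gibbs-sample a label for every token, restricted to the pruned
    candidates, then rank labels by their estimated proportion."""
    rng = random.Random(seed)
    candidates = prune_labels(doc, phi, k)
    z = [rng.choice(candidates) for _ in doc]  # random initial assignment
    counts = Counter(z)
    for _ in range(iters):
        for i, w in enumerate(doc):
            counts[z[i]] -= 1  # take token i out of its current label
            # p(z_i = label | rest) is proportional to (n_label + alpha) * phi_label(w)
            weights = [(counts[label] + alpha) * phi[label].get(w, FLOOR)
                       for label in candidates]
            z[i] = rng.choices(candidates, weights=weights)[0]
            counts[z[i]] += 1
    total = len(doc) + alpha * len(candidates)
    props = [(label, (counts[label] + alpha) / total) for label in candidates]
    return sorted(props, key=lambda p: p[1], reverse=True)


# Toy label-word distributions (hypothetical, not from the dissertation).
phi = {
    "sports": {"goal": 0.5, "match": 0.5},
    "politics": {"vote": 0.5, "law": 0.5},
    "tech": {"code": 0.5, "chip": 0.5},
}
ranking = rank_labels(["goal", "match", "goal", "vote"], phi, k=2)
print(ranking)  # "sports" first, "politics" second; "tech" was pruned
```

Restricting sampling to `k` candidates makes each token update cost proportional to `k` rather than to the full label count, which is the kind of saving a pruned sampler is after. A faithful version would sample over the PAM hierarchy instead of flat labels.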
Finally, extensive experiments validate the effectiveness and efficiency of the new approach in terms of label ranking performance. The results show significant improvements of our model over Labeled LDA (L-LDA), as well as superiority in both effectiveness and computational efficiency over other high-performing multi-label learning methods.
Secondly, we propose a generative model based method for mining the similarity of mobile users with respect to their habits. The growing ability of smart mobile devices to sense user contexts makes it possible to discover mobile users with similar habits by mining those habits from context-rich device logs. However, although researchers have proposed effective methods for mining user habits, such as behavior pattern mining, how to leverage the mined results for user similarity mining remains less explored. To this end, we propose a novel approach that overcomes the sparseness of the behavior pattern space and thus makes it possible to discover mobile users with similar habits by leveraging behavior pattern mining. Specifically, we first normalize each user's raw context log by transforming location-based context data and user interaction records into more general representations. Second, we use a constraint-based Bayesian Matrix Factorization generative model to extract the latent common habits among behavior patterns, and then transform behavior pattern vectors into vectors of mined common habits, which lie in a much denser space. Extensive experiments on real data sets show that our approach outperforms three baselines in discovering mobile users with similar habits.
Last, we design a context-aware App recommendation algorithm that integrates users' similarity with respect to their habits and Apps' category similarity.
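Both the habit-mining method and the App recommender depend on comparing users in a dense space of latent common habits rather than in the sparse behavior-pattern space. The following is a sketch of that idea under stated assumptions. It substitutes plain non-negative matrix factorization (Lee-Seung multiplicative updates) for the constraint-based Bayesian Matrix Factorization described above, because the abstract does not give that model's priors or constraints. The toy user-by-pattern matrix and the helpers `matmul`, `nmf`, and `cosine` are hypothetical. The example shows how two users who share no behavior pattern can still come out similar once patterns that co-occur across users are merged into one habit.

```python
import math
import random


def matmul(A, B):
    """Plain-list matrix product A @ B."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def nmf(V, rank, iters=1000, seed=0, eps=1e-12):
    """Factor a non-negative users x patterns matrix V ~= W @ H, where
    W is users x habits and H is habits x patterns (Lee-Seung updates)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        WH = matmul(W, H)
        # H <- H * (W^T V) / (W^T W H)
        for a in range(rank):
            for j in range(m):
                num = sum(W[i][a] * V[i][j] for i in range(n))
                den = sum(W[i][a] * WH[i][j] for i in range(n)) + eps
                H[a][j] *= num / den
        WH = matmul(W, H)
        # W <- W * (V H^T) / (W H H^T)
        for i in range(n):
            for a in range(rank):
                num = sum(V[i][j] * H[a][j] for j in range(m))
                den = sum(WH[i][j] * H[a][j] for j in range(m)) + eps
                W[i][a] *= num / den
    return W, H


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Rows: users A, B, C, D; columns: four mined behavior patterns (toy data).
V = [
    [1, 0, 0, 0],  # A shows only pattern 0
    [0, 1, 0, 0],  # B shows only pattern 1
    [1, 1, 0, 0],  # C shows both, so patterns 0 and 1 co-occur
    [0, 0, 1, 1],  # D shows two unrelated patterns
]
W, H = nmf(V, rank=2)
raw_ab = cosine(V[0], V[1])    # 0.0: no shared pattern in the sparse space
habit_ab = cosine(W[0], W[1])  # high: A and B load on the same latent habit
habit_ad = cosine(W[0], W[3])  # low: A and D load on different habits
print(raw_ab, round(habit_ab, 3), round(habit_ad, 3))
```

The rows of `W` play the role of the "vectors of mined common habits". A Bayesian treatment would additionally place priors on `W` and `H` and encode whatever constraints the dissertation imposes.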
Our previous work has shown that users' context-aware behavior habits and Apps' category information can be very useful for App recommendation, since these two kinds of information give a better understanding of users' preferences. Along this line, we propose a matrix factorization framework that seamlessly integrates users' habit information and Apps' category information into the collaborative filtering procedure. Experimental results on real data sets demonstrate that our approach achieves better recommendation performance than other baselines.
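The abstract does not give the framework's objective function. The following is therefore only a sketch of one common way to fold side similarities into collaborative filtering: matrix factorization trained by stochastic gradient descent. Extra penalties pull the latent vectors of habit-similar users toward each other, and likewise for Apps in the same category, in the spirit of similarity- or social-regularized MF. The toy ratings, the similarity matrices, and the names `train_mf`, `predict`, `lam`, and `beta` are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_mf(ratings, user_sim, app_sim, k=2, lr=0.05, lam=0.01, beta=0.1,
             epochs=500, seed=0):
    """SGD matrix factorization with similarity regularization.

    For each observed (u, i, r), one step descends, with respect to P[u] and Q[i]:
        0.5 * err**2 + 0.5 * lam * (|P[u]|**2 + |Q[i]|**2)
        + 0.5 * beta * sum_v user_sim[u][v] * |P[u] - P[v]|**2
        + 0.5 * beta * sum_j app_sim[i][j] * |Q[i] - Q[j]|**2
    where err = r - P[u] . Q[i].
    """
    rng = random.Random(seed)
    n_users, n_apps = len(user_sim), len(app_sim)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_apps)]
    data = list(ratings)  # copy so shuffling leaves the caller's list intact
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - dot(P[u], Q[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Pull u toward habit-similar users, i toward same-category apps.
                pull_u = sum(user_sim[u][v] * (pu - P[v][f]) for v in range(n_users))
                pull_i = sum(app_sim[i][j] * (qi - Q[j][f]) for j in range(n_apps))
                P[u][f] += lr * (err * qi - lam * pu - beta * pull_u)
                Q[i][f] += lr * (err * pu - lam * qi - beta * pull_i)
    return P, Q


# Toy data (hypothetical): (user, app, rating) on a 1-5 scale.
ratings = [(0, 0, 5), (0, 2, 1), (1, 0, 4), (1, 2, 2), (2, 1, 1), (2, 2, 5)]
user_sim = [[0, 0.8, 0], [0.8, 0, 0], [0, 0, 0]]  # e.g. users 0 and 1 share habits
app_sim = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]       # e.g. apps 0 and 1 share a category
P, Q = train_mf(ratings, user_sim, app_sim)


def predict(u, i):
    return dot(P[u], Q[i])


print(round(predict(0, 1), 2))  # unrated pair, shaped by app 1's category link to app 0
```

Setting `beta = 0` recovers plain matrix factorization. This makes it easy to check how much the habit and category signals change predictions for unrated user-App pairs.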
Keywords/Search Tags: Probability Generation Model, Similarity Modeling, Multi-Label Classification, Collaborative Filtering, Recommendation Algorithm, Text Classification