Large-Scale Positive And Unlabeled Learning

Posted on:2018-12-22

Degree:Master

Type:Thesis

Country:China

Candidate:P Gao

GTID:2348330512498176

Subject:Computer Science and Technology

Abstract/Summary:

Machine learning methods that learn from positive and unlabeled samples are called PU learning. PU learning is widely used in practice. For example, an enterprise may try to discover new customers based on its existing customers, who are treated as positive data; this task is also called "Lookalike".

PU learning can be divided into two categories according to the application scenario: PU classification and PU matrix completion. PU classification builds a model for one specific task, such as "Lookalike" for a single product. PU matrix completion models the relationship between two sets of entities, as in one-class collaborative filtering and multi-label learning. In many cases, auxiliary feature information is available in addition to the relationship matrix between entities, such as user or product features in one-class collaborative filtering. In that case the PU inductive matrix completion algorithm can be used to obtain better results.

Existing PU learning algorithms are all implemented on a single machine. In the big data era, however, practical machine learning algorithms should be able to run in a distributed setting. This thesis designs and implements distributed versions of existing PU learning methods on Spark. In addition, inspired by multi-task learning, we propose a new model called cluster PU inductive matrix completion.

This thesis makes the following three contributions:

1. We implement distributed PU classification algorithms, including distributed two-step methods and distributed cost-sensitive methods. We compare all of the methods on a large data set from the Lookalike task. Moreover, these algorithms have a certain degree of scalability.

2. We implement the distributed PU inductive matrix completion algorithm and conduct experiments on benchmark data sets for recommender systems and multi-label learning. We find that the algorithm's scalability is very competitive.

3. We propose a new method called cluster PU inductive matrix completion and design a distributed learning method for it. On benchmark data sets for recommender systems and multi-label learning, we compare our method with the state-of-the-art PU inductive matrix completion method. We find that our method achieves a better AUC, along with competitive scalability.
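
To make the idea of a cost-sensitive PU classifier concrete, the following is a minimal single-machine sketch, not the thesis's Spark implementation. It assumes one common formulation: unlabeled samples are treated as noisy negatives and given a smaller misclassification cost than the labeled positives, and the model is a weighted logistic regression trained by batch gradient descent. All names (`train_cost_sensitive_pu`, `make_toy_data`, `score`) and the cost values `w_pos` and `w_unl` are hypothetical choices for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_cost_sensitive_pu(positives, unlabeled, w_pos=1.0, w_unl=0.3,
                            lr=0.5, epochs=200):
    """Weighted logistic regression for PU data (illustrative assumption).

    Labeled positives get target 1 with cost w_pos; unlabeled samples get
    target 0 with the smaller cost w_unl, since some of them are positive.
    """
    dim = len(positives[0])
    weights = [0.0] * dim
    bias = 0.0
    data = [(x, 1.0, w_pos) for x in positives]
    data += [(x, 0.0, w_unl) for x in unlabeled]
    total_cost = sum(c for _, _, c in data)

    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        # This sum over examples is the part a distributed version would
        # compute per partition and then aggregate.
        for x, y, c in data:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = c * (p - y)
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / total_cost for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / total_cost
    return weights, bias


def score(model, x):
    # Higher scores mean "more like the labeled positives".
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def make_toy_data(seed=0, n_labeled=50, n_hidden_pos=50, n_neg=100):
    # Synthetic 2-D data: positives around (2, 2), negatives around (-2, -2).
    rng = random.Random(seed)

    def sample(center, n):
        return [[rng.gauss(center, 1.0), rng.gauss(center, 1.0)]
                for _ in range(n)]

    positives = sample(2.0, n_labeled)
    unlabeled = sample(2.0, n_hidden_pos) + sample(-2.0, n_neg)
    rng.shuffle(unlabeled)
    return positives, unlabeled


if __name__ == "__main__":
    positives, unlabeled = make_toy_data()
    model = train_cost_sensitive_pu(positives, unlabeled)
    ranked = sorted(unlabeled, key=lambda x: score(model, x), reverse=True)
    print("top-ranked unlabeled samples:", ranked[:3])
```

In a Lookalike setting, the unlabeled samples ranked highest by `score` would be the candidate new customers. Because the scores are biased by the hidden positives in the unlabeled set, they are better suited to ranking (for example, evaluation by AUC) than to reading as calibrated probabilities.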

Keywords/Search Tags:

PU Learning, Classification Algorithm, Matrix Completion, Cluster Algorithm, Spark

Related items

1	Some Applications Of Matrix Completion To Text Classification
2	Research On Algorithm And Application Of Matrix Completion
3	Research On The Application Of Binary Matrix Completion In Personalized Learning
4	Weather Monitoring Data Processing Based On Matrix Completion
5	The Study Of Matrix Completion Algorithms And Image Recovery
6	Low-Rank Tensor (Matrix) Completion Algorithm With Applications
7	Research On Recommendation Algorithm Of Clustering-based Low Rank Matrix Completion
8	Research On The Classification Algorithm Of Unbalance Data Based On Spark
9	Research On Matrix Completion Based Recommend Algorithms
10	Denoising Method On Low Rank Matrix Completion And Matrix Recovery Occluded Images