
PU Problem Classification Algorithm Based On Support Vector Machine

Posted on: 2020-04-26 | Degree: Master | Type: Thesis
Country: China | Candidate: Y J Yuan
GTID: 2428330572991887 | Subject: Operational Research and Cybernetics
Abstract/Summary:
The PU (Positive and Unlabeled) problem is a classification problem defined on a data set that contains a small number of positive samples and a large number of unlabeled samples. It is a special case of semi-supervised learning. In many machine learning applications (e.g., text classification, gene sequence analysis, and image recognition), obtaining a large number of labeled samples costs considerable time and labor, so solving the PU problem is of great practical significance. The support vector machine (SVM) has significant theoretical and practical advantages and is widely used in machine learning; existing SVM-based methods for PU problems include the Biased SVM (B-SVM) and the One-Class SVM. Building on the standard SVM model, B-SVM treats all unlabeled points as negative examples and constructs the classifier by assigning a larger penalty weight to errors on positive examples and a smaller penalty weight to errors on the negative (unlabeled) examples. Numerical experiments show that B-SVM classifies well.

The nonparallel support vector machine (NPSVM) is an extension of the SVM. It retains the advantages of the SVM and is also well suited to intersecting data sets and to large-scale data sets. In this thesis, based on B-SVM and NPSVM, we propose the l1-NPSVM and apply it to the PU problem. The method performs a degree of feature selection, and because it can be transformed into a linear program, it is simple and efficient to solve. Numerical experiments show that the algorithm achieves better classification results.

The absolute value inequality SVM handles the unlabeled points of a general semi-supervised problem with an absolute value inequality, and builds the classifier by distributing the unlabeled points reasonably on both sides of the separating hyperplane. This method is easy to implement and classifies well. In this thesis, we compute the distance between unlabeled points and labeled positive points, select some unlabeled points as negative points, and thereby transform the PU problem into a general semi-supervised problem, which we then solve with the absolute value inequality SVM. Numerical experiments show that the algorithm is simple and feasible and that it classifies well.
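The B-SVM idea described in the abstract (treat every unlabeled point as negative, but penalize errors on positives more heavily) can be illustrated with a small linear sketch. It assumes a plain primal formulation trained by full-batch subgradient descent; the function names `train_biased_svm` and `decision`, the penalty values `c_pos`/`c_unl`, the step size, and the toy data are all hypothetical choices for illustration, not the thesis's implementation.

```python
def decision(w, b, x):
    """Linear decision value f(x) = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_biased_svm(positives, unlabeled, c_pos=10.0, c_unl=1.0,
                     lr=0.001, epochs=3000):
    """Hypothetical sketch of a linear Biased SVM.

    Every unlabeled point gets label -1. The objective minimized is
        0.5*||w||^2 + c_pos * sum_P hinge(+1, x) + c_unl * sum_U hinge(-1, x),
    where hinge(y, x) = max(0, 1 - y * f(x)), and c_pos > c_unl makes
    mistakes on labeled positives more expensive than on unlabeled points.
    """
    dim = len(positives[0])
    w = [0.0] * dim
    b = 0.0
    samples = ([(x, 1.0, c_pos) for x in positives]
               + [(x, -1.0, c_unl) for x in unlabeled])
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 regularizer
        grad_b = 0.0
        for x, y, c in samples:
            if y * decision(w, b, x) < 1.0:  # margin violation: hinge is active
                for k in range(dim):
                    grad_w[k] -= c * y * x[k]
                grad_b -= c * y
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


positives = [(2.0, 2.0), (3.0, 2.0), (2.0, 3.0)]
# The unlabeled set hides one positive, (2.5, 2.5), among true negatives.
unlabeled = [(2.5, 2.5), (-2.0, -2.0), (-3.0, -2.0), (-2.0, -3.0), (-2.5, -2.5)]
w, b = train_biased_svm(positives, unlabeled)
print(w, b, decision(w, b, (2.5, 2.5)))
```

On this toy data the heavier positive penalty keeps the hyperplane from being pulled toward the hidden positive, so (2.5, 2.5) should end up on the positive side even though it was trained as a negative. A library solver with per-class penalty weights would normally replace the hand-written loop.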
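To see why an l1 penalty gives both a linear program and feature selection, consider a generic l1-norm linear SVM with B-SVM-style weights. This is an assumed illustration, not the thesis's l1-NPSVM, whose exact formulation the abstract does not state. Here P and U index the positive and unlabeled points (with y_i = +1 on P and y_i = -1 on U), C_+ > C_- are the two penalty weights, and w is split as w = u - v with u, v >= 0, so that at the optimum the l1 norm equals the sum of the components of u and v:

```latex
\begin{aligned}
\min_{u,\,v,\,b,\,\xi}\quad & \sum_{k=1}^{d} (u_k + v_k)
  + C_{+} \sum_{i \in P} \xi_i + C_{-} \sum_{i \in U} \xi_i \\
\text{s.t.}\quad & y_i\big((u - v)^{\top} x_i + b\big) \ge 1 - \xi_i,
  \quad i \in P \cup U, \\
& u \ge 0,\quad v \ge 0,\quad \xi \ge 0 .
\end{aligned}
```

Every term is linear, so any LP solver applies, and the l1 objective tends to drive some weights w_k = u_k - v_k exactly to zero, which discards the corresponding features. An NPSVM-based variant would presumably apply the same device to each of its two nonparallel-hyperplane subproblems.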
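The abstract does not say exactly how unlabeled points are turned into negatives or how the absolute value inequality is written. The sketch below assumes one plausible reading: rank unlabeled points by Euclidean distance to their nearest labeled positive and relabel the farthest fraction as negative. It then shows the usual way an absolute value inequality |w^T x + b| >= 1 - xi treats a remaining unlabeled point, penalizing it only when it falls inside the margin, whichever side it is on. The names `select_reliable_negatives`, `abs_value_loss`, and `fraction` are hypothetical.

```python
import math


def select_reliable_negatives(positives, unlabeled, fraction=0.3):
    """Hypothetical rule: score each unlabeled point by its distance to the
    nearest labeled positive and return the farthest `fraction` as negatives,
    together with the unlabeled points that remain."""
    scored = [(min(math.dist(u, p) for p in positives), i)
              for i, u in enumerate(unlabeled)]
    scored.sort(reverse=True)  # farthest from the positives first
    k = max(1, int(round(fraction * len(unlabeled))))
    chosen = {i for _, i in scored[:k]}
    negatives = [unlabeled[i] for i in sorted(chosen)]
    remaining = [u for i, u in enumerate(unlabeled) if i not in chosen]
    return negatives, remaining


def abs_value_loss(w, b, x):
    """Slack needed for |w.x + b| >= 1 - xi: zero when the point lies outside
    the margin on either side, positive only when it falls inside the margin."""
    f = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - abs(f))


positives = [(0.0, 0.0), (1.0, 0.0)]
unlabeled = [(0.5, 0.5), (5.0, 5.0), (6.0, 6.0), (0.0, 1.0)]
negatives, remaining = select_reliable_negatives(positives, unlabeled, fraction=0.5)
print(negatives)   # [(5.0, 5.0), (6.0, 6.0)]
print(remaining)   # [(0.5, 0.5), (0.0, 1.0)]
print(abs_value_loss((1.0, 0.0), 0.0, (0.5, 3.0)))   # 0.5
```

Because the absolute value loss does not care which side a point lands on, it lets the optimizer place the remaining unlabeled points on either side of the hyperplane; it is nonconvex, which is why semi-supervised SVMs of this kind need a dedicated solution method.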
Keywords/Search Tags: PU problem, Support vector machine, Feature selection, Nonparallel support vector machine, Absolute value inequality