
Research On Semi-Supervised Learning Algorithms Based On Bayesian Method

Posted on: 2020-07-22  Degree: Doctor  Type: Dissertation
Country: China  Candidate: B B Jiang  Full Text: PDF
GTID: 1368330575966587  Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, real-world applications typically contain a large number of unlabeled samples but only a small number of labeled ones. Although labeled samples can effectively improve the performance of supervised learning, obtaining enough of them is expensive and time-consuming. In this situation, supervised learning trained on only a limited number of labeled samples generalizes poorly, while unsupervised learning that relies entirely on unlabeled samples is often ineffective. These traditional machine learning paradigms not only perform poorly but also waste data. It is therefore of great significance to study a machine learning paradigm that can exploit both labeled and unlabeled samples. Semi-supervised learning improves learning performance by using available unlabeled samples when labeled samples are scarce, and has attracted extensive attention in recent years. After more than 20 years of research, semi-supervised learning has become an important machine learning paradigm and has been successfully applied in many fields. However, several important problems remain unsolved, among them the effective utilization of unlabeled samples, the efficient use of abundant unlabeled samples, and the effectiveness of semi-supervised learning in feature selection. This dissertation investigates these issues; its main contributions are summarized as follows:

(1) Research on the effective utilization of unlabeled samples. This dissertation proposes a semi-supervised learning framework based on sparse Bayesian learning, together with two algorithms built on it, SBS2LEM and SBS2LVB. Through Bayesian inference, SBS2LEM and SBS2LVB achieve good sparsity and exploit unlabeled samples more effectively by automatically removing irrelevant ones during training. Experimental results demonstrate that SBS2LEM and SBS2LVB can make better use of unlabeled samples to improve
performance. Notably, SBS2LEM and SBS2LVB outperform other graph-based semi-supervised learning algorithms even when the useful information provided by unlabeled samples is very limited.

(2) Research on the efficient use of large amounts of unlabeled samples. This dissertation proposes scalable semi-supervised learning algorithms, SBS2LLP and ISBS2L, based on sparse Bayesian learning. ISBS2L first decomposes the type-II marginal likelihood of SBS2LLP into two parts, one dependent on and one independent of the current unlabeled sample, thereby quantifying the contribution of each unlabeled sample. ISBS2L then employs an incremental strategy that sequentially selects the unlabeled sample contributing most to the marginal likelihood, instead of using all available unlabeled samples directly. ISBS2L therefore has lower time complexity and can cope with large-scale data sets. This dissertation verifies the reliability of the proposed algorithms theoretically by analyzing their robustness and generalization error bounds. Experimental results show that SBS2LLP and ISBS2L achieve highly competitive performance on standard data sets compared with counterpart semi-supervised algorithms, and that ISBS2L works efficiently on multi-million-scale data with better performance and scalability.

(3) Research on the effectiveness of semi-supervised learning in feature selection. This dissertation proposes a joint semi-supervised feature selection and classification algorithm, JSFS, based on the Bayesian method. By associating each unlabeled sample with a self-adjusting weight parameter, JSFS automatically selects useful unlabeled samples and eliminates irrelevant ones, which avoids the indiscriminate use of unlabeled samples and improves robustness against noisy unlabeled samples. Moreover, JSFS adaptively identifies the relevant features and simultaneously learns a classifier with the selected features, which
breaks the limitations of existing algorithms, which must determine the number of selected features in advance and adopt an additional learning algorithm to train a classifier. Experimental results show the superiority of JSFS over state-of-the-art semi-supervised feature selection algorithms in terms of robustness against noise and effectiveness on high-dimensional data.
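The mechanism behind contribution (1) — letting Bayesian inference itself decide which basis functions (e.g., kernel columns centred on individual samples) are irrelevant — can be illustrated with a generic sparse-Bayesian ARD loop. Everything below (the name `ard_prune`, the MacKay evidence updates, the pruning threshold) is a standard textbook sketch under assumed Gaussian likelihood, not the dissertation's actual SBS2LEM or SBS2LVB:

```python
import numpy as np

def ard_prune(Phi, y, n_iter=50, alpha_max=1e6):
    """Sparse-Bayesian (ARD) regression sketch: each column of the design
    matrix Phi gets its own prior precision alpha_i; columns whose alpha_i
    diverges carry no evidence and are pruned as irrelevant."""
    n, m = Phi.shape
    alpha = np.ones(m)            # per-column ARD precisions
    beta = 1.0                    # noise precision
    keep = np.arange(m)
    for _ in range(n_iter):
        keep = keep[alpha[keep] < alpha_max]        # drop irrelevant columns
        P = Phi[:, keep]
        Sigma = np.linalg.inv(beta * P.T @ P + np.diag(alpha[keep]))
        mu = beta * Sigma @ P.T @ y                 # posterior mean weights
        gamma = 1.0 - alpha[keep] * np.diag(Sigma)  # well-determinedness
        alpha[keep] = gamma / (mu ** 2 + 1e-12)     # MacKay evidence update
        beta = (n - gamma.sum()) / (np.sum((y - P @ mu) ** 2) + 1e-12)
    return keep, mu
```

When each column is a basis function attached to one labeled or unlabeled sample, the columns whose precision diverges past `alpha_max` play the role of the "irrelevant unlabeled samples" that the framework removes automatically.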
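The incremental strategy of contribution (2) can be sketched as a greedy loop that adds, at each step, the candidate unlabeled-sample column whose inclusion raises the Gaussian marginal likelihood most. The sketch below scores candidates by brute-force recomputation for clarity; ISBS2L instead decomposes the marginal likelihood so each candidate's contribution is cheap to evaluate. The names `log_marglik` and `greedy_select` and the fixed `alpha`/`beta` values are illustrative assumptions:

```python
import numpy as np

def log_marglik(Phi, y, alpha=1.0, beta=100.0):
    # log N(y | 0, I/beta + Phi Phi^T / alpha): Gaussian marginal likelihood
    n = len(y)
    C = np.eye(n) / beta + Phi @ Phi.T / alpha
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))

def greedy_select(Phi_lab, y, Phi_cand, k):
    """Sequentially pick the candidate column that contributes most to the
    marginal likelihood, instead of using all candidates at once."""
    chosen, cols = [], [Phi_lab]
    for _ in range(k):
        base = np.hstack(cols)
        scores = [log_marglik(np.hstack([base, Phi_cand[:, [j]]]), y)
                  if j not in chosen else -np.inf
                  for j in range(Phi_cand.shape[1])]
        j_best = int(np.argmax(scores))
        chosen.append(j_best)
        cols.append(Phi_cand[:, [j_best]])
    return chosen
```

Because only the selected columns enter the model, the working design matrix stays small; this is the source of the lower time complexity claimed for the incremental scheme.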
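The two coupled ideas in contribution (3) — feature selection happening during classifier training, and per-unlabeled-sample weights that adjust themselves — can be illustrated with a small gradient-descent sketch. Here an L1 proximal step zeroes out irrelevant feature weights while the classifier trains, and each unlabeled sample is weighted by its current prediction confidence. This is a hypothetical simplification (function name, losses, and update rules are all assumptions), not JSFS's Bayesian model:

```python
import numpy as np

def jsfs_sketch(Xl, yl, Xu, lam=0.1, rho=0.5, n_iter=300, lr=0.1):
    """Joint feature selection + classification sketch: logistic loss on
    labeled data, confidence-weighted pseudo-label pull on unlabeled data,
    and an L1 proximal step that performs feature selection in-training."""
    w = np.zeros(Xl.shape[1])
    for _ in range(n_iter):
        pl = 1.0 / (1.0 + np.exp(-Xl @ w))          # labeled predictions
        pu = 1.0 / (1.0 + np.exp(-Xu @ w))          # unlabeled predictions
        su = np.abs(2.0 * pu - 1.0)                 # self-adjusting weights
        g = Xl.T @ (pl - yl) / len(yl)              # logistic-loss gradient
        g += rho * Xu.T @ (su * (pu - (pu > 0.5))) / len(pu)
        w -= lr * g
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # L1 prox
    return w
```

Two properties of the sketch mirror the text: no feature count is fixed in advance (the L1 prox decides how many weights stay nonzero), and no second training stage is needed (the returned `w` is already the classifier restricted to the selected features).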
Keywords/Search Tags: Semi-supervised learning, Bayesian method, sparse Bayesian, feature selection, prior, unlabeled sample, scalability, machine learning