In the era of big data, data is often generated at high speed and in constantly changing forms. In practical application scenarios, data usually arrives unlabeled: its volume and diversity make it impossible for human experts to correctly label every sample within a limited time, and labeling all samples is time-consuming, expensive, and sometimes unnecessary. Semi-supervised learning techniques are therefore well suited to processing such data. However, because the data changes constantly, the assumption that all samples obey the same distribution is difficult to satisfy in practice. Data with different distributions are nevertheless correlated, and similar source-domain data can be fully exploited to help the target domain build models quickly and accurately. We therefore focus on a new semi-supervised inductive transfer learning paradigm in which the data in both the source and target domains contain labeled and unlabeled samples, and the data distributions of the two domains are different but similar. This learning paradigm better matches real application scenarios. For example, in computer-aided diagnosis systems, medical experts can carefully diagnose only a small number of medical images; in addition, as equipment ages or is upgraded, the pattern of previously collected medical images may differ from that of images collected now. Semi-supervised classification and transfer learning scenarios therefore face two main challenges: 1) how to mine the implicit knowledge in a small number of labeled samples and a large number of unlabeled samples to train a classification model with good generalization ability; 2) how to learn efficiently from the labeled and unlabeled samples in both the source and target domains to classify target-domain samples more accurately.

In summary, considering the
research value and challenges of semi-supervised classification and transfer learning, the research content of this thesis is summarized in the following two aspects.

First, the ensemble model in existing disagreement-based methods for semi-supervised classification cannot properly trade off the classification accuracy of the component classifiers against their diversity. This thesis proposes a semi-supervised classification algorithm based on evolutionary learning, named Tri-Evolving. Initially, three component classifiers are randomly selected from three populations of trees, which are generated by a generation algorithm on the training set. The selected component classifiers are then re-evolved in their corresponding populations in turn, with the evolutionary direction of each induced by the other two component classifiers. Specifically, Tri-Evolving takes advantage of a multi-population co-evolutionary algorithm to optimize the average classification accuracy and the diversity of the component classifiers, balancing the two during learning by keeping the average classification accuracy increasing while losing as little diversity as possible. Extensive experimental results verify the advantages of the Tri-Evolving algorithm. Its main innovation lies in using a multi-population co-evolutionary algorithm to trade off the average classification accuracy of the component classifiers against their diversity, that is, increasing the average classification accuracy while preserving as much diversity among the classifiers as possible, thereby inducing an ensemble model with good generalization ability.

Second, existing semi-supervised transfer learning methods assume that the source domain provides well-labeled data or a trained model. We relax this strict assumption so that both the source and target domains are semi-supervised settings with different data
distributions. Based on this, a new semi-supervised inductive transfer learning framework named Co-Transfer is proposed. It first generates three TrAdaBoost classifiers for transfer learning from the source domain to the target domain and, at the same time, three TrAdaBoost classifiers for transfer learning from the target domain to the source domain, using bootstrapped samples from the original labeled data. In each round of Co-Transfer, each group of TrAdaBoost classifiers is refined with carefully labeled data drawn from three sources: the original labeled samples, samples labeled by this group of classifiers, and samples labeled by the other group of TrAdaBoost classifiers. Finally, the group of TrAdaBoost classifiers trained to transfer from the source domain to the target domain produces the final hypothesis via majority voting. Experiments on four UCI datasets and text classification tasks verify that Co-Transfer can effectively reuse source-domain data and exploit the labeled and unlabeled data of both the source and target domains to improve generalization performance. The innovations of the Co-Transfer algorithm mainly lie in the following: 1) for the first time, it proposes a semi-supervised inductive transfer learning paradigm in which only part of the samples in both the source and target domains are labeled; 2) it proposes a new semi-supervised inductive transfer learning framework that performs bidirectionally synchronized semi-supervised learning and transfer learning between the source and target domains, is well suited to transfer learning where only part of the samples in both domains are labeled, and does not require a specific type of classifier; 3) when augmenting the labeled data sets of the source and target domains, it applies strategies to limit the negative impact of noisy pseudo-labeled samples.
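The Co-Transfer rounds described above can be illustrated with a minimal sketch. This is not the thesis's implementation: a best-threshold learner on synthetic one-dimensional data stands in for TrAdaBoost, unanimous voting stands in for the noise-limiting strategies, and the domains, boundaries, pool sizes, and number of rounds are all illustrative assumptions. The sketch only shows the bidirectional structure: two groups of three bootstrapped learners, each round refit on the original labels plus pseudo-labels from itself and from the other group.

```python
import random

random.seed(1)

# Illustrative synthetic domains (assumptions, not the thesis's datasets):
# similar but shifted boundaries, only 8 of 40 samples labeled per domain.
def make_domain(boundary, n=40):
    xs = [random.random() for _ in range(n)]
    return xs, [1 if x > boundary else 0 for x in xs]

Xs, ys = make_domain(0.45)          # source domain
Xt, yt = make_domain(0.55)          # target domain
n_lab = 8

def fit_threshold(pairs):
    # base learner: best threshold on (x, label) pairs; a stand-in for TrAdaBoost
    return max((x for x, _ in pairs),
               key=lambda t: sum((x > t) == bool(l) for x, l in pairs))

def make_group(pairs, k=3):
    # k learners on bootstrap resamples, mimicking the three TrAdaBoost classifiers
    return [fit_threshold([random.choice(pairs) for _ in pairs]) for _ in range(k)]

def vote(group, x):
    return 1 if sum(x > t for t in group) >= 2 else 0

def unanimous(group, x):
    # pseudo-label only when all members agree: a crude noise-limiting strategy
    preds = [x > t for t in group]
    return int(preds[0]) if len(set(preds)) == 1 else None

# each direction reuses the other domain's labeled samples as auxiliary data
pool_t = [(Xt[i], yt[i]) for i in range(n_lab)] + [(Xs[i], ys[i]) for i in range(n_lab)]
pool_s = [(Xs[i], ys[i]) for i in range(n_lab)] + [(Xt[i], yt[i]) for i in range(n_lab)]
g_st = make_group(pool_t)           # source -> target group (final hypothesis)
g_ts = make_group(pool_s)           # target -> source group

for _ in range(3):                  # Co-Transfer rounds
    # each group is refit on original labels plus pseudo-labels from itself
    # and from the other group
    self_t  = [(x, p) for x in Xt[n_lab:] if (p := unanimous(g_st, x)) is not None]
    cross_t = [(x, p) for x in Xt[n_lab:] if (p := unanimous(g_ts, x)) is not None]
    self_s  = [(x, p) for x in Xs[n_lab:] if (p := unanimous(g_ts, x)) is not None]
    cross_s = [(x, p) for x in Xs[n_lab:] if (p := unanimous(g_st, x)) is not None]
    g_st = make_group(pool_t + self_t + cross_t)
    g_ts = make_group(pool_s + self_s + cross_s)

acc = sum(vote(g_st, x) == l for x, l in zip(Xt, yt)) / len(Xt)
print(acc)
```

The majority vote of the final source-to-target group plays the role of the final hypothesis; in the actual framework, each group is a set of TrAdaBoost classifiers rather than simple thresholds.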
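The multi-population co-evolution loop of Tri-Evolving described earlier can likewise be sketched in miniature. This is not the thesis's implementation: simple threshold classifiers on synthetic one-dimensional data stand in for the tree populations, and the fitness weight `alpha`, population size, mutation scale, and generation count are illustrative assumptions. The sketch only shows the core idea: three populations are re-evolved in turn, with each individual's fitness combining labeled accuracy and disagreement with the other two component classifiers.

```python
import random

# Illustrative synthetic data (an assumption, not the thesis's benchmarks):
# 1D points with true boundary at 0.5, only 10 of 60 samples labeled.
random.seed(0)
X = [random.random() for _ in range(60)]
y = [1 if x > 0.5 else 0 for x in X]
labeled = range(10)
unlabeled = range(10, 60)

def predict(theta, x):
    # a component "classifier" is just a threshold here, standing in for a tree
    return 1 if x > theta else 0

def accuracy(theta):
    return sum(predict(theta, X[i]) == y[i] for i in labeled) / 10

def diversity(theta, others):
    # average disagreement with the other two component classifiers on unlabeled data
    return sum(predict(theta, X[i]) != predict(o, X[i])
               for o in others for i in unlabeled) / (2 * 50)

def fitness(theta, others, alpha=0.9):
    # trade-off between average accuracy and diversity; alpha is an assumed weight
    return alpha * accuracy(theta) + (1 - alpha) * diversity(theta, others)

# three populations, one per component classifier
pops = [[random.random() for _ in range(20)] for _ in range(3)]
best = [max(p, key=accuracy) for p in pops]

for _ in range(30):                 # generations
    for k in range(3):              # re-evolve each population in turn
        others = [best[j] for j in range(3) if j != k]
        # mutation step of a simple evolutionary algorithm, plus elitism
        pops[k] = [min(max(t + random.gauss(0, 0.05), 0.0), 1.0) for t in pops[k]]
        pops[k].append(best[k])
        pops[k].sort(key=lambda t: fitness(t, others), reverse=True)
        del pops[k][20:]
        best[k] = pops[k][0]

def ensemble(x):
    # majority vote of the three evolved component classifiers
    return 1 if sum(predict(b, x) for b in best) >= 2 else 0

acc = sum(ensemble(X[i]) == y[i] for i in range(60)) / 60
print(acc)
```

The fitness function makes the trade-off explicit: accuracy dominates, while the diversity term rewards individuals that disagree with the other two fixed component classifiers on the unlabeled data.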