With the rapid development of the Internet, information technology, and storage technology, the amount of data has grown exponentially. Obtaining a large number of labeled data samples is very difficult, whereas unlabeled samples are much easier to obtain. Semi-supervised learning and transfer learning can use a small amount of labeled data together with a large amount of unlabeled data for training and testing, so they do not require extensive manpower and material resources to obtain labeled data, which saves time and cost. However, the use of unlabeled data in semi-supervised learning may degrade learning performance, while domain adaptation methods in transfer learning ignore the dependence between labels and features during dimension reduction, which may also reduce classification accuracy. This paper addresses these problems. The main work is as follows:

(1) To reduce the impact on learning performance of the label noise that the Tri-training algorithm produces during learning, this paper proposes a Tri-training algorithm based on cross entropy, a safe Tri-training algorithm, and a safe Tri-training framework based on cross entropy. In the proposed methods, cross entropy replaces the error rate to better reflect the gap between the model's predictions and the true distribution, and convex optimization is used to reduce label noise, which improves the quality of the pseudo-labels and the generalization performance of the model. The validity of the proposed methods is verified on UCI benchmark data sets, and their performance is further verified from a statistical point of view with a significance test. Experimental results show that the proposed semi-supervised learning methods outperform the traditional Tri-training algorithm in classification performance. Among them, the safe Tri-training algorithm based on cross entropy achieves the highest classification accuracy and generalization ability.

(2) Existing domain adaptation methods consider neither the dependence between labels and features during dimension reduction nor how to preserve data locality after dimension reduction. In response to these problems, this paper proposes a semi-supervised balanced distribution adaptation method. The proposed method first uses the maximum mean discrepancy (MMD) to approximate the marginal and conditional distribution distances between the source domain and the target domain. It then uses the Hilbert-Schmidt independence criterion (HSIC) to measure the dependence between labels and features in the source domain, and uses the locality-preserving property of the manifold regularizer to retain the locality of the data. Finally, the dimensionality of the source-domain and target-domain data, which come from different distributions, is reduced. The resulting data sets are then used in Tri-training, which addresses the requirement that the data sets used by the Tri-training algorithm be independent and identically distributed. Experimental results show that the proposed method achieves good classification accuracy and remains stable within a certain range of parameter choices.
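The two statistics named in (2) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not state which kernels are used, so the code below assumes linear kernels, under which the squared MMD reduces to the squared distance between sample means and the biased empirical HSIC reduces to a squared Frobenius norm of a cross-covariance term. All function names and the balance factor `mu` are hypothetical, introduced only for illustration, and the target labels are assumed to be pseudo-labels from some base classifier.

```python
# Illustrative sketch only: linear-kernel estimates computed on raw features.
# Function names and the balance factor `mu` are assumptions, not the paper's API.
from collections import defaultdict


def mean_vector(rows):
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def linear_mmd2(xs, xt):
    """Squared MMD with a linear kernel: squared distance between sample means."""
    ms, mt = mean_vector(xs), mean_vector(xt)
    return sum((a - b) ** 2 for a, b in zip(ms, mt))


def group_by_label(rows, labels):
    """Map each label to the list of rows carrying that label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return groups


def conditional_mmd2(xs, ys, xt, yt_pseudo):
    """Sum of per-class linear MMD^2 over classes present in both domains.

    Target labels are pseudo-labels (assumed to come from a base classifier).
    """
    src, tgt = group_by_label(xs, ys), group_by_label(xt, yt_pseudo)
    return sum(linear_mmd2(src[c], tgt[c]) for c in src if c in tgt)


def balanced_distance(xs, ys, xt, yt_pseudo, mu=0.5):
    """Weighted sum of marginal and conditional distances.

    `mu` in [0, 1] is a hypothetical balance factor: 0 uses only the marginal
    term, 1 uses only the conditional term.
    """
    marginal = linear_mmd2(xs, xt)
    conditional = conditional_mmd2(xs, ys, xt, yt_pseudo)
    return (1 - mu) * marginal + mu * conditional


def linear_hsic(x, y):
    """Biased empirical HSIC with linear kernels on features x and label vectors y.

    Computes tr(K H L H) / (n - 1)^2, which for K = X X^T and L = Y Y^T equals
    ||Xc^T Yc||_F^2 / (n - 1)^2, where Xc and Yc are the column-centered matrices.
    """
    n = len(x)
    mx, my = mean_vector(x), mean_vector(y)
    xc = [[v - m for v, m in zip(row, mx)] for row in x]
    yc = [[v - m for v, m in zip(row, my)] for row in y]
    total = 0.0
    for i in range(len(mx)):
        for j in range(len(my)):
            cross = sum(xc[k][i] * yc[k][j] for k in range(n))
            total += cross ** 2
    return total / (n - 1) ** 2


if __name__ == "__main__":
    # Toy data: the target domain is the source domain shifted along feature 0.
    xs = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
    ys = [0, 0, 1, 1]
    xt = [[1.0, 0.0], [1.0, 2.0], [5.0, 0.0], [5.0, 2.0]]
    yt_pseudo = [0, 0, 1, 1]
    one_hot = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print("balanced distance:", balanced_distance(xs, ys, xt, yt_pseudo))
    print("HSIC(features, labels):", linear_hsic(xs, one_hot))
```

In this toy data only the first feature varies with the class, so the HSIC value is driven entirely by that feature; a feature that is independent of the labels contributes zero. A method of the kind described in the abstract would evaluate such quantities on projected rather than raw features and combine them with a manifold regularizer, which this sketch does not attempt.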