Multi-view Co-training Based On Instances' Classification Difficulty

Posted on: 2021-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Li
GTID: 2428330626458932
Subject: Software engineering
Abstract/Summary:
With the rapid development of computer technology and the continuous improvement of data acquisition and storage capabilities, large amounts of diverse, information-rich data have emerged. As data grow in volume and content, it becomes difficult to describe an object comprehensively with a single set of features. Different acquisition methods can represent the same object in many different ways, which makes its description more convenient and accurate. Feature data obtained for the same object through different channels or from different angles is called multi-view data; that is, each object is described by multiple sets of attributes or characteristics. In practice, however, unlabeled multi-view instances are easy to collect, while obtaining a large number of labeled multi-view instances requires considerable resources. In classification problems, a model learned from only a small number of labeled instances often generalizes poorly, and ignoring the far more plentiful unlabeled instances discards the hidden information they carry.

Co-training is a classic semi-supervised learning method for multi-view data. It learns alternately from the complementary information in different views and uses the hidden information in unlabeled instances to improve generalization performance. It has become an important part of multi-view learning, and its theory and algorithms continue to attract attention. In this context, this thesis studies multi-view co-training based on instance classification difficulty.

The first two chapters introduce the research background and current state of multi-view co-training, together with the concepts and theory of co-training used in the thesis, which form the theoretical basis for the work.

The third chapter proposes a multi-view standard co-training algorithm based on instance classification difficulty. An in-depth analysis of co-training shows that, in the initial stage of the algorithm, the base classifiers easily mislabel unlabeled instances, and subsequent training iterations amplify the effect of these mislabeled instances and reduce learning performance. To address this problem, the improved algorithm uses the spatial relationship between labeled and unlabeled instances in multi-view data to design a method for calculating instance classification difficulty. Specifically, the classification difficulty is converted into a difficulty value, which serves as one of the conditions for adding a new instance to the set of labeled instances, with the aim of improving classification performance. Experiments show that the proposed algorithm effectively improves learning ability and is more reliable than the comparison algorithm.

The fourth chapter proposes a corresponding improved algorithm that combines instance classification difficulty with Tri-training, a method in the co-training family. It targets the initial stage of Tri-training, when the main classifier performs poorly and the two auxiliary classifiers may simultaneously misjudge the category of an unlabeled instance, producing classification errors. When the auxiliary classifiers assign the same predicted label to an instance, the algorithm uses the instance's classification difficulty to judge whether that instance is likely to have been predicted correctly, which reduces the number of incorrectly labeled instances. In the subsequent co-training process, eligible "more trusted" instances are continuously added to the set of labeled instances of each main classifier to improve overall classification performance. Comparison with other multi-view classification algorithms shows that the proposed algorithm performs better.
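
The abstract does not give the thesis's formula for the difficulty value, so the following Python sketch is only one plausible reading of the idea, not the author's method. It assumes (hypothetically) that difficulty is the fraction of an instance's k nearest labeled neighbors, within one view, whose label disagrees with the classifier's prediction, and that an instance joins the labeled set only when its confidence is high and its difficulty is low. The names `knn_difficulty` and `select_for_labeling` and the thresholds `min_confidence` and `max_difficulty` are illustrative assumptions.

```python
import math


def knn_difficulty(x, predicted_label, labeled_points, labeled_targets, k=3):
    """Assumed difficulty value in [0, 1] for one view.

    Returns the fraction of the k nearest labeled neighbors of x whose
    label differs from predicted_label. This is a stand-in for the
    thesis's (unspecified) spatial-relationship measure.
    """
    # Rank labeled instances by Euclidean distance to x.
    order = sorted(
        range(len(labeled_points)),
        key=lambda i: math.dist(x, labeled_points[i]),
    )
    nearest = order[:k]
    disagree = sum(1 for i in nearest if labeled_targets[i] != predicted_label)
    return disagree / len(nearest)


def select_for_labeling(candidates, labeled_points, labeled_targets,
                        min_confidence=0.8, max_difficulty=0.34, k=3):
    """Keep pseudo-labeled candidates that pass both gates.

    candidates is a list of (features, predicted_label, confidence)
    tuples produced by a base classifier. Both thresholds are
    hypothetical values chosen for illustration.
    """
    accepted = []
    for x, label, confidence in candidates:
        difficulty = knn_difficulty(x, label, labeled_points, labeled_targets, k)
        # Difficulty acts as an extra condition on top of confidence.
        if confidence >= min_confidence and difficulty <= max_difficulty:
            accepted.append((x, label))
    return accepted
```

In a full co-training loop, a gate like this would sit between each base classifier's predictions on the unlabeled pool and the step that extends the labeled set. The same check could be applied in Tri-training to instances on which the two auxiliary classifiers agree.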
Keywords/Search Tags: Multi-view Learning, Co-training, Instance Classification Difficulty, Tri-training