Collaboration-training Method Based On Feature Transformation

Posted on:2015-11-30

Degree:Master

Type:Thesis

Country:China

Candidate:W L Zhao

GTID:2298330431495527

Subject:Computer software and theory

Abstract/Summary:

In data mining applications such as Web page categorization, unlabeled data is easy to obtain, but labeling it requires substantial resources. How to use a large number of unlabeled instances to improve performance when labeled instances are scarce has therefore become a hot research topic. Semi-supervised learning is one of the mainstream learning techniques for this setting, and collaboration-training is one of its representative families of algorithms.

The key to success in collaboration learning is constructing classifiers that are both accurate and diverse. Classic algorithms include Co-Training, Tri-Training, COTRADE and so on. Most of these algorithms use bootstrap sampling, so each base classifier is trained on only a sample of the training data. Because labeled instances are already very scarce, a classifier trained on such a sample is unlikely to generalize well, which hurts its performance.

To alleviate these problems, this thesis proposes a new collaboration-training approach based on feature transformation. We select Tri-Training as the representative algorithm and apply feature transformation to it. The proposed method uses feature transformation to map the labeled instances into new spaces and thereby obtain new training sets. In this way, Tri-Training avoids the weakness of bootstrap sampling. A further benefit is that a method based on feature transformation makes it easier to construct accurate and diverse classifiers.

To make full use of the data distribution information, this thesis introduces a new transformation method called TMC (Transformation based on Must-link constraints and Cannot-link constraints) and applies it to the new Tri-Training approach.

Experimental results on UCI data sets show that, across different unlabeled rates, the proposed algorithm based on feature transformation achieves the highest accuracy on most data sets when compared with the classic Co-Training and Tri-Training algorithms. In addition, the Tri-Training algorithm based on TMC generalizes better than the Tri-LDA and Tri-CP algorithms.
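The idea of replacing bootstrap sampling with feature transformation can be illustrated with a minimal sketch. It is not the thesis's implementation, and everything in it is an assumption made for illustration: the function name `tri_training_ft`, the `NearestCentroid` base learner, and the three fixed transformations in `views` are hypothetical stand-ins (the thesis uses TMC, which the abstract does not specify in detail). The sketch also omits the error-rate checks that the original Tri-Training applies before accepting pseudo-labels. Each of the three classifiers sees the full labeled set through its own transformation, and an unlabeled instance is added to one classifier's training set when the other two agree on its label.

```python
class NearestCentroid:
    """Tiny base learner: predict the class whose mean vector is closest."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, v in enumerate(x):
                acc[j] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict(self, x):
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(sorted(self.centroids), key=dist)


def tri_training_ft(X_lab, y_lab, X_unl, views, rounds=3):
    """Simplified Tri-Training in which each classifier gets its own
    feature transformation instead of a bootstrap sample.

    `views` is a list of three callables (hypothetical transformations).
    The error-rate conditions of the original algorithm are omitted.
    """
    def fit_view(i, X, y):
        return NearestCentroid().fit([views[i](x) for x in X], y)

    # Initial classifiers: full labeled set, one transformed space each.
    models = [fit_view(i, X_lab, y_lab) for i in range(3)]

    for _ in range(rounds):
        new_models = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            extra_X, extra_y = [], []
            for x in X_unl:
                pj = models[j].predict(views[j](x))
                pk = models[k].predict(views[k](x))
                if pj == pk:  # the two peers agree: accept the pseudo-label
                    extra_X.append(x)
                    extra_y.append(pj)
            new_models.append(fit_view(i, X_lab + extra_X, y_lab + extra_y))
        models = new_models

    def predict(x):
        # Final decision by majority vote of the three classifiers.
        votes = [models[i].predict(views[i](x)) for i in range(3)]
        return max(sorted(set(votes)), key=votes.count)

    return predict


# Three illustrative (assumed) transformations of a 2-D feature vector.
views = [
    lambda x: list(x),                        # identity
    lambda x: [x[0] + x[1], x[0] - x[1]],     # linear mix of the features
    lambda x: [2.0 * x[0], 0.5 * x[1]],       # per-feature rescaling
]

X_lab = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]]
y_lab = ["a", "a", "b", "b"]
X_unl = [[0.0, 1.0], [1.0, 1.0], [10.0, 11.0], [11.0, 11.0]]

predict = tri_training_ft(X_lab, y_lab, X_unl, views)
```

Passing the transformations in as callables keeps the sketch independent of any particular method, so a transformation learned from must-link and cannot-link constraints could be substituted for the fixed ones used here.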

Keywords/Search Tags:

Collaboration Learning, Feature Transformation, Labeled Instances, Accurate, Diversity, Bootstrap Sampling

Related items

1	The Algorithm Of Class Unbalance Ensemble Classifier Based On Sampling And Feature Transformation
2	Study On The Object Detection And Tracking Based On Feature Learning
3	Research On Feature Selection Algorithms For Partially Labeled Hybrid Data
4	Proper Noun Recognition With Transformation-based Learning
5	Research On Extraction Of Reliable Negative Instances In Semi-supervised PU Learning
6	Classification In Imbalanced Data Based On Over-Sampling And Ensemble Learning
7	Research On Image Classification And Video Tracking With Weakly Labeled Data
8	A Trusted-item-based Interactive Method To Improve The Quality Of Labeled Data And Its Application
9	Exploring attributes and instances for customized learning based on support patterns
10	Experimental evaluation of enhanced metropolis sampling