
Collaboration-training Method Based On Feature Transformation

Posted on: 2015-11-30    Degree: Master    Type: Thesis
Country: China    Candidate: W L Zhao    Full Text: PDF
GTID: 2298330431495527    Subject: Computer software and theory
Abstract/Summary:
In data mining applications such as Web page categorization, unlabeled data is easy to obtain, but labeling these instances requires considerable resources. When labeled instances are scarce, how to use a large number of unlabeled instances to improve performance has become a hot research topic. Semi-supervised learning is one of the mainstream learning technologies, and collaboration-training is one of its representative algorithms.

The key to success in collaboration learning is constructing accurate and diverse classifiers. Classic algorithms include Co-Training, Tri-Training, COTRADE and so on. Most of these algorithms adopt bootstrap sampling, which uses only samples of the training data to train the base classifiers. Because labeled instances are very scarce, a classifier trained on these samples can hardly achieve strong generalization ability, which hurts its performance.

To alleviate these problems, this paper proposes a new collaboration-training approach based on feature transformation. We select Tri-Training as a representative and apply feature transformation to it. The proposed method employs feature transformation to map the labeled instances into a new space to obtain new training sets. In this way, Tri-Training avoids the weakness of bootstrap sampling. A further reason is that a method based on feature transformation makes it easier to construct accurate and diverse classifiers.

To make full use of the data distribution information, this paper introduces a new transformation method called TMC (Transformation based on Must-link constraints and Cannot-link constraints) and applies it to the new Tri-Training approach.

Experimental results on UCI data sets show that, at different unlabeled rates, the proposed algorithm based on feature transformation achieves the highest accuracy on most data sets compared with the classic Co-Training and Tri-Training algorithms. In addition, compared with the Tri-LDA and Tri-CP algorithms, the Tri-Training algorithm based on TMC has better generalization ability.
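
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The abstract does not define TMC, so the three transformations in `TRANSFORMS` are hypothetical placeholders, and `CentroidClassifier`, `constraint_pairs`, `tri_training_round`, and `vote` are names invented for this example. The sketch assumes the common convention that two labeled instances with the same label form a must-link pair and two with different labels form a cannot-link pair. It also omits the error-rate checks that the original Tri-Training algorithm uses before accepting pseudo-labels.

```python
import math
from collections import defaultdict


def constraint_pairs(labels):
    """Derive (must_link, cannot_link) index pairs from labels (assumed convention)."""
    must, cannot = [], []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            (must if labels[i] == labels[j] else cannot).append((i, j))
    return must, cannot


class CentroidClassifier:
    """Nearest-centroid base classifier that works in a transformed feature space."""

    def __init__(self, transform):
        self.transform = transform
        self.centroids = {}

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(self.transform(x))
        # One centroid per class, computed column by column.
        self.centroids = {
            label: [sum(col) / len(pts) for col in zip(*pts)]
            for label, pts in groups.items()
        }
        return self

    def predict(self, x):
        z = self.transform(x)
        return min(self.centroids, key=lambda c: math.dist(z, self.centroids[c]))


# Hypothetical placeholder transformations for 2-D inputs; NOT the TMC method.
TRANSFORMS = [
    lambda x: list(x),                      # identity
    lambda x: [2.0 * x[0], 0.5 * x[1]],     # axis rescaling
    lambda x: [x[0] + x[1], x[0] - x[1]],   # unnormalized 45-degree rotation
]


def tri_training_round(X_lab, y_lab, X_unlab, transforms=TRANSFORMS):
    """One simplified round: each classifier sees ALL labeled data in its own space
    (instead of a bootstrap sample) and is then retrained with pseudo-labels on
    which the other two classifiers agree."""
    clfs = [CentroidClassifier(t).fit(X_lab, y_lab) for t in transforms]
    retrained = []
    for i, t in enumerate(transforms):
        j, k = [m for m in range(3) if m != i]
        X_i, y_i = list(X_lab), list(y_lab)
        for x in X_unlab:
            pj, pk = clfs[j].predict(x), clfs[k].predict(x)
            if pj == pk:  # the two peers agree, so accept the pseudo-label
                X_i.append(x)
                y_i.append(pj)
        retrained.append(CentroidClassifier(t).fit(X_i, y_i))
    return retrained


def vote(clfs, x):
    """Majority vote of the base classifiers."""
    preds = [c.predict(x) for c in clfs]
    return max(set(preds), key=preds.count)


X_lab = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y_lab = [0, 0, 1, 1]
X_unlab = [[0.2, 0.5], [5.1, 5.4], [0.5, 0.1], [4.8, 5.9]]

must_link, cannot_link = constraint_pairs(y_lab)
classifiers = tri_training_round(X_lab, y_lab, X_unlab)
print(vote(classifiers, [0.1, 0.3]), vote(classifiers, [5.2, 5.5]))
```

Because every base classifier is fitted on the full labeled set in a different space, diversity comes from the transformations rather than from resampling, which is the motivation the abstract gives for replacing bootstrap sampling. The constraint pairs are computed only to show what must-link and cannot-link information looks like; how TMC turns them into a transformation is not stated in the abstract.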
Keywords/Search Tags: Collaboration Learning, Feature Transformation, Labeled Instances, Accurate, Diversity, Bootstrap Sampling