
Semi-Supervised Chinese Text Classification Based On Selective Integration

Posted on: 2019-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhao
Full Text: PDF
GTID: 2347330542481680
Subject: Applied Statistics
Abstract/Summary:
With the rapid growth of the Internet, the volume of text data is increasing exponentially every year, and text classification is becoming increasingly important as one of the key technologies for processing it. Traditional text classification methods are generally based on supervised learning and require a large amount of labeled text to train a good classifier. In practice, however, labeled text is relatively scarce while unlabeled text is abundant. Training on only a small amount of labeled text usually yields a poorly performing classifier and ignores the information contained in the unlabeled text, which wastes resources. How to use the unlabeled data rationally and effectively has therefore become a top priority. This thesis first reviews existing text classification algorithms and points out their advantages and disadvantages. To address the problem that labeled text is insufficient and classifier performance is therefore hard to improve, it takes the Naive Bayes (NB) classifier as the base learner and combines the Bagging algorithm, the EM algorithm, and selective integration to propose an EM selective integrated learning method based on Bagging. The feasibility and validity of the method are verified by a designed simulation experiment, and the method is then applied to a practical Chinese text classification problem. The experimental results show that: 1) applying the EM algorithm to text classification can compensate for the lack of labeled training text and improve the classification performance of the NB classifier, although the resulting classifier still falls slightly short of some supervised learning algorithms; 2) a strong classifier can be obtained by comparing a number of EM base classifiers, selecting the better ones, and integrating them; 3) the method solves the problem that, in the Bagging algorithm, the ensemble is weakened by some poor base classifiers, and improves the overall performance of the learner.
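The abstract does not give the details of the procedure, so the following is only a minimal sketch of how Bagging, EM, and selective integration around an NB classifier might fit together. Everything specific here is an assumption made for illustration rather than the thesis's own design: the function names, the toy English tokens (real Chinese text would first need word segmentation), selection by accuracy on a held-out validation set, keeping the top `keep` models, and majority voting.

```python
import math
import random
from collections import Counter

def train_nb(docs, weights, classes, vocab):
    """Fit multinomial NB from (possibly soft) labels with Laplace smoothing."""
    prior = {c: 1.0 for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, w in zip(docs, weights):
        for c in classes:
            prior[c] += w.get(c, 0.0)
            for tok in doc:
                counts[c][tok] += w.get(c, 0.0)
    total = sum(prior.values())
    model = {}
    for c in classes:
        denom = sum(counts[c].values()) + len(vocab)
        log_words = {t: math.log((counts[c][t] + 1.0) / denom) for t in vocab}
        model[c] = (math.log(prior[c] / total), log_words)
    return model

def posterior(model, doc):
    """Class posteriors for one tokenized document (unknown tokens are ignored)."""
    scores = {c: lp + sum(lw.get(t, 0.0) for t in doc)
              for c, (lp, lw) in model.items()}
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def predict(model, doc):
    p = posterior(model, doc)
    return max(p, key=p.get)

def em_nb(labeled, unlabeled, classes, vocab, iters=5):
    """EM: start from labeled data, then alternate soft-labeling and refitting."""
    docs_l = [d for d, _ in labeled]
    hard = [{y: 1.0} for _, y in labeled]
    model = train_nb(docs_l, hard, classes, vocab)
    for _ in range(iters):
        soft = [posterior(model, d) for d in unlabeled]                    # E-step
        model = train_nb(docs_l + unlabeled, hard + soft, classes, vocab)  # M-step
    return model

def selective_em_bagging(labeled, unlabeled, validation, classes,
                         n_models=7, keep=3, seed=0):
    """Train EM-NB models on bootstrap samples and keep the best `keep` of them."""
    rng = random.Random(seed)
    vocab = {t for d, _ in labeled for t in d} | {t for d in unlabeled for t in d}
    scored = []
    for _ in range(n_models):
        boot = [rng.choice(labeled) for _ in labeled]  # bootstrap resample
        model = em_nb(boot, unlabeled, classes, vocab)
        acc = sum(predict(model, d) == y for d, y in validation) / len(validation)
        scored.append((acc, model))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:keep]]

def vote(models, doc):
    """Majority vote over the selected base classifiers."""
    return Counter(predict(m, doc) for m in models).most_common(1)[0][0]

# Toy data, invented for illustration only.
classes = ["sports", "finance"]
labeled = [("goal match team".split(), "sports"),
           ("team coach goal".split(), "sports"),
           ("stock market bank".split(), "finance"),
           ("bank loan market".split(), "finance")]
unlabeled = ["match coach team win".split(), "loan stock bank rate".split(),
             "goal win match".split(), "market rate loan".split()]
validation = [("coach match".split(), "sports"), ("stock loan".split(), "finance")]

models = selective_em_bagging(labeled, unlabeled, validation, classes)
label = vote(models, "team win goal".split())
```

Discarding the lowest-scoring bootstrap models before voting is what separates this from plain Bagging, where every base classifier votes regardless of quality.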
Keywords/Search Tags: Text classification, Semi-supervised learning, Selective integrated learning, EM, Bagging