
An Improved Fast Text Classification Model For News

Posted on: 2020-09-22
Degree: Master
Type: Thesis
Country: China
Candidate: D Wu
GTID: 2427330596493441
Subject: Applied statistics
Abstract/Summary:
With the rapid development of Internet technology, traditional media such as newspapers are being forced to transform gradually into digital media. For the convenience of readers, media outlets need to sort all kinds of news into categories, and sorting news by hand clearly costs a great deal of time and money. To build an automatic news classifier for media, this thesis modifies the FastText classification model. The results show that the modified model reduces the cost of sorting and increases efficiency, which satisfies the real-time requirements of news. The thesis first collected news data from Sina from 2005 to 2011 and turned it into a training set through tokenization, stop-word removal, and one-hot encoding. Bootstrap sampling was then applied to the training set to obtain 150 sampled training sets. A weak FastText classifier was trained on each sampled training set, 150 in total. Finally, the trained weak classifiers were combined into a strong classifier using the bagging method. For a new text sample, the strong classifier outputs the class that receives the most votes from the 150 weak classifiers. The experiments show that when the number of weak classifiers is less than 30, the precision of the strong classifier fluctuates noticeably; when the number is between 30 and 150, the precision grows gradually; and when the number exceeds 150, the precision converges at a high level. In the experiments, the strong classifier also outperforms SVM, GBDT, and KNN.
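The bootstrap-and-vote procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the weak learner is a hypothetical token-vote classifier standing in for FastText, and the names TokenVoteClassifier, bootstrap_sample, train_bagged, and predict_bagged, as well as the toy data, are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


class TokenVoteClassifier:
    """Hypothetical stand-in weak learner (NOT FastText): every token votes
    for the classes it co-occurred with in the training sample."""

    def fit(self, docs, labels):
        self.token_counts = defaultdict(Counter)
        # Fallback class for documents with no known tokens.
        self.default = Counter(labels).most_common(1)[0][0]
        for tokens, label in zip(docs, labels):
            for tok in tokens:
                self.token_counts[tok][label] += 1
        return self

    def predict(self, tokens):
        scores = Counter()
        for tok in tokens:
            scores.update(self.token_counts.get(tok, {}))
        return scores.most_common(1)[0][0] if scores else self.default


def bootstrap_sample(docs, labels, rng):
    """Draw len(docs) examples with replacement from the training set."""
    n = len(docs)
    idx = [rng.randrange(n) for _ in range(n)]
    return [docs[i] for i in idx], [labels[i] for i in idx]


def train_bagged(docs, labels, n_estimators=150, seed=0):
    """Train one weak classifier per bootstrap sample (150 in the abstract)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        sample_docs, sample_labels = bootstrap_sample(docs, labels, rng)
        models.append(TokenVoteClassifier().fit(sample_docs, sample_labels))
    return models


def predict_bagged(models, tokens):
    """Strong classifier: return the class most weak classifiers voted for."""
    votes = Counter(model.predict(tokens) for model in models)
    return votes.most_common(1)[0][0]


# Toy, already tokenized data (assumed for illustration only).
docs = [
    ["stock", "market", "rises"],
    ["bank", "raises", "rates"],
    ["team", "wins", "match"],
    ["player", "scores", "goal"],
    ["market", "bank", "shares"],
    ["match", "goal", "team"],
]
labels = ["finance", "finance", "sports", "sports", "finance", "sports"]

models = train_bagged(docs, labels, n_estimators=150, seed=0)
prediction = predict_bagged(models, ["bank", "market"])
```

Varying n_estimators in train_bagged is one way to explore how the ensemble's precision changes with the number of weak classifiers, which is the relationship the abstract reports.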
Keywords/Search Tags:News, Text, Classification, Weak Classifier, Strong Classifier