Research On FastText-based Classification Of News Texts And Its Application In Agricultural News

Posted on: 2020-07-01
Degree: Master
Type: Thesis
Country: China
Candidate: T T Huo
GTID: 2428330575979897
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, large amounts of data are stored as text. Text classification, the most common text mining technique, is of great significance for discovering knowledge in large volumes of messy text data. At present, there are three approaches to text classification: methods based on semantic rules, methods based on traditional machine learning, and methods based on deep learning. Among them, the fastText classification model is a recently proposed shallow neural network model capable of fast and efficient text classification. It can achieve classification performance comparable to deep learning models at a lower training cost, so it is widely used in industry.

FastText uses n-gram feature enhancement to capture local word order information, but this enhancement also generates some meaningless low-frequency terms, which interfere with text classification. In addition, for news text specifically, the title is often a highly condensed summary of the article. In the fastText model, the word vectors of the whole article are summed and averaged to form the vector representation of the article, which fails to account for the higher weight the news title should carry in that representation. To address these problems, the fastText model is improved mainly by "weighting important words" and "integrating news titles", yielding two algorithms, CF-fastText and Title-fastText, respectively. The two improvements are then combined into a third algorithm, CFT-fastText, which is applied in a system to solve the problem of agricultural news text classification. The main work is as follows:

1. FastText algorithm improvement. First, the CF-fastText algorithm is proposed. In the input layer, following the idea of the TFIDF-CF algorithm, the n-gram feature enhanced sequences are weighted and screened to remove some low-frequency, unfamiliar and meaningless words. Experiments show that the CF-fastText algorithm improves the text classification effect. Second, the Title-fastText algorithm is proposed. It merges the news title vector into the vector representation of an article computed in the hidden layer. Because the title should carry a higher weight when representing an article, the Title-fastText algorithm adds the title vector to the mean of the news word vectors to represent a news article. Experiments show that the Title-fastText algorithm represents news texts better and achieves a better classification effect than the mean of the text vectors alone. Third, the two improvements are combined into the CFT-fastText algorithm, which both "weights important words" and "integrates news titles". The experimental results show that the CFT-fastText algorithm achieves a better classification effect than either improvement alone.

2. Realization of an automatic agricultural news classification system and application of CFT-fastText in the system. An automatic classification system for agricultural news text is implemented, and the CFT-fastText algorithm is applied in it. The system regularly crawls relevant agricultural news from the network and stores it in a database. The CFT-fastText classification algorithm, running in the background, classifies the unclassified text, and the system then stores the resulting category tags in the database.
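The two improvements described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The abstract does not give the TFIDF-CF formula, so a plain corpus-frequency cutoff (`min_count`, a hypothetical parameter) stands in for the weighted screening. The title vector is assumed to be the mean of the title's word vectors. All function names and the toy embeddings are made up for illustration.

```python
from collections import Counter


def add_bigrams(tokens):
    """Append word bigrams to a token list (n-gram feature enhancement)."""
    return tokens + [a + "_" + b for a, b in zip(tokens, tokens[1:])]


def filter_rare(docs, min_count=2):
    """Drop features seen fewer than min_count times in the corpus.

    Stand-in for the TFIDF-CF screening named in the abstract, whose
    formula is not given there; a plain frequency cutoff is used instead.
    """
    counts = Counter(tok for doc in docs for tok in doc)
    return [[tok for tok in doc if counts[tok] >= min_count] for doc in docs]


def mean_vector(tokens, emb, dim):
    """Average the embeddings of tokens that have one; zeros if none do."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def title_doc_vector(title_tokens, body_tokens, emb, dim):
    """Add the title vector to the mean body vector.

    Assumption: the title vector is itself a mean of title word vectors.
    """
    title = mean_vector(title_tokens, emb, dim)
    body = mean_vector(body_tokens, emb, dim)
    return [t + b for t, b in zip(title, body)]


# Toy 2-dimensional embeddings, invented for this example.
emb = {"rice": [1.0, 0.0], "price": [0.0, 1.0], "rises": [1.0, 1.0]}
doc_vec = title_doc_vector(["rice", "price"], ["rice", "price", "rises"], emb, 2)
```

In this sketch the title's words contribute twice, once through the title vector and once through the body mean if they also appear in the body, which is one simple way to give the title a higher weight in the article representation.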
Keywords/Search Tags:Machine Learning, Text Classification, fastText