
Classification Algorithm of Vietnamese News Based on fastText

Posted on: 2021-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: J D Deng
GTID: 2428330626463700
Subject: Software engineering
Abstract/Summary:
As a socialist country bordering China, Vietnam has increasingly close political, economic, and cultural exchanges with China. With the rapid development of the Internet and the spread of smartphones, news now travels beyond regional boundaries. Collecting, organizing, and classifying Vietnamese news texts can help people better understand the situation in Vietnam, and is therefore of significant research value. Current text classification methods fall into three groups: those based on semantic rules, on traditional machine learning, and on deep learning. Traditional machine-learning classifiers usually extract text features with TF-IDF, but this extraction is incomplete: some features are missed and the relationships between words are not captured, so the final classification results are inaccurate. In recent years, deep learning has become the commonly used approach in natural-language text classification. It achieves better classification results, but as the number of layers grows the amount of computation grows with it, so computation time is relatively long and substantial computing resources are required. fastText can effectively address these problems: compared with other methods, it maintains the accuracy of feature extraction while saving computing time. Its one minor shortcoming is that the input layer does not extract features from the input data, which affects classification performance to some extent.

To address this shortcoming, this paper makes two improvements to the fastText model, "weighted screening of important words" and "fusion of the news headline", and proposes two corresponding algorithms: CF-fastText, which improves the weighting of news content features, and Title-fastText, which weights the news title. Both are applied in a system for Vietnamese news text classification, which achieves 95% accuracy on a single-label classification task covering ten categories.
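The abstract does not say how CF-fastText scores important words or how Title-fastText fuses the headline, so the following is only a rough sketch of the two ideas under stated assumptions: words are scored with plain TF-IDF and only the top k are kept, and the title is emphasized by repeating its tokens in front of the body. The function names, the `k` cutoff, and the `title_repeat` parameter are hypothetical and are not taken from the thesis. The only detail borrowed from fastText itself is the `__label__` prefix of its supervised training format.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {word: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return scores


def keep_top_words(doc, scores, k):
    """Keep the k highest-scoring distinct words, preserving word order.

    Assumption: 'weighted screening' is approximated by a hard top-k filter.
    """
    top = set(sorted(scores, key=scores.get, reverse=True)[:k])
    return [word for word in doc if word in top]


def to_fasttext_line(label, title_tokens, body_tokens, title_repeat=2):
    """Build one training line in fastText's supervised text format.

    Assumption: the title is 'fused' by repeating its tokens before the body.
    """
    tokens = title_tokens * title_repeat + body_tokens
    return f"__label__{label} " + " ".join(tokens)


if __name__ == "__main__":
    # Toy, pre-tokenized documents (invented for illustration).
    docs = [
        ["bong", "da", "viet", "nam"],
        ["kinh", "te", "viet", "nam"],
    ]
    scores = tfidf_scores(docs)
    body = keep_top_words(docs[0], scores[0], k=2)
    print(to_fasttext_line("sports", ["bong", "da"], body))
```

Lines produced this way could be written to a file and passed to fastText's supervised trainer; whether repetition or a different weighting scheme matches the thesis cannot be determined from the abstract alone.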
Keywords/Search Tags:Vietnamese, fastText, News Classification