
Research on a fastText-Based Short Text Classification Method and Its Application

Posted on: 2022-04-15  Degree: Master  Type: Thesis
Country: China  Candidate: Y Sun
GTID: 2518306536991799  Subject: Software engineering
Abstract/Summary:
Short text classification is an active topic in natural language processing. Because short texts have sparse features, conventional short text classification methods cannot meet the requirements for accurate and efficient analysis of text intent in areas such as public opinion analysis, intelligent question answering, and precise search. Using the Toutiao News text classification dataset, this thesis aims to improve short text classification and studies three things: a category feature expansion method for short texts, a method for filtering unintentional words and words with low classification contribution, and a short text classification method based on fastText. First, to improve the quality of the category expansion features of short texts, a category feature expansion method based on TFIDF-LDA is proposed. In this method, TF-IDF is used to score each word in the original input word sequence of the LDA topic model, and a threshold is set to eliminate words that contribute little to distinguishing topics. This improves the quality of the candidate topic word list fed into the LDA topic model, and thereby the quality of the main topics extracted from short texts (i.e., the category expansion features). Second, to reduce the interference of unintentional words and words with low classification contribution on the classification results, a lexical information entropy formula is designed based on the definition of information entropy and used to filter such words from the N-gram subword list, forming an information-entropy-based N-gram word filtering method. Third, to improve the classification performance of the fastText model, the TFIDF-LDA category feature expansion method is used to expand short texts with high-quality category features, and the information-entropy-based N-gram word filtering method is used to filter unintentional words and words with low classification contribution from the N-gram subword list; together these form the FE-fastText short text classification model. Finally, comparative experiments between FE-fastText and other models on the Jinri Toutiao News text classification dataset verify that both the TFIDF-LDA category feature expansion method and the information-entropy-based N-gram word filtering method improve model performance, and the application of the FE-fastText short text classification model in an intelligent question answering system is studied.
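The abstract does not give the lexical information entropy formula, so the following is only a minimal sketch of the general idea, assuming plain Shannon entropy over the classes in which a token appears. The function names `token_class_entropy` and `filter_tokens`, the threshold `max_entropy`, and the toy data are all hypothetical, and the sketch works on whole-word tokens rather than the N-gram subword list used in the thesis. A token spread evenly across classes gets high entropy and is dropped; a token concentrated in one class gets low entropy and is kept.

```python
import math
from collections import Counter, defaultdict


def token_class_entropy(docs, labels):
    """Return {token: Shannon entropy (bits) of the token's class distribution}.

    Assumption: each document counts at most once per token, so the
    distribution is over document frequencies per class.
    """
    per_class = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for tok in set(tokens):
            per_class[tok][label] += 1

    entropy = {}
    for tok, counts in per_class.items():
        total = sum(counts.values())
        entropy[tok] = sum(
            -(c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


def filter_tokens(tokens, entropy, max_entropy):
    """Keep tokens whose class entropy does not exceed max_entropy.

    Tokens never seen while computing the entropy table are kept
    (treated as entropy 0), which is a choice made for this sketch.
    """
    return [t for t in tokens if entropy.get(t, 0.0) <= max_entropy]


# Toy, made-up corpus: two classes, four short texts.
docs = [
    ["the", "team", "won", "the", "match"],
    ["the", "stock", "market", "fell"],
    ["the", "team", "scored", "a", "goal"],
    ["the", "market", "rallied"],
]
labels = ["sports", "finance", "sports", "finance"]

entropy = token_class_entropy(docs, labels)
filtered = filter_tokens(docs[0], entropy, max_entropy=0.5)
print(filtered)
```

In this toy data, "the" occurs in two sports and two finance texts, so its entropy is 1 bit and it is filtered out, while "team", "won", and "match" occur only in sports texts and are kept. The threshold value would need to be tuned on real data.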
Keywords/Search Tags:short text, fastText, classification, feature expansion, information entropy