Research And Implementation On The Short Text Classification Method Based On Topic Model

Posted on: 2021-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: H X Wang
Full Text: PDF
GTID: 2428330614958448
Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of the Internet and the quickening pace of life, people have become accustomed to obtaining and sharing information through Internet platforms, which has led to a large volume of short text messages being generated online. Content distribution, information retrieval, and social networking platforms are all sources of short text. Extracting the relevant topics from this mass of information and classifying them can alleviate the effects of information overload to some extent. How to extract the topic of a piece of information effectively and classify it quickly has therefore attracted growing attention from researchers.

The main difficulty in short text classification is that the text itself is too short, so the extracted features are too sparse and too little context is available. Extending short-text features with an external corpus or knowledge base is time-consuming and easily introduces noisy data. In addition, traditional vector space models and machine learning classification algorithms are not very effective when applied directly to short text.

To address the sparsity of short text features and the noise introduced when external corpora are used for feature expansion, a feature expansion method based on a topic model is adopted. The WTTM model is used to obtain topic-word distributions, which are then used to expand the short text with topic features. Because extended features differ from original features, the semantic similarity between each extended feature and the original features is incorporated when computing the extended feature's weight; this tightens the semantic connection between features and makes the classification result more accurate. In summary, a short text classification method combining word vectors and a topic model is proposed. To verify its effectiveness, the method is compared with other short text classification methods. Experimental results show that the method improves the final classification performance on short texts.
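
The expansion and weighting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the weighting formula, so the weight used here (document-topic probability times topic-word probability times the highest cosine similarity to any original word) is an assumption, as are the function name `expand_features`, the `top_k` parameter, and the toy topic distributions and word vectors. In practice these would come from a trained topic model and trained word embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_features(tokens, doc_topics, topic_words, vectors, top_k=2):
    """Return {word: weight} for the original and topic-expanded features.

    Hypothetical weighting (an assumption, not taken from the thesis):
    weight = p(topic | doc) * p(word | topic) * max cosine similarity
    between the candidate word and any original word.
    """
    features = {w: 1.0 for w in tokens}  # original features keep weight 1.0
    best_topic = max(doc_topics, key=doc_topics.get)
    # Candidate expansion words, most probable under the dominant topic first.
    candidates = sorted(topic_words[best_topic].items(),
                        key=lambda kv: kv[1], reverse=True)
    added = 0
    for word, prob in candidates:
        if added == top_k:
            break
        if word in features or word not in vectors:
            continue
        sims = [cosine(vectors[word], vectors[t]) for t in tokens if t in vectors]
        sim = max(sims, default=0.0)
        # Negative similarities are clipped so weights stay non-negative.
        features[word] = doc_topics[best_topic] * prob * max(sim, 0.0)
        added += 1
    return features

# Toy inputs (all values invented for illustration).
tokens = ["apple", "iphone"]
doc_topics = {"tech": 0.8, "food": 0.2}
topic_words = {
    "tech": {"iphone": 0.4, "phone": 0.3, "screen": 0.2, "pie": 0.1},
    "food": {"apple": 0.5, "pie": 0.5},
}
vectors = {
    "apple": [1.0, 0.2], "iphone": [0.9, 0.1], "phone": [0.8, 0.1],
    "screen": [0.7, 0.3], "pie": [0.1, 1.0],
}

features = expand_features(tokens, doc_topics, topic_words, vectors)
for word, weight in features.items():
    print(f"{word}: {weight:.3f}")
```

In this sketch the similarity term down-weights expansion words that are probable under the topic but semantically distant from the original text, which is one way to limit the noise that feature expansion can introduce.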
Keywords/Search Tags: short text, topic model, word vector, text classification