
Research On Short Text Classification Based On Deep Learning And BTM Model

Posted on: 2021-01-23  Degree: Master  Type: Thesis
Country: China  Candidate: Y Y Wang  Full Text: PDF
GTID: 2428330602482612  Subject: Software engineering
Abstract/Summary:
With the rapid development of the mobile internet, the volume of short-text data keeps growing. The main purpose of short text mining is to screen out and exploit the useful information such texts contain; automatic short text classification helps users quickly locate text content and selectively process massive amounts of text. This thesis studies the application of the BTM (Biterm Topic Model) and word vector models to short texts, and improves short text classification through model improvements. The main innovations are as follows:

(1) To address the weak semantic relationship between co-occurring words in current BTM topic modeling of short texts, we propose an improved BTM (cw2vec-BTM) that incorporates the cw2vec word vector model. We first analyze the problems in the BTM model, then experimentally compare several commonly used word vector models and select cw2vec, which has the best semantic representation ability, to train word vectors and compute the semantic similarity between words. During biterm sampling, we check whether the semantic distance of each sampled word pair meets a prescribed threshold: if so, the pair's count is expanded and the topic of the expanded sample is updated; otherwise, sampling proceeds as in the traditional method. Experimental results show that the proposed scheme (an improved Gibbs sampling method for BTM) effectively improves the topic coherence and KL divergence of the topic model.

(2) To address the inability of current word vectors to resolve polysemy in short texts, and exploiting the observation that a polysemous word carries different senses under different topics, we propose Multi-TWE, a multi-sense topic word vector model that combines word vectors with the BTM topic model. First, through the BTM model's parameter inference, we obtain each target word and its corresponding topic. We then divide the model into the MuTWE-1 and MuTWE-2 algorithms according to how words and topics are combined. MuTWE-1 directly fuses each (word, topic) pair into a "pseudo-word" and feeds it as a single token to train topic word vectors in the SE-WRL model, so the same word receives different vectors for its different senses. MuTWE-2 keeps words and topics separate during word vector training and forms the topic word vector as a weighted combination of the target word vector and the topic vector, so the same word connected to different topics can express different senses. Finally, the algorithms are evaluated on a word similarity task, demonstrating that the model can represent multiple senses of polysemous words.

(3) We apply the Multi-TWE model to short text classification and propose a classification method based on it. The Multi-TWE model is trained on a news headline corpus; the weighted average of the multi-sense topic word vectors represents the short text vector, which then serves as the feature vector for training the classifier. Compared with support vector machine (SVM), BTM, and word2vec classification methods, experimental results show that the proposed method improves the average F1 score by 3.54%, 11.41%, and 2.86%, respectively.
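The threshold test in contribution (1) can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the function names (`cosine_sim`, `expand_biterms`), the threshold value, and the integer `boost` factor are all assumptions; the real method plugs the check into BTM's Gibbs sampler.

```python
import numpy as np

def cosine_sim(u, v):
    # Cosine similarity between two word vectors (e.g. from cw2vec).
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def expand_biterms(biterms, vectors, threshold=0.6, boost=2):
    """Count co-occurring word pairs (biterms); pairs whose words are
    semantically close (similarity >= threshold) get an expanded count,
    so they carry more weight when topics are sampled."""
    counts = {}
    for w1, w2 in biterms:
        sim = cosine_sim(vectors[w1], vectors[w2])
        counts[(w1, w2)] = counts.get((w1, w2), 0) + (boost if sim >= threshold else 1)
    return counts
```

In a full sampler, the expanded counts would feed the conditional topic distribution for each biterm rather than a plain dictionary.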
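The two Multi-TWE variants and the short-text vector of contribution (3) can be sketched like this. All names here (`pseudo_word`, the `word#tK` token format, `mutwe2_vector`, `text_vector`) are illustrative assumptions; the thesis trains the actual vectors with the SE-WRL model rather than the toy lookup used below.

```python
import numpy as np

def pseudo_word(word, topic_id):
    # MuTWE-1 idea: fuse a word and its inferred BTM topic into one token,
    # so the same surface word trains distinct vectors under distinct topics.
    return f"{word}#t{topic_id}"

def mutwe2_vector(word_vec, topic_vec, alpha=0.5):
    # MuTWE-2 idea: keep word and topic vectors separate during training,
    # then weight them together to form the topic word vector.
    return alpha * word_vec + (1.0 - alpha) * topic_vec

def text_vector(tokens_with_topics, topic_word_vecs, weights=None):
    """Weighted average of topic word vectors as the short-text
    representation fed to the downstream classifier."""
    vecs = [topic_word_vecs[pseudo_word(w, t)] for w, t in tokens_with_topics]
    if weights is None:
        weights = np.ones(len(vecs))
    return np.average(np.asarray(vecs), axis=0, weights=np.asarray(weights, dtype=float))
```

The resulting text vector would then be passed to any standard classifier (the thesis compares against SVM, BTM, and word2vec baselines).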
Keywords/Search Tags:BTM Topic Model, Word Vector Model, Word Sense Disambiguation, Short Text Classification