Research On Short Text Topic Model Based On Semantic Information And Word Triangle

Posted on:2020-09-23

Degree:Master

Type:Thesis

Country:China

Candidate:W Jing

GTID:2428330575458241

Subject:Computer technology

Abstract/Summary:

With the accelerating pace of social development and the "short and fast" user experience brought by smart mobile terminals, people's communication on the network is becoming more and more fragmented. Short text data therefore plays an increasingly important role in online information exchange. Social network statuses, micro-blog messages, traditional news headlines, short video titles, and posts on question-and-answer websites are all forms of short text. At the same time, with the rise of large platforms such as microblogging services, Zhihu, Facebook, and Twitter, short text data is generated and accumulated at great speed. Mining topic information from massive short text data is therefore of great value; applications of topic mining include public opinion analysis, information retrieval, personalized recommendation, and user interest clustering.

However, mining topic information from short texts with traditional text mining methods is very difficult, mainly because word co-occurrence information in short texts is very sparse. To obtain more feature information from short texts, scholars have proposed various improved models, but most of them ignore the semantic relationships between words. To address this problem, this thesis first proposes a short text topic model based on prior knowledge drawn from semantic information and word frequency information. On this basis, the structure of the basic topic unit is studied, and a semantic word triangle topic model is proposed. The main work of this thesis is as follows:

1) Traditional topic models treat word pairs of different importance equally. This thesis assumes that the more closely two words are related semantically, the more likely they are to belong to the same topic. On this basis, it measures the semantic similarity of words using word embeddings trained on an external corpus, and incorporates this information as prior knowledge of the topic-word distribution, so that the model pays more attention to words with greater semantic similarity.

2) High-frequency common words degrade the quality of the topics a model learns. This thesis assumes that words appearing in most documents have a weak ability to represent a topic. Based on this hypothesis, it introduces IDF and semantic similarity as prior knowledge of the word distribution, alleviating the impact of high-frequency words on topic quality. Building on the BTM model, an improved model, WEI-BTM, is proposed, which improves topic model performance in the presence of such common words.

3) Common word co-occurrence networks neglect pairs of words that have close semantic connections but few co-occurrences. This thesis proposes a new method for constructing a semantic word network, which lets the network capture the topical links between words more comprehensively. Furthermore, on the basis of this network, a more tightly connected basic unit, the semantic word triangle structure, is proposed, and from it the SWTTM short text topic model is derived.

4) This thesis also conducts two comparative experiments on two real-world Chinese short text datasets against three traditional baseline models. The experimental results show the superiority of the SWTTM model in short text topic mining.
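The abstract gives no formulas, so the following Python sketch only illustrates the general ideas named above: IDF as a signal against common words, embedding similarity as a measure of semantic relatedness, and word triangles found in a network that has both co-occurrence and semantic edges. Everything in it is an assumption made for illustration, including the function names, the toy documents, the two-dimensional vectors, and the similarity threshold. It is not the WEI-BTM or SWTTM procedure from the thesis.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def idf(docs):
    """Plain IDF, log(N / document frequency); words in most docs score low."""
    n = len(docs)
    vocab = {w for d in docs for w in d}
    return {w: math.log(n / sum(1 for d in docs if w in d)) for w in vocab}


def build_network(docs, emb, threshold):
    """Undirected edges from co-occurrence plus embedding similarity.

    Assumption: a semantic edge is added when cosine similarity reaches
    `threshold`; the thesis may use a different construction rule.
    """
    edges = set()
    for d in docs:
        # Words sharing a short text are linked (co-occurrence edge).
        for a, b in combinations(sorted(set(d)), 2):
            edges.add((a, b))
    words = sorted(w for w in {w for d in docs for w in d} if w in emb)
    for a, b in combinations(words, 2):
        # Semantically close words are linked even if they never co-occur.
        if cosine(emb[a], emb[b]) >= threshold:
            edges.add((a, b))
    return edges


def triangles(edges):
    """Every set of three mutually connected words, as sorted tuples."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    found = set()
    for a, b in edges:
        for c in adj[a] & adj[b]:
            found.add(tuple(sorted((a, b, c))))
    return sorted(found)


# Toy short texts and hypothetical 2-d embeddings (not real data).
docs = [["apple", "banana"], ["banana", "fruit"], ["apple", "phone"]]
emb = {
    "apple": (1.0, 0.1),
    "banana": (0.8, 0.3),
    "fruit": (0.9, 0.2),
    "phone": (0.0, 1.0),
}

weights = idf(docs)
with_semantics = triangles(build_network(docs, emb, threshold=0.95))
# A threshold above 1 can never be met, so only co-occurrence edges remain.
cooccurrence_only = triangles(build_network(docs, emb, threshold=1.1))
```

In this toy data "apple" and "fruit" never share a document, so a co-occurrence-only network contains no triangle. Adding the semantic edge between them closes the triangle with "banana", which is the kind of link the abstract says ordinary co-occurrence networks miss.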

Keywords/Search Tags:

Short Text, Topic Model, Word Network, Word Triangle, Word Embedding

Related items

1	Topic Model For Short Texts Based On Word Triangles
2	A Study Of Short Text Topic Models Based On Information Of Word Embeddings
3	Research On Topic Model Over Short Texts With Incorporation Of Word Embedding
4	Research On Text Topic Modeling Based On Word Embedding
5	Research On Short Text Topic Model Based On Word Network And Word Vectors
6	Combining Topic Model And Word Embedding For Short-Text Classification
7	Research On Short Text Modeling Based On Word Embedding
8	The Design And Implementation Of Text Topic Key Word Processing System Based Chinese Word Segmentation
9	Research On Short Text Aspect Extraction Base On Topic Model And Word Embedding Mechanism
10	Research On Jointly Learning Word Embeddings And Latent Topics In Text