
The Research On Short Text Semantic Mining Based On Topic Model And Word Vector

Posted on: 2019-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Li
GTID: 2348330569479963
Subject: Electronics and Communications Engineering
Abstract/Summary:
The Internet is filled with massive amounts of short text data spread across all walks of life. This information plays a decisive role in public opinion monitoring, personalized recommendation, user profile analysis, and other applications. The data contains rich semantic information, but relying on manpower alone to process such a large volume of short text is costly and demands considerable financial resources, so the semantic mining of short text is an urgent task. Short text semantic mining acquires valuable knowledge from short text data according to different contexts and processes it by computer. Because short texts are brief and their contextual features are sparse, semantic loss is unavoidable, which greatly affects the effectiveness of text semantic mining. In recent years, topic models and word vector models have been widely applied to short text semantic mining and have achieved good results on short texts. Based on topic models and word vectors, this paper proposes two different semantic mining algorithms for short text. The main work of this dissertation is as follows:

1. This paper proposes the EBW-BTM algorithm for mining the topic semantics of short text. In the BTM topic model there is no semantic connection between the two words of a biterm feature. The EBW-BTM algorithm incorporates a word vector model into the BTM topic model. First, a word vector model is trained and the similarity between words is computed from it. Then, during the parameter inference over biterms in the BTM topic model, the degree of relatedness is judged by this word similarity. Finally, an appropriate similarity threshold is found to expand the number of biterms (a sketch of this expansion step is given after this abstract). Experiments show that, compared with the BTM model, the EBW-BTM algorithm greatly improves topic coherence and KL divergence.

2. This paper proposes the Multi-word2vec algorithm for disambiguating the vector semantics of polysemous words: in word2vec each word can only be expressed by a single vector, so word2vec's ability to distinguish the senses of polysemous words is weak. First, in view of the sparse features of short text, the BTM topic model is used to assign topic labels to words. Then the topic-labelled words are trained to obtain word vectors and topic vectors. Finally, the word vector and the topic vector are concatenated to obtain a multidimensional vector for the polysemous word (a sketch of this step follows the first one below). Experiments show that, compared with the word2vec, TWE-1, and TWE-2 algorithms, the Multi-word2vec algorithm achieves higher accuracy, recall, and F1 in text classification and works well for word sense disambiguation of polysemous words.
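The abstract describes EBW-BTM only at a high level. The following minimal Python sketch, written as a reading aid, shows one plausible form of the similarity-driven biterm expansion in point 1: biterms are drawn from a short document as in standard BTM, and additional biterms are added for word pairs whose word-vector cosine similarity passes a threshold. All function names, the toy embeddings, and the threshold values are illustrative assumptions, not code or parameters from the thesis.

```python
# Sketch of EBW-BTM-style biterm expansion (illustrative only, not the thesis code).
from itertools import combinations
import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def extract_biterms(doc_tokens):
    """Standard BTM biterms: all unordered word pairs within one short document."""
    return list(combinations(sorted(set(doc_tokens)), 2))

def expand_biterms(doc_tokens, embeddings, threshold=0.6):
    """Keep the original biterms and add pairs whose similarity to a document word
    exceeds the threshold, approximating the expansion step described in the abstract."""
    biterms = extract_biterms(doc_tokens)
    for w in set(doc_tokens):
        if w not in embeddings:
            continue
        for v in embeddings:
            if v != w and cosine(embeddings[w], embeddings[v]) >= threshold:
                biterms.append(tuple(sorted((w, v))))
    return biterms

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy embeddings standing in for a trained word2vec model.
    emb = {w: rng.normal(size=50) for w in ["phone", "mobile", "screen", "app", "news"]}
    print(expand_biterms(["phone", "screen", "app"], emb, threshold=0.2))
```

In a full implementation the embeddings would come from a word2vec model trained on the same corpus, and the expanded biterm set would feed the BTM parameter inference, as the abstract describes.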
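For the Multi-word2vec algorithm in point 2, the key operation the abstract describes is concatenating a word vector with a topic vector so that the same word carries different representations under different topics. The sketch below illustrates that assembly step only; the embeddings are random stand-ins, and the topic labels are hypothetical, whereas in the described method both would come from training on a corpus whose words have been topic-labelled by BTM.

```python
# Illustrative sketch of assembling Multi-word2vec-style sense vectors (not the thesis code).
import numpy as np

def sense_vector(word, topic, word_vecs, topic_vecs):
    """Concatenate a word vector with its topic vector to form one sense-specific vector."""
    return np.concatenate([word_vecs[word], topic_vecs[topic]])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    dim = 50
    # Random stand-ins for trained embeddings; the hypothetical topic labels
    # "fruit_topic" and "company_topic" illustrate two senses of "apple".
    word_vecs = {"apple": rng.normal(size=dim)}
    topic_vecs = {t: rng.normal(size=dim) for t in ("fruit_topic", "company_topic")}

    v_fruit = sense_vector("apple", "fruit_topic", word_vecs, topic_vecs)
    v_company = sense_vector("apple", "company_topic", word_vecs, topic_vecs)
    # The two sense vectors share the word part but differ in the topic part,
    # which is what lets a downstream classifier separate the senses.
    print(v_fruit.shape, np.allclose(v_fruit[:dim], v_company[:dim]))
```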
Keywords/Search Tags: semantic mining of short text, topic model, multidimensional word vector, word sense disambiguation of polysemous words