Automatic Topic Labelling Based On Word Vectors

Posted on: 2017-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Kou
GTID: 2428330590491519
Subject: Computer Science and Technology
Abstract/Summary:
Topic labelling has received increasing attention in recent years. Topics in topic models capture the hidden semantic information of articles, and topic labelling aims to make these topics more readable for humans. Through unsupervised training, a traditional topic model produces a topic-word distribution, and the top 10 words of that distribution are usually used to represent a topic. For example, 'snow, weather, service, heavy, airport, closed, flights, storm, power' represents a topic about heavy snow. Such a representation is simple, but it is relatively hard for humans to interpret. Researchers have proposed several topic labelling methods, including feature-based methods, summarization-based methods and human annotation. In this thesis, we propose a word-vector-based topic labelling method: on the one hand, we introduce word vector models into topic labelling; on the other hand, we compare the effectiveness of mainstream word vector models. Our work covers the following:
1. We introduce word vector models into topic labelling.
2. We propose a method for automatically generating a gold standard and for evaluating topic labelling results.
3. We study the effect of several word vector models.
Experiments show that Letter trigram and GloVe vectors are not sensitive to corpus type, CBOW performs well on large corpora, and Skip-gram performs well on corpora with higher semantic concentration.
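As a rough illustration of how word vectors can be brought into topic labelling, the sketch below takes the top 10 words of a topic-word distribution, forms a probability-weighted centroid of their vectors, and ranks candidate labels by cosine similarity to that centroid. This generic centroid-similarity scheme is an assumption for illustration, not necessarily the thesis's exact method. The distribution, the 3-dimensional vectors, the candidate labels and the `hit_at_k` check are all hypothetical; a real system would load CBOW, Skip-gram, GloVe or letter-trigram vectors instead.

```python
import math

# Hypothetical head of a topic-word distribution (numbers invented for illustration).
TOPIC_WORD = {
    "snow": 0.12, "weather": 0.10, "storm": 0.10, "heavy": 0.09,
    "airport": 0.08, "flights": 0.08, "closed": 0.07, "power": 0.06,
    "service": 0.05, "ice": 0.04, "the": 0.01,
}

# Hypothetical toy word vectors (dimensions loosely: weather, travel, politics).
VECTORS = {
    "snow": [1.0, 0.1, 0.0], "weather": [0.9, 0.1, 0.0], "storm": [1.0, 0.2, 0.0],
    "heavy": [0.6, 0.2, 0.0], "airport": [0.2, 1.0, 0.0], "flights": [0.1, 0.9, 0.0],
    "closed": [0.3, 0.5, 0.1], "power": [0.5, 0.2, 0.2], "service": [0.2, 0.5, 0.1],
    "ice": [0.9, 0.1, 0.0],
    # Hypothetical candidate labels.
    "blizzard": [1.0, 0.3, 0.0], "aviation": [0.1, 1.0, 0.0], "election": [0.0, 0.0, 1.0],
}


def top_words(dist, n=10):
    """Return the n most probable words, breaking ties alphabetically."""
    return [w for w, _ in sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_centroid(dist, vectors, n=10):
    """Probability-weighted mean of the vectors of the top-n words (words without vectors are skipped)."""
    words = [w for w in top_words(dist, n) if w in vectors]
    if not words:
        raise ValueError("none of the top words has a vector")
    total = sum(dist[w] for w in words)
    dim = len(vectors[words[0]])
    return [sum(dist[w] * vectors[w][i] for w in words) / total for i in range(dim)]


def rank_labels(dist, vectors, candidates, n=10):
    """Rank candidate labels by cosine similarity to the topic centroid, best first."""
    c = topic_centroid(dist, vectors, n)
    scored = [(cosine(vectors[lab], c), lab) for lab in candidates if lab in vectors]
    return [lab for _, lab in sorted(scored, reverse=True)]


def hit_at_k(ranked, gold, k=1):
    """Hypothetical check: 1 if any of the top-k ranked labels is in the gold set, else 0."""
    return int(any(lab in gold for lab in ranked[:k]))


if __name__ == "__main__":
    ranked = rank_labels(TOPIC_WORD, VECTORS, ["blizzard", "aviation", "election"])
    print(ranked)                          # ['blizzard', 'aviation', 'election']
    print(hit_at_k(ranked, {"blizzard"}))  # 1
```

Swapping in a different vector model only changes `VECTORS`, which is what makes this kind of setup convenient for comparing CBOW, Skip-gram, GloVe and letter-trigram representations on the same topics.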
Keywords/Search Tags: topic model, word vector, topic label