
Research On Topic Models And Their Applications In Short Text Streams

Posted on: 2018-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Zhao
GTID: 2348330512990263
Subject: Software engineering
Abstract/Summary:
Capturing the topics of documents plays an important role in text mining and semantic understanding. With the rise of computer science and artificial intelligence, understanding the main ideas and semantics of natural language is more important than ever. Topic modeling is a convenient tool for capturing the main idea of each document and plays an important role in information retrieval, artificial intelligence, data mining, and natural language understanding. Topic models are used to model the latent topics and semantics of documents, but state-of-the-art work mainly focuses on relatively long documents. When applied to short text streams, these models cannot achieve ideal performance. In this thesis, we focus on topic models for streams of short texts.

With the rise of social media, hundreds of millions of active users are sharing short texts on microblogging platforms such as Facebook, Twitter, and Sina Weibo. A good understanding of the contents of these short texts is important for the design of applications that cater to users of such platforms, such as personalized tweet recommendation, personalized microblog search, public opinion understanding, and computational advertising. The contributions of this thesis are summarized as follows.

Using external media contents for topic modeling. Modeling users' preferences from short texts such as tweets is more challenging than modeling them from long documents, because short texts are sparse. To address this challenge, we propose to incorporate the tweet contents published by WeMedia accounts to enrich the words in short texts. WeMedia accounts are microblog accounts that have only media attributes and publish original, valuable messages. We use this topic model to infer each user's interests and then integrate the social relations between the users and the tweet publishers to provide a personalized tweet re-ranking framework. The framework re-ranks the tweets posted to each user based on the user's preference for these tweets. The results demonstrate that we can model each user's interests more precisely and recommend more useful tweets to them.

Dynamic user clustering topic model. We consider both the dynamic nature of users' interests and the sparsity of short texts in topic modeling. This model adaptively tracks the changes in each user's time-varying topic distribution, based both on the short texts the user posts during a given time period and on the previously estimated distribution. To alleviate data sparsity, we propose a Gibbs sampling algorithm in which a set of word pairs is constructed for sampling. The model is used to uncover users' dynamic interests and then to perform dynamic user clustering, which makes the clustering results explainable. The results demonstrate the effectiveness of our proposed dynamic Dirichlet multinomial mixture user clustering topic model. The clustering results are explainable and human-understandable, in contrast to those of many other clustering algorithms.
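The two ideas in the last contribution, constructing word pairs from short texts and blending a user's current posts with the previously estimated topic distribution, can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the function names, the whitespace tokenization, and the `prior_weight` blending rule are all assumptions made for illustration, and the Gibbs sampler itself is omitted.

```python
from collections import Counter
from itertools import combinations


def word_pairs(tokens):
    """Return every unordered pair of distinct words in one short text.

    Pairs are built from the sorted set of tokens, so each pair appears
    once per text and always in alphabetical order.
    """
    unique = sorted(set(tokens))
    return list(combinations(unique, 2))


def collect_pairs(texts):
    """Count word pairs over a batch of short texts (hypothetical helper).

    Tokenization is a plain lowercase whitespace split, which is an
    assumption; a real pipeline would use a proper tokenizer.
    """
    counts = Counter()
    for text in texts:
        counts.update(word_pairs(text.lower().split()))
    return counts


def smooth_topic_distribution(prev_theta, topic_counts, prior_weight=1.0):
    """Blend current topic counts with the previous estimate.

    Assumed rule (not taken from the thesis): the previous distribution
    acts as pseudo-counts scaled by prior_weight, so a user with few new
    posts stays close to the earlier estimate. prev_theta must sum to 1.
    """
    total = sum(topic_counts) + prior_weight
    return [
        (count + prior_weight * prev) / total
        for count, prev in zip(topic_counts, prev_theta)
    ]


if __name__ == "__main__":
    pairs = collect_pairs(["topic model", "topic model stream"])
    print(pairs.most_common(3))
    print(smooth_topic_distribution([0.5, 0.5], [3, 1], prior_weight=2.0))
```

Pooling word pairs across texts gives the sampler more co-occurrence evidence than individual short texts provide, and a larger `prior_weight` makes the estimate change more slowly between time periods.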
Keywords/Search Tags: Topic modeling, Topic models in short text streams, Personalized tweet recommendation, User clustering