Research on Topic Model-Based Approaches for Sentiment and Topic Modeling on Texts

Posted on: 2018-09-30 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: K Xu | Full Text: PDF
GTID: 1368330545464249 | Subject: Software engineering
Abstract/Summary:
With the rapid growth of the Internet and the emergence of social media, a large volume of user-generated text accumulates every day. These texts include long texts, such as news articles; texts of varying length, such as reviews; and short texts on social media, such as Sina Weibo posts and tweets. News texts often describe significant events and scientific discoveries; they contain rich information but cover topics of fixed types. Review texts contain many user opinions about the reviewed objects. Unlike long texts such as news and reviews, short texts contain limited information, but they are updated very frequently and appear in huge volumes on general-domain topics.

This large volume of text contains much valuable information, but mining its latent semantic structure is a difficult problem in natural language processing and information retrieval. Topic models are a popular and effective method that analyzes the latent semantic structure of texts by mining high-order word co-occurrences. Recently, topic models have been applied to many research problems with good results.

This dissertation focuses on topic modeling and joint sentiment/topic modeling on three kinds of texts: news texts, review texts and tweet texts. News texts normally discuss objective topics, while review texts and tweet texts contain rich subjective topics. Hence, we model only topics on news texts (without considering sentiment) and model both sentiments and topics on review texts and tweet texts. Our work is based on generative topic models, which are used to mine topics or sentiment-aware topics. Existing topic models for news texts and sentiment/topic models for review texts have not adequately incorporated word-level or entity-level knowledge. Moreover, no effective models exist for modeling sentiment and topic on short texts. To address these problems, this dissertation studies three problems in modeling topics or sentiment-aware topics: incorporating knowledge from a knowledge base to model topics of news texts, incorporating word-level knowledge to model sentiment and topic of review texts, and introducing user and time information to model sentiment and topic of short texts. The detailed research topics are as follows:

(1) We analyze the shortcomings of existing work: these models depend only on high-order word co-occurrences, so they cannot mine semantic information well when words lack rich co-occurrence patterns. To overcome this problem, we propose a new topic model based on Wikipedia knowledge, which leverages external knowledge, i.e., concepts and categories in the Wikipedia knowledge base, to improve the performance of topic modeling. Our proposed model, WCM-LDA, models not only words and entities from texts but also concepts and categories from the external knowledge base, to alleviate the problem of words with sparse co-occurrence patterns. Moreover, WCM-LDA visualizes topics with words, concepts and categories.

(2) On review texts, the semantic information between words is still important for modeling sentiment-aware topics. However, existing sentiment and topic models for review texts depend only on high-order word co-occurrences and do not work well for words without rich co-occurrence patterns. Unlike news texts, which contain many entity mentions that can be linked to knowledge in a knowledge base, review texts contain many aspect words and opinion words, and the semantic knowledge between aspect words and opinion words is effective for modeling sentiment and topic on review texts. Hence, we propose a model that incorporates lexical knowledge from word embeddings into sentiment topic models, introducing semantic associations between words and addressing the problem of low-frequency words. In our proposed HST-SCW model, words that are close in the word embedding space can be assigned to the same semantic clusters, so that semantically similar words can be assigned to the same sentiments and topics.

(3) To improve the quality of sentiment and topic modeling on short texts, we analyze the shortcomings of existing sentiment and topic models for short texts: these models depend solely on high-order word co-occurrences, but short texts lack rich word co-occurrences. Unlike news texts and review texts, short texts such as Weibo posts are too noisy to benefit much from knowledge sources such as knowledge bases and word embeddings. To solve this problem, we observe that the content of tweets is strongly related to time and users in social media. Short texts related to users are mostly about the users' personal interests, while short texts related to time are often about current events or topics. By introducing structural knowledge, such as user and time, we propose a new model for topic and sentiment modeling on short texts, TUS-LDA, which uses time and users to address the problem of context sparsity. In TUS-LDA, we model each short text at the timeslice level or the user level, where each short text discusses only one topic but can express multiple sentiments.
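The idea of using users and time slices to counter context sparsity in short texts can be illustrated with a small pooling sketch. This is not TUS-LDA itself, whose generative process the abstract does not specify. It only shows, under assumed names (the `pool_short_texts` helper, the `user`/`time`/`text` fields, and a one-day time slice are all hypothetical), how short texts might be grouped into longer pseudo-documents by user or by time slice so that words gain more co-occurrence context.

```python
from collections import defaultdict
from datetime import datetime


def pool_short_texts(posts, by="user", slice_format="%Y-%m-%d"):
    """Group tokenised short texts into pseudo-documents.

    by="user" pools all posts of the same author; by="time" pools all
    posts falling in the same time slice (one day by default).
    The field names and the slice granularity are illustrative assumptions.
    """
    pooled = defaultdict(list)
    for post in posts:
        if by == "user":
            key = post["user"]
        elif by == "time":
            key = post["time"].strftime(slice_format)
        else:
            raise ValueError("by must be 'user' or 'time'")
        # Naive whitespace tokenisation; real posts would need proper cleaning.
        pooled[key].extend(post["text"].lower().split())
    return dict(pooled)


# Toy data, invented purely for illustration.
posts = [
    {"user": "alice", "time": datetime(2018, 3, 1, 9, 0), "text": "love the new phone"},
    {"user": "bob", "time": datetime(2018, 3, 1, 18, 30), "text": "phone launch event today"},
    {"user": "alice", "time": datetime(2018, 3, 2, 8, 15), "text": "battery life is great"},
]

by_user = pool_short_texts(posts, by="user")
by_day = pool_short_texts(posts, by="time")
```

Pooling by user gathers an author's posts, which tend to reflect personal interests, while pooling by day gathers posts that tend to reflect current events. Either pseudo-document set could then be passed to a standard topic model.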
Keywords/Search Tags:News Texts, Review Texts, Short Texts, Knowledge Base, Word Embedding, Topic Modeling, Sentiment Analysis, Topic Model