Topic Modeling For Text Semantic Analysis Applications

Posted on: 2021-04-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J M Wang
Full Text: PDF
GTID: 1368330614459956
Subject: Computer application technology
Abstract/Summary:
With the rapid development and spread of Internet technology, people are increasingly active online, leading to explosive growth in Internet data, particularly massive amounts of unstructured and unlabeled text such as email, social media, news reports and e-commerce content. Analyzing this text effectively, and mining its semantic information quickly and accurately, has become a major challenge in intelligent text processing. Extensive studies have been devoted to using unsupervised learning methods, such as topic models, to analyze unstructured texts. However, texts span many fields and have different statistical characteristics. Meanwhile, text semantic analysis covers a variety of applications whose focus and goals differ. For example, in public opinion monitoring, models need to track how semantics evolve over time; semantic mining on social media focuses on short text modeling; and user-based applications, such as personalized recommendation systems, aim to capture the details of users' points of interest. It is difficult for traditional topic models to succeed across all of these analysis tasks, and different topic models have emerged to address the issues raised by the massive amounts of text data on the Internet. This thesis studies text semantic analysis based on topic modeling, covering topic evolution, short texts and targeted analysis. Our main contributions are as follows.

(1) For topic evolution, we propose a framework for semantic connection based topic evolution with DeepWalk. Existing models suffer from topic suppression and redundant topics caused by a preset number of topics, and from poor-quality topics and insensitivity to change caused by ignoring the degree of change. To address these problems, we propose a topic evolution framework based on semantic connections, which indicate the content similarity between documents and also reflect time decay, allowing an adaptive number of topics and rapid responses to changes in content. Experiments validate the effectiveness of the proposed model.

(2) For short text analysis, we propose a topic model for short texts that incorporates pre-trained word embeddings, named the Attentional Segmentation based Topic Model. Existing models combine auxiliary information and topic modeling in a straightforward way, without considering human reading habits; we take attention signals and reading habits into account to improve topic modeling. The model integrates word embeddings as supplementary information with an attention mechanism that segments short text documents into fragments of adjacent words receiving similar attention. Each segment is assigned to a topic, and each document can have multiple topics. The experimental results demonstrate that our model outperforms the state of the art in terms of both topic coherence and text classification.

(3) For targeted analysis specific to users' interests, we propose a core biterms-based topic model. Targeted topic modeling is an increasingly vital task due to the prevalence of texts on the Web and the limits of users' interests. Existing approaches to targeted analysis suffer from problems such as topic loss and topic suppression because of their inherent assumptions and strategies, and they were not designed to address computational efficiency. Hence, we propose a core BiTerms-based Topic Model, referred to as BiTTM. By modelling topics from core biterms that are potentially relevant to the target query, BiTTM on the one hand captures context information across documents to alleviate topic loss and suppression, and on the other hand enables efficient modelling of topics related to specific aspects. Our experiments on nine real-world datasets demonstrate that BiTTM outperforms existing approaches in terms of both effectiveness and efficiency.
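
To make the notion of "core biterms" in contribution (3) concrete, the following is a minimal sketch, not the thesis's actual algorithm. It assumes that a biterm is an unordered pair of distinct words co-occurring in one short document, and that a biterm counts as "core" when it contains at least one query term. The function names `extract_biterms` and `core_biterms`, the relevance rule and the toy documents are all hypothetical choices made for illustration.

```python
from itertools import combinations


def extract_biterms(tokens):
    """Return all unordered pairs of distinct words in one short document.

    Duplicate tokens are collapsed first, which is a simplification.
    """
    unique = sorted(set(tokens))
    return list(combinations(unique, 2))


def core_biterms(docs, query_terms):
    """Keep biterms containing at least one query term.

    This relevance rule is an assumption made for illustration only.
    """
    query = set(query_terms)
    selected = []
    for tokens in docs:
        for pair in extract_biterms(tokens):
            if query & set(pair):
                selected.append(pair)
    return selected


# Toy corpus of tokenized short texts (hypothetical data).
docs = [
    ["battery", "life", "short"],
    ["screen", "bright"],
    ["battery", "charge", "slow"],
]
result = core_biterms(docs, ["battery"])
print(result)
```

In this sketch, only biterms touching the query word are passed on to topic inference, which is one plausible way that restricting the model to query-relevant pairs could reduce the amount of data to be modelled.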
Keywords/Search Tags: Text Analysis, Topic Modeling, Topic Evolution, Word Embedding, Short Text, Targeted Analysis