
Research On Topic Modeling And Applications For Text Sentiment Analysis

Posted on: 2019-03-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Zhang
Full Text: PDF
GTID: 1368330551958765
Subject: Computer application technology
Abstract/Summary:
With the rapid development of communication and computer technology, Internet applications continue to penetrate all areas of society. Text data, as the carrier that most directly expresses people's emotions and opinions, makes up a large proportion of network data. How to analyze and mine this sentiment-rich text has been a hot topic in both academia and industry in recent years. Text representation is crucial to sentiment analysis because it directly affects the performance of text analysis methods. As a mainstream text modeling and representation method, topic modeling is widely used in sentiment analysis. It uses the relationships among words in the text to construct a topic concept representation space by extracting highly relevant and similar content. However, during topic modeling, sentiment content is treated the same as other content, so sentiment aspects are not highlighted. In addition, classical topic modeling does not consider semantic relationship patterns such as text sequences and word contexts, which limits its representation capability. In view of these advantages and disadvantages, this dissertation starts from the practical needs of the text sentiment analysis task and makes full use of deep learning methods and domain knowledge, with the aims of expanding the sentiment semantic information covered by the topic representation, enhancing the forms and ability of topics to express sentiment content, and extending the uses of topic representation in sentiment analysis tasks. The result is a text sentiment analysis pattern in which the topic modeling method is adapted to the task target. The major contents and achievements of this dissertation are as follows:

(1) Sentiment analysis based on specific knowledge topic modeling. In opinion spam classification tasks, it is difficult to distinguish opinion spam from normal sentiment expression, and this difficulty directly interferes with the performance of sentiment analysis. To address it, this dissertation proposes a specific knowledge enhanced topic modeling method for the opinion spam detection task. The method combines existing sentiment dictionary resources and designs five heuristic rules for identifying spam opinions. These heuristic rules are introduced into the topic modeling process to enhance the ability of topics to recognize spam opinions. Opinion spam detection experiments show that the text representation built by combining heuristic rules with topic modeling can distinguish several types of spam opinion from normal content and improves the classification of effective comments versus spam comments. It therefore also provides high-quality data resources for text sentiment analysis.

(2) Multi-strategy text representation integrated sentiment analysis. Data resources are the basis of text sentiment classification. When annotated data in the target language is insufficient, annotated data in other languages can provide strong support, which leads to the task of multilingual sentiment analysis. However, the representation features of different languages differ considerably, and aligning feature semantics becomes a key issue. Therefore, based on cross-language topic representation and the traditional vector space model representation, this dissertation designs a multi-strategy framework that combines the advantages of the two representations and applies it to cross-language sentiment classification. Experiments verify that the distribution of sentiment propensity features shows both decentralized and aggregative effects. They also show that the cross-language topic representation can effectively balance differences in sentiment characteristics and mitigate data sparsity. The experimental results met expectations.

(3) Semi-supervised sentiment analysis based on topical measurements. When annotated data for text sentiment analysis is insufficient, a large amount of unlabeled data can be used. This calls for a semi-supervised learning framework whose core technique is a topical metric over sample content. This dissertation constructs sample content metrics for two different functions and integrates them into a semi-supervised learning framework, yielding a topical metrics based semi-supervised sentiment classification method. The method is applied to cross-language sentiment classification tasks, and an Aligned-Translation Topic Model is designed to construct the text topic representation space. The results show significant improvement on cross-language tasks, indicating that the topical metrics based semi-supervised sentiment classification method is effective.

(4) Embedding integrated topic modeling for sentiment analysis. Sentiment semantics is a comprehensive embodiment of people's feelings and thinking. It is expressed in text in multiple forms and from multiple angles, such as word contexts and text topics, and the sentiment content emphasized by each form of expression differs. Relying solely on textual topic relationships is therefore not sufficient to reflect the full content of sentiment semantics. It is also necessary to incorporate the sentiment semantics captured by other forms of expression into the topic representation. This dissertation proposes an information fusion method for topic representation and vector representation, and uses it to design an Embedding Enhanced Topic Model, which introduces the word semantic information reflected by word vectors into topics. The experimental results show that, after absorbing the semantics of word vectors, the topic representation can effectively cluster words with different grammatical and semantic functions and assign them to corresponding topics, improving the ability of the topic representation to describe sentiment details. Furthermore, using the Embedding Enhanced Topic Model, this dissertation designs a sample topical similarity measurement. Experimental results show that this measurement performs better in complex text clustering tasks.

(5) Function designs and applications of topic modeling in sentiment analysis systems. Based on a distributed system design framework for web services, the topic modeling and sentiment analysis methods proposed in this dissertation are implemented as components, and each component is added to the system framework according to business logic to build an online text sentiment analysis prototype system. The framework design and functional organization of the prototype system are described in detail, and the data analysis results are demonstrated using actual product reviews as examples.
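The abstract does not specify how the sample topical similarity measurement in contributions (3) and (4) is computed. As a rough illustration of the general idea only, the sketch below compares two documents through their topic distributions using Jensen-Shannon divergence. The choice of divergence, the function names, and the example topic weights are all assumptions made for illustration and are not taken from the dissertation.

```python
import math


def normalize(weights):
    """Scale non-negative topic weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("topic weights must have a positive sum")
    return [w / total for w in weights]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; zero-probability terms of p are skipped."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def topical_similarity(doc_a, doc_b):
    """Hypothetical similarity: 1 minus the Jensen-Shannon divergence.

    With base-2 logarithms the divergence lies in [0, 1], so the result
    is 1.0 for identical topic mixtures and 0.0 for disjoint ones.
    """
    if len(doc_a) != len(doc_b):
        raise ValueError("documents must use the same number of topics")
    p, q = normalize(doc_a), normalize(doc_b)
    # Mixture distribution; it is positive wherever p or q is positive.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    js = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return 1.0 - js


# Made-up topic distributions over three topics (illustrative values only).
review_a = [0.7, 0.2, 0.1]
review_b = [0.6, 0.3, 0.1]
review_c = [0.1, 0.1, 0.8]

print(topical_similarity(review_a, review_b))
print(topical_similarity(review_a, review_c))
```

In a semi-supervised setting, a score of this kind could be used to decide which unlabeled samples are topically close enough to labeled ones to be pseudo-labeled, though the dissertation's actual metrics may be defined quite differently.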
Keywords/Search Tags: Text sentiment analysis, Text representation, Topic modeling, Representation learning