
Research On Semantic Constrained LDA Model For Extracting Product Aspects And Opinions

Posted on: 2017-01-17  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y Peng  Full Text: PDF
GTID: 1318330512460100  Subject: Management Science and Engineering
Abstract/Summary:
With the popularity of the Internet and the convenience of online shopping, shopping on the Web has become a trend and has grown explosively, and consumers post a large number of product reviews on shopping websites. Using sentiment analysis techniques from natural language processing, useful evaluation knowledge can be obtained from this massive text data. Sentiment analysis classifies the opinion polarity expressed toward evaluation objects at three levels of granularity: (1) document level; (2) sentence level; (3) aspect level. Document-level and sentence-level sentiment analysis yield only a coarse-grained view of the evaluation object, which makes it difficult to understand how individual product components and attributes are evaluated. To obtain sentiment classification knowledge about the local components and attributes of a product, the reviews must be analyzed at the aspect level. This is called fine-grained sentiment analysis, and its core task is to extract aspects and opinion words and, at the same time, find the correlations between them.
Compared with coarse-grained sentiment analysis, fine-grained sentiment analysis is more challenging. Product reviews are unstructured natural-language text, their semantic and syntactic structure is irregular, and the data volume grows very rapidly, which makes extracting aspect words and opinion words very difficult. Natural language understanding and data mining techniques make it possible to mine fine-grained aspects and opinions once the dimensionality of the text data has been effectively reduced. The LDA model can reduce the dimensionality of text data and automatically extract and cluster topic words from large-scale text collections.
For this reason, the LDA model has attracted great attention and has been widely used in research on extracting aspects and opinions.
Aspect-level sentiment analysis needs to find more local structural relations between aspect words and opinion words. These words occur less frequently than global words, and the relations between them are implicit in sentence and phrase structure, especially in Chinese product reviews, so extracting local aspects and opinions is significantly harder than extracting global ones. Existing LDA models tend to find global aspect and opinion words, and the probability distribution of topic words does not consider the semantic relations among words. As a result, precision and recall are low when extracting aspects and opinions that are low-frequency or linked by implicit semantic relations. Many issues remain to be resolved:
(1) Low-frequency aspect and opinion words are difficult to extract. The LDA model is good at finding high-frequency words, which leads to a low extraction rate for low-frequency words. In Chinese product reviews, the same aspect is often described by many different words, and the low-frequency ones are often ignored. Low-frequency exclusive opinion words, which modify only a few aspects, are also not easily recognized by LDA.
(2) Relations between aspects and opinions with low co-occurrence frequency are difficult to find. The LDA model is good at finding words and expressions with high co-occurrence frequency, but has difficulty finding matching relations between aspect and opinion words that really exist but co-occur rarely. In Chinese product reviews, some opinion words are used to modify only one aspect or one kind of aspect; the co-occurrence relations between such words are not explicit, which makes them difficult for the LDA model to find.
LDA also has great difficulty extracting matching relations from sentences that contain only opinion words.
(3) Global aspect words interfere with the topic allocation of local aspect words. The LDA model is sensitive to high-frequency global aspect words and tends to assign them with high probability to several different topics. This distorts the topic distribution of local aspect words with relatively low frequency, causing global aspects to be extracted repeatedly and local aspects to be found at a low rate.
(4) The semantic relations between aspect words and opinion words are difficult to identify. The LDA model is a bag-of-words probabilistic generative model. The word associations it extracts mainly reflect document-level co-occurrence, and it can hardly capture the semantic relations between words. Words with high document co-occurrence frequency but no semantic relation may therefore be assigned to the same topic, while words with low co-occurrence frequency but a strong semantic relation may be assigned to different topics, so the topic words cannot reflect the real semantic relations between aspects and opinions.
To solve the above problems and extract fine-grained aspect words and opinion words, it is necessary to guide topic word mining with prior knowledge constraints, which act as supervision so that the extracted topic words conform to the mining target. Considering the lack of semantic comprehension in LDA, we first discover the semantic relations between words and then use this knowledge to form a constraint mechanism on the topic model, so that it can find more implicit relations between aspect words and opinion words.
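Issues (2) and (4) above can be illustrated with a small sketch. It is only an illustration under assumed toy data, not the thesis's method: the reviews, the `ASPECTS` and `OPINIONS` sets, and the `cooccurrence` helper are all hypothetical. It counts aspect-opinion pairs once with each review treated as a single bag of words (the view a plain bag-of-words model has) and once with each sentence treated separately.

```python
from collections import Counter
from itertools import product

# Toy reviews (hypothetical data): each review is a list of sentences,
# each sentence a list of already-segmented words.
reviews = [
    [["screen", "clear"], ["battery", "durable"]],
    [["screen", "clear"], ["price", "cheap"]],
    [["battery", "durable"], ["screen", "bright"]],
]
ASPECTS = {"screen", "battery", "price"}
OPINIONS = {"clear", "durable", "cheap", "bright"}


def cooccurrence(units):
    """Count (aspect, opinion) pairs that appear together in each unit."""
    counts = Counter()
    for words in units:
        present = set(words)
        for pair in product(present & ASPECTS, present & OPINIONS):
            counts[pair] += 1
    return counts


# Document level: all words of a review form one bag.
doc_units = [[w for sent in review for w in sent] for review in reviews]
# Sentence level: each sentence is its own unit.
sent_units = [sent for review in reviews for sent in review]

doc_counts = cooccurrence(doc_units)
sent_counts = cooccurrence(sent_units)

for pair in [("screen", "clear"), ("screen", "durable"), ("screen", "bright")]:
    print(pair, "document:", doc_counts[pair], "sentence:", sent_counts[pair])
```

In this toy data the unrelated pair ("screen", "durable") co-occurs in two documents, as often as the genuine pair ("screen", "clear") and more often than the genuine but rare pair ("screen", "bright"). At sentence level the unrelated pair never co-occurs, which is the kind of local evidence that document-level counts hide.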
Introducing semantic relations between words preserves LDA's ability to extract topic words from large-scale texts while enhancing its semantic understanding, which improves its ability to identify local relations between words and to extract more fine-grained aspects and opinions. The main research contents are summarized as follows:
(1) Study on the semantic relations of Chinese product reviews. According to the features of Chinese product reviews, we acquire semantic relations from syntactic dependency, word meaning, and context correlation, transform them into a form that is easy to identify and convenient to embed in the LDA model, and improve the topic model by using the semantic constraints as prior knowledge.
(2) Study on the embedding mechanism of semantic relations in the LDA model. We study how to embed semantic constraint knowledge into the topic model while retaining LDA's topic word extraction. The topic probability distributions need to reflect the semantic membership relations between different levels and to guide extraction toward the semantic requirements of aspect and opinion words, resolving the problem that the topic words extracted by LDA cannot fully meet those requirements.
(3) Study on the hierarchical distribution of semantic relations in the LDA model. Introducing semantic relations changes the distribution relations between the different levels of the LDA model. Based on the hierarchical distribution relations of the standard topic model, we add semantic prior knowledge to influence the probability distributions at three levels: between documents and sentiments, between sentiments and topics, and between topics and words.
(4) Study on the construction of semantic constraint topic models.
Introducing semantic constraints into the LDA model produces a weak supervision effect. It changes the overall structure of the existing LDA model, both in the structure of its levels and in the membership relations between levels. According to the different semantic tasks, and based on the extended LDA model, we combine a variety of methods for acquiring and embedding semantic constraints and study the construction of three kinds of topic models: aspect extraction, aspect and opinion extraction, and sentiment polarity classification.
The innovative work of this thesis is summarized as follows:
(1) A method for obtaining the semantic relations between words in product reviews is proposed. According to the features of Chinese product reviews, we design rules that find semantic relations between aspects and opinions by considering syntactic analysis, word meaning, and context correlation, and we convert these relations into a form that is convenient to add to the LDA model as prior knowledge constraints. The resulting word relations reflect the semantic relevance between aspect words and opinion words in Chinese product reviews.
(2) A constraint mechanism of semantic relations on the LDA model is designed. First, constraining the topic-word distribution with semantic relations makes fine-grained aspect words and opinion words aggregate within topics and separate across topics. Second, a dedicated topic distribution for global aspects reduces their interference with the distribution of local words, so that as many local aspects and opinions as possible are found. Semantic constraints guide LDA's topic-word probability distribution, affect how strongly words aggregate and separate across topics, and make up for LDA's lack of semantic understanding.
(3) Four semantic constrained LDA models are constructed.
The original LDA model is extended, and the WC-LDA, AC-LDA, SRC-LDA and SWS-LDA models are proposed by embedding semantic prior knowledge. Building on LDA's topic word extraction, the structure of LDA is improved and the semantic relations between words are added to guide topic word mining. This makes the distributions better match the semantic demands of aspect and opinion extraction and improves the recognition rate of low-frequency aspects and opinions that are implicit in sentence structure. The proposed models increase the clustering degree of topic words and discover more fine-grained aspect words, opinion words, and the relations between them.
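The abstract does not give the internals of WC-LDA, AC-LDA, SRC-LDA or SWS-LDA, so the following is not any of those models. It is a minimal sketch of one generic way prior knowledge can constrain LDA: a collapsed Gibbs sampler in which words that prior knowledge ties to a topic receive a larger Dirichlet pseudo-count in that topic. The function name `seeded_lda`, the `seeds` mapping, the `boost` parameter, and the toy corpus are all assumptions made for illustration.

```python
import random


def seeded_lda(docs, vocab, n_topics, seeds, alpha=0.1, beta=0.01,
               boost=1.0, n_iter=50, rng_seed=0):
    """Collapsed Gibbs sampling for LDA with an asymmetric topic-word prior.

    seeds maps a topic index to words that prior knowledge ties to that
    topic; their pseudo-count there is beta + boost instead of beta.
    """
    rng = random.Random(rng_seed)
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    # prior[k][v]: Dirichlet pseudo-count of word v in topic k.
    prior = [[beta] * n_words for _ in range(n_topics)]
    for k, words in seeds.items():
        for w in words:
            prior[k][word_id[w]] += boost
    prior_sum = [sum(row) for row in prior]

    doc_topic = [[0] * n_topics for _ in docs]             # n(d, k)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # n(k, v)
    topic_total = [0] * n_topics                           # n(k)
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + prior[t][v])
                    / (topic_total[t] + prior_sum[t])
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return topic_word, prior


# Hypothetical toy corpus: "display" is a rarer synonym of "screen".
vocab = ["screen", "display", "clear", "battery", "durable", "charge"]
docs = [["screen", "clear"], ["display", "clear"], ["battery", "durable"],
        ["battery", "charge"], ["screen", "display", "clear"]]
seeds = {0: ["screen", "display"], 1: ["battery"]}
topic_word, prior = seeded_lda(docs, vocab, n_topics=2, seeds=seeds)
```

The only change from a standard symmetric-prior sampler is the `prior` matrix: a seeded word contributes a larger term to the sampling weight of its seeded topic even when its observed count there is still small, which is one simple way to pull a low-frequency word toward the topic its semantic relations suggest.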
Keywords/Search Tags: product reviews, sentiment analysis, topic model, semantic constraint, syntactic analysis, weak supervision