Research On Key Techniques Of Opinion Mining For Chinese Web Reviews

Posted on:2014-02-10

Degree:Doctor

Type:Dissertation

Country:China

Candidate:F Li

Full Text:PDF

GTID:1228330398490216

Subject:Education Technology

Abstract/Summary:

The Web stores a huge number of comments about social events, public figures, and products. These comments are valuable to governments, manufacturers, and consumers. However, the amount of web information is growing exponentially, and users face the problem of information overload. Collecting information manually is extremely time-consuming and laborious. There is an urgent need for effective means of collating, analyzing, and extracting data that can provide users with valuable, clear, and comprehensive information. Opinion mining has emerged to meet this need, has attracted increasing attention, and has become an active research area in data mining and natural language processing.

This paper focuses on aspect-based opinion mining of Chinese product reviews, including aspect and relation extraction and sentimental word identification. First, it extracts aspects and their hierarchical relations from a review corpus with a topic model. Then, it divides sentimental words into context-free words and context-dependent words, and identifies them with a method based on word explanations and a method based on association rules, respectively. Finally, it aggregates the results by aspect and displays them in a hierarchy. The research work and contributions are as follows.

(1) This paper proposes a review-topic model (RTM) for aspect and hierarchical relation extraction. The model extends Latent Dirichlet Allocation (LDA) by adding a review indicator layer between the document layer and the topic layer. It represents each document as a mixture of review indicators; each indicator is associated with a multinomial distribution over topics, and each topic is associated with a multinomial distribution over words. The basic idea of RTM is that review indicators can be effective for the generation of words in the documents.
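The three-layer structure described for RTM (document, review indicator, topic, word) can be illustrated with a small generative sketch. This is only an illustration under assumptions: the abstract does not give the hyperparameters, the inference procedure, or any implementation details, so the function names (`sample_dirichlet`, `sample_index`, `generate_corpus`), the symmetric Dirichlet parameters, and the toy vocabulary are all hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(probs, rng):
    """Draw an index from a categorical distribution given by probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(n_docs, doc_len, n_indicators, n_topics, vocab, seed=0):
    """Generate toy documents through a document -> indicator -> topic -> word chain.

    All hyperparameter values below are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    # One multinomial over topics per review indicator.
    indicator_topic = [sample_dirichlet([0.5] * n_topics, rng)
                       for _ in range(n_indicators)]
    # One multinomial over words per topic.
    topic_word = [sample_dirichlet([0.5] * len(vocab), rng)
                  for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        # Each document is a mixture of review indicators.
        doc_indicator = sample_dirichlet([1.0] * n_indicators, rng)
        words = []
        for _ in range(doc_len):
            r = sample_index(doc_indicator, rng)       # pick an indicator
            z = sample_index(indicator_topic[r], rng)  # pick a topic given the indicator
            w = sample_index(topic_word[z], rng)       # pick a word given the topic
            words.append(vocab[w])
        corpus.append(words)
    return corpus, indicator_topic, topic_word


toy_vocab = ["screen", "battery", "price", "service"]
docs, ind_topic, top_word = generate_corpus(3, 5, 2, 3, toy_vocab, seed=1)
for doc in docs:
    print(" ".join(doc))
```

In a real setting the two distribution tables would be estimated from review data rather than sampled; the sketch only shows how an extra indicator layer sits between documents and topics.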
After parameter estimation, related aspects are assigned to the same topic, and the hierarchy of indicators, topics, and aspects can be obtained from the indicator-topic and topic-word distributions. Experimental results show that the precision, recall, and F-measure of aspect extraction are 8.6%, 3%, and 7% higher, respectively, than those of the LDA model. In addition, the RTM model can obtain the hierarchical relations between aspects.

(2) Adding prior knowledge of word distributions to a topic model can improve its performance. This paper studies how to incorporate prior knowledge into the RTM model and presents a review-topic model with a Dirichlet Forest prior (RTM-DF). The RTM-DF model extends the RTM model by replacing the Dirichlet prior over the topic-word multinomial with a Dirichlet Forest prior, which can incorporate word correlation. First, the correlation between words is calculated to generate a set of Must-Links and Cannot-Links. Then, the structures of the Dirichlet trees are obtained by encoding the Must-Link and Cannot-Link constraints. Words under the same subtree are expected to be more correlated than words under different subtrees. Finally, each topic is assigned a tree by the Dirichlet Forest distribution, and the topic-word multinomial is sampled conditioned on these trees. After parameter estimation, the indicator-topic and topic-word distributions are used for aspect and hierarchical relation extraction. Experiments are conducted on a synthetic dataset and a review dataset. Both sets of results show that the RTM-DF model performs much better than the RTM model: it improves the precision and F-measure of aspect extraction by 5% and 3.7%, respectively.

(3) This paper proposes a noun phrase extraction method based on rules and co-occurrence probability. After word segmentation and part-of-speech tagging, word combination rules are used to extract candidate noun phrases.
Then, the co-occurrence probabilities between words are used to filter out noisy phrases.

(4) This paper proposes a method based on word explanations for identifying context-free sentimental words. First, a candidate sentimental vocabulary is built from existing emotional resources. Then, for each word in the vocabulary, all of its explanations in the Modern Chinese Dictionary are extracted, and a multi-feature fusion method is used to calculate the orientation of the explanations. The results are used to identify the context-free sentimental words through a multi-cycle strategy. As a result, a context-free sentimental dictionary is built, and it is applicable to any domain.

(5) This paper proposes a method based on association rules for mining context-dependent sentimental collocative phrases. Common collocative phrases are identified from the corpus with association rule techniques. Then, their orientations are calculated according to the context. As a result, a collection of context-dependent sentimental collocative phrases is built. The experiments are conducted on the COAE2011 test collections. The results show that sentimental word identification is significantly improved.
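The association-rule step in contribution (5) can be sketched as mining rules of the form "aspect -> opinion word" by support and confidence. This is a minimal illustration under assumptions, not the method of the dissertation: the abstract does not specify the rule form, the thresholds, or the input representation, so `mine_collocations`, its parameters, and the toy English sentences are hypothetical. Orientation calculation from context is not shown.

```python
from collections import Counter
from itertools import product


def mine_collocations(sentences, min_support=0.2, min_confidence=0.5):
    """Mine aspect -> opinion-word rules from sentence-level co-occurrence.

    sentences: list of (aspects, opinion_words) pairs, each a set of strings.
    Returns {(aspect, opinion): (support, confidence)} for rules that pass
    both thresholds (threshold values are illustrative assumptions).
    """
    n = len(sentences)
    aspect_counts = Counter()
    pair_counts = Counter()
    for aspects, opinions in sentences:
        aspect_counts.update(set(aspects))
        pair_counts.update(product(set(aspects), set(opinions)))
    rules = {}
    for (aspect, opinion), count in pair_counts.items():
        support = count / n                      # share of sentences containing the pair
        confidence = count / aspect_counts[aspect]  # share of the aspect's sentences with the opinion word
        if support >= min_support and confidence >= min_confidence:
            rules[(aspect, opinion)] = (support, confidence)
    return rules


# Toy data: "high" collocates with both "price" and "quality", which is the
# kind of context-dependent word the abstract refers to.
toy_sentences = [
    ({"price"}, {"high"}),
    ({"price"}, {"high"}),
    ({"quality"}, {"high"}),
    ({"price"}, {"low"}),
    ({"quality"}, {"high"}),
]
for rule, (sup, conf) in sorted(mine_collocations(toy_sentences).items()):
    print(rule, round(sup, 2), round(conf, 2))
```

With the toy data, the pairs ("price", "high") and ("quality", "high") pass both thresholds, while ("price", "low") is dropped for low confidence. The same opinion word attached to different aspects is why orientation has to be decided per collocation rather than per word.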

Keywords/Search Tags:

Opinion Mining, Aspect, Sentimental Words, Topic Model, Dirichlet Forest, Association Rules

Related items

1	Research On Key Techniques Of Opinion Mining For Entity
2	Aspect And Sentiment Extraction Based On Semi-supervised Topic Model
3	Product Aspect Extraction Supervised With Domain Knowledge
4	Research On Hot Topic Tracking And Relationship Detection Based On Parallel Association Rules
5	Research On The Method Of Extracting Opinions Based On Product Reviews
6	An End-to-End Opinion Mining Model With Weak Supervision
7	Research On Mining Product Features And Opinion Words For Web Reviews
8	Research On Model Of Hot Topic Opinion Mining In Virtual Communities
9	Social Media-Oriented College Internet Public Opinion Analysis System
10	Research On Medical Image Classification Based On Bag-of-Words Model And Association Rules