
Research on Technology of Sentiment Analysis and Opinion Identification for Chinese Product Reviews

Posted on: 2019-01-17; Degree: Doctor; Type: Dissertation
Country: China; Candidate: G Wang
GTID: 1368330578972557; Subject: Software engineering
Abstract/Summary:
More and more products are sold online, and the number of customer reviews for these products is growing rapidly. On some websites, a single product can accumulate hundreds or even thousands of reviews. These reviews are valuable to potential customers, product manufacturers and product sellers, and they contain considerable business opportunities. A growing number of researchers are therefore analyzing these reviews to understand buyers' opinions, evaluations, attitudes and emotions toward purchased products and their characteristics. This task is known as sentiment analysis, and it draws on several research fields, including information retrieval, natural language processing and data mining. The main work of this thesis is as follows.

(1) It proposes a method, based on syntactic structure, for identifying the product features described in Chinese product reviews. A multi-strategy method is used to extract product features at different levels from the reviews, and sentiment classification is then performed with respect to these product features. This work addresses two problems in sentiment analysis and opinion identification: the extraction of product features, and the identification of sentiment orientation for each product feature. Product features are extracted by computing word frequencies in the reviews and by applying the double propagation algorithm over syntactic relations; redundant features that reduce precision are then removed by feature pruning. The feature-based sentiment orientation identification method does not fix the orientation of each sentiment word, because the same word may express different opinions in different sentences. As a result, it can correctly identify the sentiment of the same word in different sentences. Experiments show that the proposed method achieves higher precision, recall and F-score.

(2) It proposes two methods for extracting sentiment words under the influence of contextual factors, each using a different strategy: one based on distance and one based on syntactic relations. With these strategies, the sentiment words in customer product reviews are extracted, the opinion sentences in the reviews are identified, and the sentiment orientation expressed in each opinion sentence is determined. The thesis compares the performance of the two methods, predicts the sentiment polarity of sentiment words and opinion sentences from their context, and verifies the validity of both methods experimentally.

(3) It proposes a cross-domain topic and sentiment word extraction algorithm based on the Conditional Random Fields (CRF) model, named CRF-CDOA. Chinese syntactic rules are added to the CRF model, and the degree of correlation between the source-domain and target-domain data is improved step by step through an iterative method. The CRF model is then trained on the highly correlated data, and CRF-CDOA is used to extract topic and sentiment words in different domains. With this method, the data in the corpus can be identified without labeling any target-domain data. Finally, experiments verify the effectiveness of CRF-CDOA.

(4) It proposes three fake review identification methods based on multi-dimensional feature engineering. Building on product feature extraction and opinion sentence identification, six feature parameters for identifying fake reviews are defined, and a fake review identification model based on multi-dimensional feature engineering is constructed; the effectiveness of the selected features is also verified. On top of this model, three identification algorithms are proposed: a multi-dimensional feature engineering identification algorithm based on the union relationship, an identification algorithm based on weighted multi-dimensional feature engineering scoring, and an identification algorithm based on weighted multi-dimensional feature engineering classification. The thesis compares the three methods. With this multi-dimensional feature engineering model, fake reviews can be effectively filtered out.
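The feature-extraction step in (1) combines word frequency, double propagation over syntactic relations, and feature pruning. The Python sketch below illustrates that combination under simplifying assumptions: sentences arrive already parsed into (head, relation, dependent) triples, only two opinion-feature relations (`amod` and `nsubj`) are modelled, tokens are English stand-ins for segmented Chinese words, and pruning is a plain sentence-frequency threshold. The function names and the `min_freq` parameter are hypothetical, and this is not the thesis's actual rule set.

```python
from collections import Counter


def link(head, rel, dep):
    """Return (feature, opinion) for a supported relation, else None."""
    if rel == "amod":   # noun modified by an adjective: head=feature, dep=opinion
        return head, dep
    if rel == "nsubj":  # adjective with a nominal subject: head=opinion, dep=feature
        return dep, head
    return None


def double_propagation(sentences, seed_opinions, min_freq=2):
    """Expand feature/opinion sets from seed opinion words until a fixed point,
    then prune features that occur in fewer than `min_freq` sentences."""
    opinions, features = set(seed_opinions), set()
    changed = True
    while changed:
        changed = False
        for triples in sentences:
            for triple in triples:
                pair = link(*triple)
                if pair is None:
                    continue
                feature, opinion = pair
                if opinion in opinions and feature not in features:
                    features.add(feature)  # known opinion word -> new feature
                    changed = True
                if feature in features and opinion not in opinions:
                    opinions.add(opinion)  # known feature -> new opinion word
                    changed = True
    # Sentence-level frequency of each extracted feature, used for pruning.
    freq = Counter(
        word
        for triples in sentences
        for word in {w for h, _, d in triples for w in (h, d)}
        if word in features
    )
    return {f for f in features if freq[f] >= min_freq}, opinions


example = [
    [("screen", "amod", "clear")],
    [("clear", "nsubj", "photo")],
    [("photo", "amod", "blurry")],
    [("screen", "amod", "bright"), ("battery", "amod", "durable")],
    [("case", "amod", "bright")],
]
features, opinions = double_propagation(example, {"clear"})
print(sorted(features), sorted(opinions))  # ['photo', 'screen'] ['blurry', 'bright', 'clear']
```

Here "case" is reached through the propagated opinion word "bright" but is pruned because it appears in only one sentence. Full double propagation also propagates between features and between opinion words (for example through coordination); those rules are omitted to keep the fixed-point loop easy to follow.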
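Two ideas from (1) and (2) can be illustrated together. The first is the distance strategy, which pairs a product feature with a nearby sentiment word. The second is an orientation that depends on the feature rather than being fixed for each word. In the minimal sketch below, the lexicons, the three-token window and the `(feature, word)` override table are all hypothetical. The thesis's syntactic-relation strategy is not modelled.

```python
# Hypothetical lexicons: the thesis's actual lexicon, window size and
# context rules are not given in the abstract.
DEFAULT_POLARITY = {"good": 1, "bad": -1, "long": 0}
# (feature, word) pairs whose polarity differs from the word's default.
CONTEXT_POLARITY = {("battery", "long"): 1, ("startup", "long"): -1}


def nearest_sentiment_word(tokens, feature_idx, window=3):
    """Distance strategy: the lexicon word closest to tokens[feature_idx],
    searched within `window` tokens on either side; None if there is none."""
    best, best_dist = None, window + 1
    for i, tok in enumerate(tokens):
        dist = abs(i - feature_idx)
        if tok in DEFAULT_POLARITY and 0 < dist < best_dist:
            best, best_dist = tok, dist
    return best


def feature_orientation(tokens, feature, window=3):
    """Return +1, -1 or 0 as the orientation of `feature` in this sentence."""
    word = nearest_sentiment_word(tokens, tokens.index(feature), window)
    if word is None:
        return 0  # in this sketch: no sentiment word close enough, so no opinion
    # The same word may flip polarity depending on the feature it describes.
    return CONTEXT_POLARITY.get((feature, word), DEFAULT_POLARITY[word])


print(feature_orientation(["battery", "life", "is", "long"], "battery"))  # 1
print(feature_orientation(["startup", "time", "is", "long"], "startup"))  # -1
```

The word "long" is neutral by default here, but it becomes positive when it describes the battery and negative when it describes startup. This is the kind of context dependence that a fixed per-word orientation cannot capture.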
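Contribution (3) adds Chinese syntactic rules to a CRF model. The abstract does not say how the rules are encoded. One common approach, assumed here, is to expose rule matches as token-level features alongside the word, its part of speech and its dependency label. The sketch below builds per-token feature dicts in the list-of-dicts form that CRF toolkits such as sklearn-crfsuite accept. The tag set (Chinese Treebank `NN`/`VA`), the rule itself and the function names are hypothetical. The sketch does not implement CRF-CDOA's iterative cross-domain data selection.

```python
def token_features(tokens, pos_tags, heads, rels, i):
    """Feature dict for token i. heads[i] is the index of token i's syntactic
    head (-1 for the root) and rels[i] is its dependency label."""
    head = heads[i]
    head_pos = pos_tags[head] if head >= 0 else "ROOT"
    feats = {
        "word": tokens[i],
        "pos": pos_tags[i],
        "dep": rels[i],
        "head_pos": head_pos,
        # Hypothetical syntactic rule: a common noun (NN) that is the subject of
        # a predicative adjective (VA, Chinese Treebank tag) is a likely topic word.
        "rule.nn_subj_of_va": pos_tags[i] == "NN" and rels[i] == "nsubj" and head_pos == "VA",
    }
    if i > 0:
        feats["prev_word"] = tokens[i - 1]
        feats["prev_pos"] = pos_tags[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1]
        feats["next_pos"] = pos_tags[i + 1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens, pos_tags, heads, rels):
    """One feature dict per token, i.e. one input sequence for a linear-chain CRF."""
    return [token_features(tokens, pos_tags, heads, rels, i) for i in range(len(tokens))]


# English stand-ins for a segmented Chinese sentence meaning "screen very clear".
X = sentence_features(
    ["screen", "very", "clear"],
    ["NN", "AD", "VA"],
    [2, 2, -1],
    ["nsubj", "advmod", "root"],
)
```

A training set would pair each such sequence with a label sequence (for example BIO tags marking topic and sentiment words). A CRF learns weights for the rule feature just as it does for lexical features, so a rule that is unreliable in a new domain can be down-weighted rather than applied blindly.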
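For (4), the abstract names a union-relationship algorithm and a weighted-scoring algorithm but does not list the six feature parameters. The sketch below therefore uses placeholder indicators `f1` to `f6` to contrast the two decision rules. Each indicator is assumed to be normalized to [0, 1], and the thresholds and weights are made up. The weighted-classification variant would replace the fixed cutoff with a trained classifier and is not shown.

```python
# Placeholder indicators f1..f6, each assumed to be normalized to [0, 1].
# The thesis defines six feature parameters, but their definitions,
# thresholds and weights are not given in the abstract; these are hypothetical.
THRESHOLDS = {"f1": 0.8, "f2": 0.8, "f3": 0.7, "f4": 0.9, "f5": 0.6, "f6": 0.75}
WEIGHTS = {"f1": 0.25, "f2": 0.15, "f3": 0.2, "f4": 0.1, "f5": 0.2, "f6": 0.1}  # sum to 1


def is_fake_union(features):
    """Union rule: flag the review if ANY indicator reaches its threshold."""
    return any(features[name] >= t for name, t in THRESHOLDS.items())


def fake_score(features):
    """Weighted sum of the indicators; with weights summing to 1 it stays in [0, 1]."""
    return sum(WEIGHTS[name] * features[name] for name in WEIGHTS)


def is_fake_weighted(features, cutoff=0.5):
    """Weighted-scoring rule: flag the review if the combined score reaches `cutoff`."""
    return fake_score(features) >= cutoff


one_strong = {"f1": 0.9, "f2": 0.0, "f3": 0.0, "f4": 0.0, "f5": 0.0, "f6": 0.0}
all_moderate = {name: 0.55 for name in WEIGHTS}
print(is_fake_union(one_strong), is_fake_weighted(one_strong))      # True False
print(is_fake_union(all_moderate), is_fake_weighted(all_moderate))  # False True
```

The two example reviews show the trade-off between the rules. The union rule fires on a single strong indicator even when the rest are weak. Weighted scoring can flag a review whose indicators are each only moderately suspicious.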
Keywords/Search Tags: Opinion identification, Sentiment analysis, Feature extraction, Sentiment word extraction, Cross-domain sentiment analysis, Feature engineering, Fake review identification