Research On Review Spam Detection Approach Based On Topic And Sentiment Analysis

Posted on: 2018-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: X H Jin
GTID: 2428330596454796
Subject: Software engineering
Abstract/Summary:
With the rapid development of e-commerce, online shopping has been accepted by more and more consumers. Most consumers review the products they purchase, and these reviews serve as decision-making references for other consumers. Because of conflicts of interest, however, not all reviews are genuine and reliable. Fake reviews harm consumers' shopping experience and can even lead to poor purchase decisions. Such fake reviews are collectively called review spam. Review spam can be roughly divided into two types: content-type review spam and cheat-type review spam. Content-type review spam generally refers to irrelevant information such as advertisements, spam web links, and random words, whose main purpose is to promote a message. Cheat-type review spam consists of reviews that intentionally boost or defame a product, with the main purpose of influencing consumers' potential purchasing behavior. This thesis analyzes the manifestations of both types of review spam, adopts different methods to extract features, and finally uses supervised learning algorithms to identify review spam by integrating the multi-dimensional features. The main work is as follows:

(1) To capture the characteristics of content-type review spam, a topic matching model is designed to extract features describing the topic information structure of a review. In this method, a large number of review texts related to the commodity are first collected, and their topic information is extracted to construct a commodity corpus. Each topic word is then assigned its own weight, and the topic information extracted from a user review is matched against the corpus. From this matching, the "topic coincidence" of each review is obtained. Finally, multi-dimensional features extracted from review content and user behavior are combined, and supervised learning is used to identify review spam.

(2) To capture the characteristics of cheat-type review spam, sentiment features of customer reviews are extracted on the basis of sentiment analysis. In this method, deep learning is used to extract sentiment features from all reviews. After the positive or negative polarity of each review is obtained, the degree of its sentiment tendency is analyzed, and the sentiment deviation is calculated separately among reviews of the same polarity. Three sentiment features are extracted from the sentiment classification results: abnormality of review sentiment, consistency between user rating and review sentiment, and sentiment complexity of all users' reviews. These are then combined with multi-dimensional non-sentiment features, and supervised learning is used to identify review spam.

(3) To integrate the extracted multi-dimensional features with traditional textual features and complete the binary classification of reviews, a multiple-classifier ensemble is used. To address the weakness of the random subspace method when the feature dimension is low, a random subspace method based on rule-based feature extraction is proposed, replacing the traditional construction of feature subsets by random sampling from the feature set. The extraction rules are designed to ensure the accuracy of each member classifier. Finally, the performance of three different classifier ensemble methods on review spam identification is compared.
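
The topic matching idea in (1) can be illustrated with a minimal sketch. The abstract does not specify the topic model, the weighting scheme, or the matching formula, so everything below is an assumption made for illustration only: topic words are approximated by the most frequent words in the commodity's reviews, weights are relative frequencies, and "topic coincidence" is taken to be the summed weight of the topic words that a review contains. The function names `build_topic_weights` and `topic_coincidence` are hypothetical and do not come from the thesis.

```python
from collections import Counter


def build_topic_weights(product_reviews, top_k=5):
    """Approximate a commodity corpus: keep the top_k most frequent words
    and weight each by its share of those top_k counts (an assumed scheme)."""
    counts = Counter(
        word for text in product_reviews for word in text.lower().split()
    )
    top = counts.most_common(top_k)
    total = sum(count for _, count in top)
    return {word: count / total for word, count in top}


def topic_coincidence(review, topic_weights):
    """Sum the weights of the topic words that occur in the review.
    With the weights above, the score lies between 0.0 and 1.0."""
    words = set(review.lower().split())
    return sum(
        (weight for word, weight in topic_weights.items() if word in words),
        0.0,
    )


# Toy commodity corpus (invented for this example).
corpus = [
    "battery life is great",
    "battery charges fast",
    "screen and battery are great",
]
weights = build_topic_weights(corpus, top_k=2)

on_topic = topic_coincidence("great battery", weights)
off_topic = topic_coincidence("visit my site for cheap watches", weights)
```

Under these assumptions, a review that shares no weighted topic word with the commodity corpus scores zero, which is the kind of signal that could flag content-type spam such as advertisements. In the thesis this score is only one feature, combined with other content and user-behavior features in a supervised classifier.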
Keywords/Search Tags: review spam, topic model, sentiment analysis, classifier ensemble, supervised learning