Research On Identifying Review Spam For Product Reviews

Posted on: 2013-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: L J Liu
Full Text: PDF
GTID: 2298330362464321
Subject: Computer application technology
Abstract/Summary:
In recent years, the rapid development of the Internet has changed the way people express themselves and communicate. In the field of product reviews, people are increasingly inclined to share their opinions on online shopping sites. These user reviews are rich in varied and useful information, but they may also contain spam. Such spam degrades the quality of product review mining.

This paper proposes a method for identifying spam in Chinese product reviews. The main work is as follows:

First, based on an analysis of spam in Chinese product reviews, spam reviews are classified into useless reviews and untruthful reviews, and a different detection method is proposed for each type according to its features.

For the detection of useless reviews, this paper treats the task as a binary classification problem. Four important classification features are used to characterize reviews: product features, assessing phrases about non-product information, questions, and hyperlinks. In addition, the information gain method is used to extract further features automatically, which characterize reviews together with these four features. Each review is then converted into a feature vector composed of these feature values, and a classifier based on logistic regression labels each review as either normal or useless, which completes the detection of useless reviews.

For the detection of untruthful reviews, a 2-gram model is used to represent the review text so that word order is taken into account. To avoid zero probabilities when constructing the 2-gram model, the Katz smoothing method is applied to the model. Finally, KL divergence is used to detect untruthful reviews: if the KL divergence value is less than a given threshold, the review is judged to be untruthful.

The experimental results show that the methods proposed in this paper can effectively identify the useless and untruthful reviews that exist in product reviews.
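
As a rough illustration of the untruthful-review test described in the abstract (2-gram model, smoothing, KL divergence compared against a threshold), a minimal Python sketch might look like the following. Several parts are assumptions and do not come from the thesis: additive (add-alpha) smoothing is swapped in for Katz smoothing to keep the example short, the model is a joint distribution over 2-grams rather than a conditional one, the reference distribution is assumed to be built from reviews already known to be untruthful, and all function names, the `<s>`/`</s>` boundary markers, and the threshold value are hypothetical. Reviews are assumed to be tokenized already, for example by a Chinese word segmenter.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Return adjacent token pairs, with sentence-boundary markers added."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return list(zip(padded, padded[1:]))


def smoothed_distribution(token_lists, vocab_bigrams, alpha=1.0):
    """Additive-smoothed probability of every 2-gram in vocab_bigrams.

    NOTE: add-alpha smoothing is a simple stand-in used for this sketch;
    the thesis applies Katz smoothing instead.
    """
    counts = Counter(bg for toks in token_lists for bg in bigrams(toks))
    total = sum(counts[bg] for bg in vocab_bigrams) + alpha * len(vocab_bigrams)
    return {bg: (counts[bg] + alpha) / total for bg in vocab_bigrams}


def kl_divergence(p, q):
    """KL(p || q) over a shared support; smoothing keeps every term finite."""
    return sum(p[bg] * math.log(p[bg] / q[bg]) for bg in p)


def is_untruthful(review_tokens, reference_reviews, threshold):
    """Flag a review whose 2-gram distribution is close to the reference.

    ASSUMPTION: reference_reviews are reviews already known to be
    untruthful, so a divergence below the threshold marks the review as
    untruthful, matching the decision rule stated in the abstract.
    """
    vocab = set(bigrams(review_tokens))
    for toks in reference_reviews:
        vocab.update(bigrams(toks))
    vocab = sorted(vocab)
    p = smoothed_distribution([review_tokens], vocab)
    q = smoothed_distribution(reference_reviews, vocab)
    return kl_divergence(p, q) < threshold
```

With a hypothetical reference set such as `[["good", "phone"]]` and a threshold of `0.1`, calling `is_untruthful(["good", "phone"], reference, 0.1)` compares two identical distributions, so the divergence is zero and the review is flagged. In practice the threshold would have to be tuned on labeled data.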
Keywords/Search Tags: Spam Detection, Logistic Regression, 2-gram Model, Katz Smoothing, KL Divergence