Research On Identifying Review Spam For Product Reviews

Posted on:2013-12-29

Degree:Master

Type:Thesis

Country:China

Candidate:L J Liu

GTID:2298330362464321

Subject:Computer application technology

Abstract/Summary:

In recent years, the rapid development of the Internet has changed the way people express themselves and communicate. In the field of product reviews, people are increasingly inclined to express their opinions on online shopping sites. These user reviews are rich in varied and useful information, but they may also contain spam. Such spam degrades the quality of product review mining.

This paper proposes a method for identifying spam in Chinese product reviews. The main work is as follows.

First, based on an analysis of spam in Chinese product reviews, spam reviews are classified into useless reviews and untruthful reviews, and a different detection method is proposed for each type according to its features.

The detection of useless reviews is treated as a binary classification problem. Four important classification features are used to characterize reviews: product features, assessing phrases about non-product information, questions, and hyperlinks. In addition, the information gain method is used to extract further features automatically, and these characterize reviews together with the four features above. Each review is then converted into a feature vector composed of these feature values, and a classifier based on logistic regression labels each review as normal or useless, which completes the detection of useless reviews.

For the detection of untruthful reviews, a 2-gram model is used to represent review text so that word order is taken into account. To avoid zero probabilities when constructing the 2-gram model, the Katz smoothing method is applied to the model. Finally, KL divergence is used to detect untruthful reviews: if the KL divergence is less than a given threshold, the review is judged to be untruthful.

The experimental results show that the methods proposed in this paper can effectively identify useless reviews and untruthful reviews among product reviews.
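
The untruthful-review step can be illustrated with a minimal sketch. It is not the thesis's implementation and rests on several assumptions: reviews are already segmented into word tokens; the two distributions being compared are the bigram distributions of two reviews (the abstract does not say what is compared, so a small divergence is read here as a sign of near-duplicate text); a joint distribution over bigrams stands in for the conditional 2-gram model; and simple add-alpha smoothing replaces Katz smoothing for brevity. The function names and the threshold value of 0.1 are hypothetical.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Return adjacent token pairs, padded with boundary markers."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return list(zip(padded, padded[1:]))


def bigram_distribution(tokens, vocab_pairs, alpha=1.0):
    """Add-alpha smoothed probability for every bigram in vocab_pairs.

    Assumption: add-alpha smoothing is used here instead of Katz
    smoothing; both serve to avoid zero probabilities.
    """
    counts = Counter(bigrams(tokens))
    total = sum(counts.values()) + alpha * len(vocab_pairs)
    return {bg: (counts[bg] + alpha) / total for bg in vocab_pairs}


def kl_divergence(p, q):
    """KL(p || q) = sum over x of p(x) * log(p(x) / q(x))."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in p)


def review_divergence(review_a, review_b, alpha=1.0):
    """KL divergence between the smoothed bigram distributions of two reviews."""
    vocab = set(bigrams(review_a)) | set(bigrams(review_b))
    p = bigram_distribution(review_a, vocab, alpha)
    q = bigram_distribution(review_b, vocab, alpha)
    return kl_divergence(p, q)


def is_untruthful(review_a, review_b, threshold=0.1):
    """Flag review_a when its divergence from review_b is below the threshold.

    The threshold of 0.1 is an illustrative value, not one from the thesis.
    """
    return review_divergence(review_a, review_b) < threshold
```

Under these assumptions, two identical token lists give a divergence of zero and are flagged, while two reviews with no bigrams in common give a clearly positive divergence. Smoothing matters because, without it, any bigram seen in one review but not the other would make the ratio inside the logarithm undefined.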

Keywords/Search Tags:

Spam Detection, Logistic Regression, 2-gram Model, Katz Smoothing, KL divergence

Related items

1	Research On Identifying Review Spam For Product Reviews Based On Data Mining
2	Review Spam Detection Based On User Evaluation
3	Research On The Prediction Of Insurance Payment Based On Logistic Regression Model
4	Design Of Wrist Fall Detection System Based On LDA And Logistic Regression Algorithm
5	Research And Application Of Recommendation Technology Based On Logistic Regression
6	Research On Key Issues Of Spam Detection And Filtration
7	The Study Of A Prediction Method For Search Ad CTR Based On Logistic Regression Model
8	Research And Implementation Of The Anti-spam System Based On Bayesian Algorithm
9	Fuse Multi-features To Identify Product Review Spam
10	Research On Theory Of Spam Filtering And Its Key Techniques