
Feature Analysis And Detection Of Review Spam Based On WEB Quality Feature Model

Posted on: 2018-07-19
Degree: Master
Type: Thesis
Country: China
Candidate: X T Liu
GTID: 2348330521950783
Subject: Computer Science and Technology
Abstract/Summary:
As e-commerce grows and online shopping becomes popular, customer reviews, the most important form of customer feedback, are growing explosively in volume. For fairness and interaction, e-commerce platforms usually make reviews public, so that besides helping manufacturers improve their products and services, the reviews serve as a reference for potential buyers. Well-rated products attract more buyers, while poorly rated products sell worse. Because of this, some unscrupulous merchants post deceptive positive comments to raise their own reputation, or deceptive negative comments to discredit their competitors.

This thesis focuses on the differences between spam reviews and truthful reviews. Inspired by the Web Quality Model (WebQM), we analyze features along multiple dimensions and extract features in three dimensions: review source, review content, and review expression. Based on these highly discriminative features, we present two different algorithms for review spam detection.

Two real-world datasets are used. For the gold-standard dataset, we focus on the differences between truthful reviews and spam reviews and extract features from the review content and review expression. We propose a modified PU-learning method and apply it to review spam detection. The results show that the proposed PU-learning method outperforms the original machine learning approaches and achieves an F1 score of 86%.

For the Amazon dataset, we labeled the data using Simhash and constructed an experimental dataset of 3 thousand reviews. Based on the special properties of the Amazon data, we extract review source features and expand the review content and review expression features. We then apply the gradient boosting decision tree (GBDT) algorithm to Amazon review spam detection, verify its feasibility, and achieve an F1 score of 88%.
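The abstract says the Amazon data was labeled using Simhash but does not give the configuration. As a rough illustration of how Simhash can flag near-duplicate reviews, here is a minimal sketch. The word-level tokenization, the MD5 token hash, the 64-bit fingerprint, and the Hamming-distance threshold of 3 are all assumptions made for this example, not details taken from the thesis.

```python
import hashlib
import re


def simhash(text, bits=64):
    """Compute a Simhash fingerprint over lowercase word tokens.

    Assumption: unweighted word tokens hashed with MD5 (first 8 bytes),
    so `bits` should not exceed 64 in this sketch.
    """
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(review_a, review_b, threshold=3):
    """Hypothetical labeling rule: fingerprints within `threshold` bits."""
    return hamming_distance(simhash(review_a), simhash(review_b)) <= threshold
```

Under a rule like this, reviews whose fingerprints differ in only a few bits would be treated as copies of each other and could be marked as spam candidates; the threshold would need tuning on real data.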
Keywords/Search Tags:review spam detection, multi-dimension features, PU-learning, GBDT