Research On Detection Methods Of Fake Reviews Based On Semi-supervised Learning

Posted on: 2021-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Zhu
Full Text: PDF
GTID: 2428330647952838
Subject: Software engineering
Abstract/Summary:
As information technology and network platforms develop rapidly, many users shop online through e-commerce platforms. Because online shopping prevents users from inspecting the physical goods, and the product information provided by merchants is not fully trustworthy, users' product reviews have great reference value for other users and for businesses. This has given rise to fake reviews, which deliberately praise or maliciously defame a product and do not reflect what a user would write after a real experience with it. Fake reviews are highly harmful and seriously damage the interests of users and honest businesses, so identifying them effectively is an urgent task. Existing fake review detection techniques usually apply supervised learning to large amounts of labeled data. However, labeling data by hand is time-consuming and labor-intensive, and labeling reviews at scale is impractical in real-world settings. It is therefore necessary to study how to detect fake reviews with less labeled data. This thesis studies fake review detection from two angles, feature extraction and model training algorithms. The work is as follows:

(1) For feature extraction, this thesis proposes product browsing relevance (PBR), a user behavior feature derived from the user's browsing records and modeled on how a user actually goes about purchasing a product. The user's browsing records for the reviewed product and for similar products are used to characterize the authenticity of the purchase. Experiments on an Amazon data set show that combining user behavior features (including browsing relevance) with text features detects fake reviews more effectively.

(2) For model training, this thesis proposes an improved vertical ensemble Tri-training algorithm (VETT). The algorithm saves the classifiers produced in each iteration and exploits the diversity among the classification models from the previous several iterations to train the initial classifiers of the current iteration. This reuses classifiers from the iteration process without adding much time or space overhead. Experimental results show that VETT detects fake reviews more effectively.

(3) Because the improved algorithm still suffers from weak initial classifiers and limited diversity among classifiers across iterations, this thesis applies committee-based active learning to address the problem. Within each Tri-training iteration, active learning selects the samples with the greatest uncertainty and disagreement for labeling, in order to improve the performance of the detection model. Experiments on the Amazon data set and the gold data set show that active learning improves the performance of the semi-supervised learning algorithm to a certain extent.
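The interplay between Tri-training and committee-based selection can be illustrated with a minimal Python sketch. It is not the thesis's VETT implementation: it uses the standard Tri-training agreement rule (a sample is pseudo-labeled for one classifier when the other two agree) and vote entropy as an assumed disagreement measure, since the abstract does not state the exact selection criterion. The function names and the label convention (1 = fake, 0 = genuine) are hypothetical.

```python
import math
from collections import Counter


def pseudo_labels_for(target, predictions):
    """Standard Tri-training rule (not VETT-specific): an unlabeled sample is
    pseudo-labeled for classifier `target` when the other two classifiers
    agree on its label. `predictions` holds one label list per classifier."""
    first, second = [p for k, p in enumerate(predictions) if k != target]
    return {i: a for i, (a, b) in enumerate(zip(first, second)) if a == b}


def vote_entropy(votes):
    """Committee disagreement on one sample; 0.0 means a unanimous vote.
    Vote entropy is an assumed measure, chosen for illustration."""
    total = len(votes)
    return sum(-(c / total) * math.log(c / total) for c in Counter(votes).values())


def select_queries(predictions, budget):
    """Indices of the `budget` samples the committee disagrees on most,
    i.e. the candidates to send to a human annotator."""
    per_sample = list(zip(*predictions))
    ranked = sorted(range(len(per_sample)),
                    key=lambda i: vote_entropy(per_sample[i]),
                    reverse=True)
    return ranked[:budget]


# Hypothetical predictions of three classifiers on four unlabeled reviews.
# Assumed label convention: 1 = fake, 0 = genuine.
predictions = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]
print(pseudo_labels_for(0, predictions))  # samples where classifiers 1 and 2 agree
print(select_queries(predictions, 2))     # samples with a split vote
```

In this sketch, samples with a unanimous or pairwise-agreed vote feed the semi-supervised step, while samples with a split vote are the ones worth the cost of a manual label.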
Keywords/Search Tags: Fake review, Product browsing relevance, Tri-training, Vertical ensemble, Active learning