| With the development of e-commerce, more and more consumers choose to shop online. Before making a purchase decision, consumers usually consult the evaluations left by previous buyers. This reliance on product reviews has led to the emergence of review spam: driven by profit, some businesses hire people to praise their own products or maliciously slander their competitors. Fake reviews not only degrade consumers' online shopping experience but also damage the reputation of other businesses. As a preprocessing step for opinion mining, sentiment analysis, and recommender systems, review spam detection has become a research hotspot in recent years, and it is significant for both academic research and practical application. Most existing research in this field uses machine learning to cast review spam detection as a classification problem. These studies focus mainly on the characteristics of the review text, the behavior of reviewers, and the characteristics of products. Such methods rely on heavy feature engineering, and their performance on real-world datasets is limited. In recent years, review spam detection based on time series has begun to emerge and has achieved good performance. In addition, most existing studies are oriented to a single website and monolingual corpora. This paper therefore investigates a time-series-based method for cross-website, cross-corpus review spam detection. The main contents are as follows. First, we systematically summarize the current state of review spam detection. Organized by the objects of detection, we analyze the features and algorithms used in the field and summarize the advantages and disadvantages of these methods. We then summarize the datasets commonly used in this field, laying a foundation for the selection of experimental datasets in subsequent research. Drawing on previous 
research and its blind spots, this paper proposes a new cross-website, cross-corpus approach to address the camouflaged fraud that review spam groups carry out on a single site. Next, after analyzing the related techniques of time series, cross-website detection, and review graph models, we propose a cross-website review spam detection model based on time series. For the same product, time series of product reviews are constructed separately on Chinese and English websites and then preprocessed. We then perform burst review detection on each single-site time series and on the cross-site time series, and express the characteristics of the time series as suspicious time intervals. We then combine the features of the review text with the features of the time series as the overall input features of the model. Experimental results show that the effectiveness of the proposed model is 14% higher than that of the traditional algorithm based only on the internal features of the review text. Finally, because integrating the external features of reviews critically affects the performance of review spam detection, and inspired by the review graph model, this paper abstracts the interrelationships among the credibility of reviews, reviewers, and stores, and constructs a scoring model for computing the credibility of reviews, reviewers, and stores. Experimental results show that the proposed model outperforms the traditional review graph model by 1.1% in review spam detection. |
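
The abstract does not say how bursts are detected, so the following is only a minimal sketch of the general idea of burst detection on single-site and cross-site review time series. It assumes daily review counts per site and a simple mean-plus-k-standard-deviations threshold. The function names (`burst_days`, `to_intervals`, `overlapping`), the threshold rule, and the sample counts are all hypothetical illustrations, not the paper's method or data.

```python
from statistics import mean, pstdev


def burst_days(counts, k=1.5):
    """Return indices of days whose review count exceeds mean + k * std.

    The thresholding rule is an assumption made for illustration only.
    """
    threshold = mean(counts) + k * pstdev(counts)
    return [i for i, c in enumerate(counts) if c > threshold]


def to_intervals(days):
    """Merge sorted, consecutive day indices into inclusive (start, end) intervals."""
    intervals = []
    for d in days:
        if intervals and d == intervals[-1][1] + 1:
            # Extend the current interval by one day.
            intervals[-1] = (intervals[-1][0], d)
        else:
            # Start a new interval.
            intervals.append((d, d))
    return intervals


def overlapping(a, b):
    """Return the overlaps between two lists of inclusive intervals."""
    overlaps = []
    for s1, e1 in a:
        for s2, e2 in b:
            start, end = max(s1, s2), min(e1, e2)
            if start <= end:
                overlaps.append((start, end))
    return overlaps


# Made-up daily review counts for one product on two sites.
english_site = [2, 3, 2, 2, 20, 22, 3, 2, 2, 3]
chinese_site = [1, 1, 2, 1, 1, 15, 18, 1, 2, 1]

en_suspicious = to_intervals(burst_days(english_site))
zh_suspicious = to_intervals(burst_days(chinese_site))
cross_site_suspicious = overlapping(en_suspicious, zh_suspicious)
```

In this sketch, the per-site intervals play the role of single-site suspicious time intervals, and their overlap stands in for a cross-site signal. Such intervals could then be encoded as features and combined with review text features, as the abstract describes.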