
Research And Implementation Of Reviews Visualization System Based On Fake Reviews Detection

Posted on: 2019-09-13
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhong
GTID: 2428330566976621
Subject: Engineering
Abstract/Summary:
Consumers rely extensively on user-generated online reviews as an important reference when making purchasing decisions. However, the growing number of fake reviews not only undermines the accuracy of review information but also affects the fairness of commercial transactions. Detecting fake reviews manually in large-scale review data is inefficient, and the accuracy of manual detection is typically near chance. Solving the problem by technological means is therefore becoming a research hotspot. This thesis gives an overview of fake review detection, analyses the advantages and limitations of different detection algorithms, and summarizes the existing research difficulties. It focuses on two topics, fake review detection and review visualization, and the research is conducted from the following three aspects:

(1) A dataset of user-generated reviews that contains fake reviews is class-imbalanced. If an imbalanced dataset is used in supervised learning algorithms, the model may emphasize the majority class, so the detection results for fake reviews, the minority class, may be unsatisfactory. To overcome the class imbalance problem, the Adaptive Synthetic Sampling Approach (ADASYN) is used to preprocess the training set. Text features represented by paragraph vectors, together with rating deviation, reviewer activeness and other features, are fed into an SVM model to detect fake reviews. Experiments show that samples processed by ADASYN yield better detection performance than samples processed by random oversampling or random undersampling, and than the original imbalanced samples.

(2) Anomalies in the time series of review sentiment are proposed as a new feature for detecting fake reviews. Lexicon-based sentiment analysis is used to calculate the sentiment score of each review. To improve the accuracy of the sentiment score, a restaurant-domain sentiment lexicon is constructed from the similarity between word vectors and the WordSentiNet lexicon. A time series anomaly detection method based on residual statistics is used to find the time points at which the review sentiment time series fluctuates abnormally. Finally, the result is fed into the model as an additional feature. The experiment shows that this feature ...

(3) A review visualization system is designed and implemented. The system extracts reviews and metadata of popular restaurants in NYC from Yelp, then filters out fake reviews using the method proposed in this thesis. Through visual mapping, data zoom, word clouds and other techniques, the genuine reviews and metadata are visualized interactively along five dimensions: sentiment trend, popularity, high-frequency adjectives, high-frequency nouns and ratings. The visualization transforms disorganized review information into concise visual output, giving users an intuitive and reliable reference for making dining decisions quickly.
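The residual-based anomaly detection described in aspect (2) can be illustrated with a small sketch. The abstract does not specify the trend estimate or the residual statistic used in the thesis, so the choices here are assumptions made only for illustration: a centered moving median as the trend, a robust z-score based on the median absolute deviation (MAD) as the residual statistic, and a threshold of 3. The function names (`moving_median`, `residual_anomalies`, `anomaly_feature`) and the sample scores are hypothetical and do not come from the thesis.

```python
from statistics import median


def moving_median(series, window=5):
    """Centered moving median, used here as a simple trend estimate (assumption)."""
    half = window // 2
    trend = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        trend.append(median(series[lo:hi]))
    return trend


def residual_anomalies(series, window=5, threshold=3.0):
    """Return indices whose residual (value minus trend) is an outlier.

    A point is flagged when |residual - median| / (1.4826 * MAD) exceeds
    the threshold. This statistic is an illustrative stand-in, not
    necessarily the one used in the thesis.
    """
    trend = moving_median(series, window)
    residuals = [x - t for x, t in zip(series, trend)]
    center = median(residuals)
    mad = median([abs(r - center) for r in residuals])
    if mad == 0:
        return []  # residuals show no spread, so nothing can be called abnormal
    scale = 1.4826 * mad
    return [i for i, r in enumerate(residuals)
            if abs(r - center) / scale > threshold]


def anomaly_feature(review_days, anomalous_days):
    """Binary feature per review: 1 if it was posted on an anomalous day."""
    flagged = set(anomalous_days)
    return [1 if day in flagged else 0 for day in review_days]


# Hypothetical daily mean sentiment scores for one restaurant.
daily_sentiment = [0.20, 0.25, 0.22, 0.18, 0.24, 0.95, 0.21, 0.23, 0.19, 0.22]
anomalous = residual_anomalies(daily_sentiment)   # the 0.95 spike at index 5
features = anomaly_feature([0, 5, 9], anomalous)  # reviews posted on days 0, 5, 9
print(anomalous, features)
```

A median-based trend and MAD are used in this sketch because a burst of fake reviews would otherwise pull a mean-based trend and standard deviation toward the very spike being looked for. The resulting binary feature could then be appended to the other review features before classification.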
Keywords/Search Tags: Fake Reviews, Sentiment Analysis, Anomaly Detection in Time Series, Imbalanced Dataset, Visualization