
Research On Deceptive Review Detection Based On Multi-view Learning

Posted on: 2019-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: Q Ye
GTID: 2428330590465914
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of network communication technologies, the Internet is increasingly influencing and changing people's lives. Consumption is gradually shifting from offline to online, and because it is convenient and fast, online shopping has become the first choice of many consumers. Before making a purchase decision, users often refer to reviews that others have written about the target goods or services. Online reviews contain rich opinions that are valuable to their readers: authentic reviews help consumers make the right purchase decision and give businesses an important way to learn users' real needs and feedback. It is therefore of great significance to identify and filter deceptive online reviews by measuring their credibility. This paper analyzes and summarizes the recognition frameworks and techniques used in deceptive review detection. To address the main problems in current research, namely feature extraction and fusion and the lack of annotated data sets, it proposes multi-view learning methods. The main contents of this paper are as follows:

1. To address the shortage of labelled samples, a semi-supervised co-training algorithm is used to classify reviews, reducing the workload of manual annotation. A feature set is built on two views, review text and behavior, with features extracted by PCA. Base classifiers are chosen according to the characteristics of each view, and the features that most influence the result are used to train them. Experimental results show that the co-training algorithm can make full use of unlabelled samples to help model training and partly compensates for the limitations caused by the lack of annotated samples.

2. Building on the feature indicators commonly used in previous studies, a more complete index system for evaluating review credibility is constructed by analyzing the differences between deceptive and genuine reviews and refining the indicators from the text and behavior views. Directly concatenating features from different views into a single vector causes problems such as feature redundancy and high dimensionality. To avoid them, canonical correlation analysis (CCA) is first used to map the features of each view into a shared low-dimensional subspace, and the views are then combined using two feature fusion strategies. Comparative experiments demonstrate the effectiveness of the selected features and the proposed methods.

3. Because the base classifiers perform poorly in the initial stage of co-training, mislabelled samples can be added to the training set. As iterative training continues, these errors gradually accumulate and degrade the final performance of the model. To solve this problem, a sample label similarity strategy is used to further evaluate confidence, reducing the introduction of noisy samples. Experiments show that co-training combined with the sample similarity strategy outperforms the classical co-training algorithm in overall classification accuracy and F1.
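The co-training loop described in the first contribution can be illustrated with a minimal sketch. This is not the thesis's implementation. The nearest-centroid base classifier, the softmax-style confidence score, the function names, and the parameters `rounds`, `per_round` and `threshold` are all assumptions chosen to keep the example self-contained. The PCA step and the label similarity check from the third contribution are not reproduced.

```python
import math


def fit_centroids(rows, labels):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in sums[c]] for c in sums}


def predict_with_confidence(centroids, row):
    """Return (label, confidence) using a softmax over negative distances."""
    weights = {c: math.exp(-math.dist(row, mu)) for c, mu in centroids.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def co_train(view_a, view_b, labels, rounds=5, per_round=2, threshold=0.6):
    """Co-training over two views; unlabelled samples are marked with None.

    In each round, one classifier per view is trained on the labelled pool,
    and each classifier adds its most confident predictions on unlabelled
    samples to the shared pool as pseudo-labels.
    """
    labels = list(labels)  # work on a copy so the caller's list is untouched
    for _ in range(rounds):
        labelled = [i for i, y in enumerate(labels) if y is not None]
        unlabelled = [i for i, y in enumerate(labels) if y is None]
        if not unlabelled:
            break
        pseudo = {}
        for view in (view_a, view_b):
            model = fit_centroids([view[i] for i in labelled],
                                  [labels[i] for i in labelled])
            scored = [(predict_with_confidence(model, view[i]), i)
                      for i in unlabelled]
            scored.sort(key=lambda item: item[0][1], reverse=True)
            for (label, confidence), i in scored[:per_round]:
                if confidence >= threshold:
                    # If both views pick the same sample, keep the first label.
                    pseudo.setdefault(i, label)
        if not pseudo:
            break  # nothing confident enough to add
        for i, label in pseudo.items():
            labels[i] = label
    return labels
```

Here `view_a` and `view_b` would hold the text and behavior feature vectors for the same reviews, in the same order. A stricter `threshold` adds fewer pseudo-labels per round, which limits the error accumulation the abstract describes at the cost of using less unlabelled data.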
Keywords/Search Tags: deceptive review, multi-view learning, co-training, canonical correlation analysis