Research On The Identification Method Of Books False Comments

Posted on: 2020-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: J M Yin
GTID: 2428330575967956
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, online shopping has become a main way for people to consume. In the absence of physical objects, comments on items have become the main basis for people's purchasing decisions. In the field of book reading, the authenticity of book reviews is therefore of great significance both to readers' choices and to the adjustment of merchants' marketing strategies. With the development of text analysis technology, identifying false comments among a large number of comments has become a hot issue in current research. This paper collected more than 40,000 books and more than 7 million real reviews from Douban as a data set, and then studied false comment recognition with two types of algorithms.

The first type uses hybrid feature-based supervised algorithms to identify false reviews, divided into a fully supervised learning algorithm and a semi-supervised learning algorithm. The Douban book review data set is first constructed. The naive Bayesian classification algorithm is then used for fully supervised learning to identify false comments on Douban books. Comparative analysis of the experimental results shows that the best recognition results are obtained with unigram lexical features combined with deep syntactic features. However, this algorithm does not use unlabeled book comments, so unlabeled comment text is then used in addition to the labeled data set, and reviewer features are introduced to identify false comments with the semi-supervised, dual-view Co-training algorithm. The first view identifies whether a comment is false from features of the comment text, and uses the best-performing feature configuration from the fully supervised experiments for feature modeling. The other view uses features of the reviewer who wrote the text, on the premise that if the author of a review is a false-review writer, the comment is more likely to be a false comment. Comparative analysis of the experimental results shows that the semi-supervised Co-training algorithm obtains better recognition results with the mixed lexical and syntactic features. However, manually annotating data is time-consuming and labor-intensive, and labeled data sets generally suffer from subjectivity. Analysis of the above algorithms and data sets shows that both recognition algorithms rely on a large amount of manually labeled data, that the information collected in the data set is insufficient, and that the recognition results fall short of expectations.

To solve these problems, the second type of algorithm studied in this paper combines Douban book features with a custom false comment dictionary to identify false book comments. The algorithm makes full use of the review text information, the book information features, and a weight ratio filtering model to detect false comments on Douban books. Comparing and analyzing the three sets of experimental results shows that the proposed hybrid algorithm of Douban book features and a custom false comment dictionary uses the unlabeled data set more effectively, identifies false book reviews better, and is more extensible.
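
As an illustration only, the dual-view Co-training idea described in the abstract can be sketched in a few lines of Python. This is not the thesis's implementation: the class `NaiveBayes`, the function `co_train`, the parameters `rounds` and `per_round`, the feature tokens, and the toy reviews are all hypothetical, and the selection rule (each view labels its single most confident unlabeled review per round) is a simplified variant of Co-training. One view sees unigram tokens of the review text; the other sees discretized reviewer features.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.token_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for c in self.classes for t in self.token_counts[c]}
        return self

    def predict_proba(self, tokens):
        total = sum(self.class_counts.values())
        log_scores = {}
        for c in self.classes:
            denom = sum(self.token_counts[c].values()) + len(self.vocab)
            score = math.log(self.class_counts[c] / total)
            for t in tokens:
                if t in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.token_counts[c][t] + 1) / denom)
            log_scores[c] = score
        # Turn log scores into normalized probabilities in a stable way.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """Simplified two-view co-training (hypothetical helper).

    labeled:   list of (text_tokens, reviewer_tokens, label)
    unlabeled: list of (text_tokens, reviewer_tokens)
    Returns (text_clf, reviewer_clf, grown_labeled_set).
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        # One classifier per view, both trained on the current labeled set.
        views = [
            NaiveBayes().fit([ex[v] for ex in labeled], [ex[2] for ex in labeled])
            for v in (0, 1)
        ]
        for view, clf in enumerate(views):
            if not pool:
                break
            scored = []
            for i, ex in enumerate(pool):
                proba = clf.predict_proba(ex[view])
                label, conf = max(proba.items(), key=lambda kv: kv[1])
                scored.append((conf, i, label))
            scored.sort(reverse=True)
            chosen = scored[:per_round]
            # Move the most confident examples into the labeled set.
            for _, i, label in chosen:
                labeled.append((pool[i][0], pool[i][1], label))
            taken = {i for _, i, _ in chosen}
            pool = [ex for i, ex in enumerate(pool) if i not in taken]
    text_clf = NaiveBayes().fit([ex[0] for ex in labeled], [ex[2] for ex in labeled])
    reviewer_clf = NaiveBayes().fit([ex[1] for ex in labeled], [ex[2] for ex in labeled])
    return text_clf, reviewer_clf, labeled


# Toy usage with invented data (not from the Douban data set).
seed = [
    (["great", "must", "buy", "now"], ["new_account", "many_reviews"], "fake"),
    (["plot", "slow", "but", "characters", "deep"], ["old_account", "few_reviews"], "real"),
]
pool = [
    (["must", "buy", "great"], ["new_account", "many_reviews"]),
    (["characters", "deep", "plot"], ["old_account", "few_reviews"]),
]
text_clf, reviewer_clf, grown = co_train(seed, pool)
print(text_clf.predict_proba(["great", "buy"]))
```

In a realistic setting each view would pick several positive and negative examples per round and stop on a validation criterion; the sketch keeps one example per view per round so that the control flow stays easy to follow.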
Keywords: false comment detection, feature recognition, supervised learning, custom dictionary, weight scale model