At present, with the continuous advance of science and technology, online shopping has become increasingly mature. Shopping without leaving home can satisfy consumers' varied needs and has become an indispensable part of modern life. Buying on the basis of product information and the many reviews in online malls brings people great convenience, but it has also encouraged a large number of undesirable practices. To make high profits, some merchants hire paid posters (so-called "brush hands" or the "water army") to write batches of positive reviews for their products and thereby mislead consumers. This not only seriously distorts consumers' judgment but also harms the healthy development of the consumer market. How to identify fake reviews effectively is therefore of great practical significance. In recent years, scholars have achieved good results on this problem, but model performance still needs further improvement. Traditional machine learning algorithms cannot fully capture the deep semantic information of text, and constructing features manually is laborious, so the classification ability of such algorithms is ultimately limited. On this basis, this paper focuses on using deep learning algorithms to identify fake reviews. The main research content can be summarized as follows.

Firstly, fake reviews are identified with traditional machine learning methods. This paper combines N-gram and TF-IDF strategies to extract information from the review text, and manually constructs behavioral features of the reviewers to supplement behavior information that the text does not reflect. The two kinds of features are combined and fed into traditional machine learning models. With the text features and the reviewers' behavioral features combined, the models perform better than single-classifier models; the best performer is logistic regression, whose accuracy reaches 78.33%.

Secondly, fake reviews are identified with deep learning methods. To build better text features, we turn to word vector models: Google's pre-trained Word2Vec vectors and GloVe vectors are used to obtain word vectors for the review text. After fusing these with the quantified reviewer behavioral features, we train CNN, BiLSTM, and BiGRU models, and also experiment with an attention mechanism, hybrid architectures, and other methods. The results show that the CNN+BiGRU+Attention hybrid model performs best: its accuracy is 5% to 8% higher than that of the single models, finally reaching 90.13%.

Finally, ERNIE pre-trained word vectors are introduced to build an optimized deep learning model. Building on the deep learning models above, this paper explores how a pre-trained model affects the downstream classification model: Baidu's open-source ERNIE pre-trained model is used to produce word vectors, and a BiGRU+Attention model performs the downstream classification task. This method effectively enhances semantic representation and further improves model performance, raising accuracy to 92.57%, the best result among all the models compared.
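The first step fuses N-gram TF-IDF text features with manually constructed reviewer behavior features. The following is a minimal, dependency-free sketch of that fusion, assuming whitespace tokenization, word n-grams up to bigrams, and an unsmoothed idf of log(N / df); the paper does not state which TF-IDF variant it uses. The reviews are made up, and the two behavior features are hypothetical placeholders for the paper's hand-crafted reviewer features.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max, joined by spaces."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def fit_idf(docs, n_max=2):
    """Build a sorted vocabulary and idf(t) = log(N / df(t)) from training docs."""
    df = Counter()
    for doc in docs:
        df.update(set(ngrams(doc.lower().split(), n_max)))
    vocab = sorted(df)
    idf = {t: math.log(len(docs) / df[t]) for t in vocab}
    return vocab, idf


def tfidf_vector(doc, vocab, idf, n_max=2):
    """TF-IDF vector of one document, with tf = count / total n-grams in the doc."""
    counts = Counter(ngrams(doc.lower().split(), n_max))
    total = sum(counts.values()) or 1
    return [counts[t] / total * idf[t] for t in vocab]


def fuse(text_vec, behavior):
    """Concatenate text features with (hypothetical) reviewer behavior features."""
    return text_vec + behavior


# Toy usage: made-up reviews plus hypothetical, pre-scaled behavior features,
# e.g. [share of 5-star ratings, reviews posted on the same day].
reviews = [
    "great product great price",
    "terrible quality do not buy",
    "great quality fast delivery",
]
behavior = [[0.9, 1.0], [0.1, 0.2], [0.3, 0.6]]
vocab, idf = fit_idf(reviews)
X = [fuse(tfidf_vector(r, vocab, idf), b) for r, b in zip(reviews, behavior)]
```

Each fused row is the TF-IDF block followed by the behavior block, so any downstream classifier sees both kinds of evidence in one vector.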
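Logistic regression was the strongest traditional classifier on the fused features. The sketch below trains a plain batch-gradient-descent logistic regression on fused feature vectors; it is an illustration, not the paper's training setup. The toy data, learning rate, and epoch count are assumptions, and the features are assumed to be standardized to zero mean (hence the negative values), which keeps TF-IDF weights and behavior counts on comparable scales.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def log_loss(w, b, X, y):
    """Mean binary cross-entropy over the dataset."""
    eps = 1e-12
    total = 0.0
    for x, t in zip(X, y):
        p = sigmoid(score(w, b, x))
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(X)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Full-batch gradient descent on the mean log loss, starting from zero weights."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            err = sigmoid(score(w, b, x)) - t  # d(loss)/d(score) for one sample
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (fake) if P(fake | x) >= threshold, else 0 (genuine)."""
    return int(sigmoid(score(w, b, x)) >= threshold)


# Toy standardized fused features; label 1 = fake, 0 = genuine.
X = [[0.4, 0.3], [0.3, 0.4], [-0.4, -0.3], [-0.3, -0.4]]
y = [1, 1, 0, 0]
w, b = train_logreg(X, y)
```

In practice a library implementation such as scikit-learn's `LogisticRegression` would normally replace this hand-written loop; the explicit version only makes the gradient update visible.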
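The CNN part of a hybrid model typically slides filters over the sequence of word vectors and keeps the strongest response of each filter (max-over-time pooling). The following dependency-free sketch shows that operation for a single review. The filter weights and embeddings are hand-picked toy values; in the paper's models the filters would be learned and the embeddings would come from Word2Vec or GloVe, and the paper does not state its filter sizes.

```python
def conv1d_max_pool(embeddings, filters, biases):
    """Text-CNN feature extraction for one sentence.

    embeddings: list of word vectors (seq_len x dim).
    filters:    list of filters; each is `width` rows of length dim.
    biases:     one bias per filter.
    Returns one feature per filter: max over positions of ReLU(conv).
    If the sentence is shorter than a filter, that feature is 0.0.
    """
    features = []
    for filt, bias in zip(filters, biases):
        width = len(filt)
        best = 0.0  # ReLU outputs are never negative, so 0.0 is a safe floor
        for start in range(len(embeddings) - width + 1):
            s = bias
            # Dot product of the filter with the window of `width` word vectors
            for k in range(width):
                s += sum(f * e for f, e in zip(filt[k], embeddings[start + k]))
            best = max(best, max(0.0, s))
        features.append(best)
    return features


# Toy 3-word sentence with 2-dimensional embeddings.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [
    [[1.0, 0.0], [0.0, 1.0]],  # width-2 filter (bigram detector)
    [[-1.0, -1.0]],            # width-1 filter that never fires on this input
]
features = conv1d_max_pool(sentence, filters, [0.0, 0.0])
```

Here the width-2 filter responds most strongly at the first window (1 + 1 = 2.0), while the second filter's outputs are all negative and are clipped to 0.0 by ReLU.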
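The attention layer placed on top of a BiGRU can be illustrated with additive attention pooling: each hidden state h_t receives a score v . tanh(W h_t + b), the scores are normalized with softmax, and the weighted sum of hidden states becomes the sentence representation. This is one common attention formulation, assumed here because the paper does not specify which variant it uses. W, b, and v are hypothetical fixed values standing in for learned parameters, and the hidden states are toy numbers rather than real BiGRU outputs.

```python
import math


def softmax(xs):
    """Softmax with max-subtraction for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(hidden_states, W, b, v):
    """Additive attention pooling over a sequence of hidden states.

    hidden_states: T x d list, e.g. concatenated forward/backward BiGRU outputs.
    W: k x d matrix, b: length-k bias, v: length-k scoring vector.
    Returns (alpha, context): attention weights over time steps and
    the weighted sum of hidden states.
    """
    scores = []
    for h in hidden_states:
        u = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
             for row, b_i in zip(W, b)]
        scores.append(sum(v_i * u_i for v_i, u_i in zip(v, u)))
    alpha = softmax(scores)
    d = len(hidden_states[0])
    context = [sum(a * h[j] for a, h in zip(alpha, hidden_states)) for j in range(d)]
    return alpha, context


# Toy hidden states for a 3-step sequence (d = 2) and identity projection.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alpha, context = additive_attention(H, W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0], v=[1.0, 1.0])
```

The third time step, whose hidden state activates both projected dimensions, receives the largest weight; in a full classifier the `context` vector would then feed a dense output layer.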