| In recent years, with the rapid development of new retail and mobile payment, consumers' demand for diverse consumption channels has grown steadily, and online shopping has become an important way to shop. At the same time, online shopping generates a large volume of online reviews, and these reviews have a strong influence on merchants' sales. Because the Internet is highly open, many merchants pay close attention to the reviews that consumers post. Driven by profit, some manipulate reviews behind the scenes, and fake reviews have begun to appear online, seriously harming consumers' rights. Identifying fake reviews is therefore an urgent task for individual consumers, merchants, and platforms alike. However, the daily volume of new reviews is huge, and checking them manually would require substantial manpower and material resources. An effective identification method is needed to screen online reviews automatically and filter out fake ones. The purpose of this thesis is to provide a set of methods for identifying fake reviews on e-commerce platforms quickly, accurately, and effectively, and to summarize the behavior patterns of fake reviews. The identification is based mainly on text mining. The main work is as follows. First, review data were collected from an e-commerce platform and the text was cleaned. The word-segmented text was then vectorized, and the data were labeled according to review posting time, duplicate reviews, and the sentiment tendency of the reviewers. Second, traditional machine learning and deep learning models were trained on the vectorized text, and the resulting base classifiers were combined with an ensemble learning method to identify fake reviews. Finally, the trained model was used to predict labels for a large number of unlabeled reviews, and the behavioral attributes and patterns of fake reviews were mined through language modeling of fake reviews, topic difference analysis, part-of-speech analysis, sentiment analysis, and related work. The experimental results show that the method performs well in identifying fake reviews, offers consumers and regulators a new approach, and has practical application value. |