
Research On User Malicious Comments Detection Based On Machine Learning

Posted on: 2020-07-01
Degree: Master
Type: Thesis
Country: China
Candidate: C Y He
Full Text: PDF
GTID: 2428330572473619
Subject: Cryptography
Abstract/Summary:
With the popularity of the mobile Internet, people can express their opinions online anytime and anywhere. On the one hand, many media companies need users to comment actively; on the other hand, many of those comments are malicious. Such comments not only hurt others mentally but also make the whole Internet environment chaotic. More importantly, users who are attacked will gradually switch to other products, which harms the development of the company. Managers need to filter out malicious comments, but small companies cannot afford the cost of manual review. It is therefore necessary to design an automatic review scheme for malicious comments. To solve these problems, this thesis proposes a malicious comment review scheme based on machine learning. The results are as follows. Firstly, based on research in Chinese linguistics on "swearwords-curse[li]", this thesis selects 40 seed words and then expands them into a malicious dictionary using an extension algorithm. Compared with manual dictionary selection, this method saves considerable labor cost. In addition, the dictionary can serve as a custom dictionary for Chinese word segmentation to improve segmentation accuracy. Secondly, the thesis analyzes the news topics of each user's historical comments. An LDA model extracts the topic of the news content, and "user id", "user comment" and "news content of comment" are used as the input of an RNN model. Experiments show that this improved scheme improves the detection of malicious comments. Finally, the results of the previous two parts are combined with the features used in traditional malicious comment detection systems. The thesis extracts 13 types of features from the dataset, calculates Pearson correlation coefficients and analyzes the features. These features are then used as the input of a decision tree algorithm and an SVM algorithm respectively, and an online malicious comment detection model is designed. Experimental results show that the decision tree algorithm performs best, with the F1 value of the classification result reaching 92.87%. The scheme designed in this thesis can therefore help users carry out comment detection.
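
The abstract mentions two standard quantities: the Pearson correlation coefficient, used to analyze the features, and the F1 value, used to report classification quality. The following is a minimal sketch of how each is computed from its textbook definition. It is not the thesis's implementation: the function names, the toy labels, the `dict_hits` feature (a hypothetical count of dictionary words per comment) and the predictions are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # The coefficient is undefined for a constant sequence;
        # returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return cov / sqrt(var_x * var_y)


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = malicious comment, 0 = normal comment.
labels = [1, 1, 0, 0, 1, 0]
dict_hits = [3, 2, 0, 1, 4, 0]    # assumed feature: dictionary words per comment
predictions = [1, 1, 0, 1, 1, 0]  # assumed classifier output

feature_label_corr = pearson(dict_hits, labels)
toy_f1 = f1_score(labels, predictions)
```

In this toy data the classifier finds all three malicious comments and raises one false alarm, so precision is 3/4, recall is 1, and the F1 value is 6/7. A feature whose correlation with the label is close to zero would be a candidate for removal before training a decision tree or SVM.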
Keywords/Search Tags: Chinese linguistics, Malicious dictionary, LDA model, Machine learning