
Recognition Method Of Microblog Spam Comment Based On CNN

Posted on: 2021-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Wang
Full Text: PDF
GTID: 2428330602465435
Subject: Software engineering
Abstract/Summary:
Amid the rapid growth of new media, microblog has become one of the largest social media platforms in China. People routinely use it to get information, share feelings, and express opinions, which has made it a comprehensive platform combining instant message publishing, social interaction, news reporting, and public opinion guidance. Precisely because microblog is convenient, authentic, far-reaching, and immediate, some users post large numbers of spam comments under popular microblogs for various purposes. These spam comments obstruct communication among Internet users, deceive some netizens, and hinder the work of data miners who rely on comments, so identifying spam comments is of great significance. This thesis studies the key techniques of microblog spam comment classification, based on a convolutional neural network (CNN) model and a text representation method that incorporates word vectors. The main results are as follows:

1. At present, Weibo data is obtained mainly through web crawlers and the microblog open platform API. This thesis proposes a method based on cookies and regular expressions for obtaining microblog data. Experiments show that the method is simple to operate and acquires data quickly.

2. The current word vector model Word2vec considers the correspondence and similarity between words but ignores local word order within the context, which in some cases causes the semantics of the text to be lost or distorted. A word vector model for CNN-based text classification is therefore proposed: the Word2vec model is combined with N-Gram features, and the extracted word vectors (Word2vec-NG vectors) are used as the input to the CNN model. Several sets of comparative experiments show that the proposed method effectively improves text classification as measured by three evaluation metrics: precision, recall, and F1 value.

3. Considering the respective strengths and weaknesses of the support vector machine (SVM) and the CNN, this thesis proposes using the CNN for feature extraction and the SVM for classification, combining the two to improve the classification results. In experiments on a Weibo comment data set, the proposed method is compared with several other typical methods. The CNN-SVM model outperforms the other algorithms: it runs faster and achieves higher recognition accuracy.
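The abstract does not say how the N-Gram features are combined with the Word2vec vectors, so the following is only a minimal sketch of one possible reading, not the thesis's actual Word2vec-NG construction. It assumes that each token position is represented by concatenating the vectors of the n consecutive tokens starting there, padded with zeros at the end of the comment. The function name `ngram_vectors` and the toy two-dimensional embeddings are hypothetical; in practice the embeddings would come from a trained Word2vec model.

```python
from typing import Dict, List

Vector = List[float]


def ngram_vectors(tokens: List[str],
                  embeddings: Dict[str, Vector],
                  n: int = 2) -> List[Vector]:
    """Build one order-preserving n-gram vector per token position.

    Assumption (not from the thesis): the n-gram vector at position i is
    the concatenation of the word vectors of tokens i .. i+n-1, with zero
    vectors used for unknown words and for positions past the end.
    `embeddings` must be non-empty so the vector dimension can be read.
    """
    dim = len(next(iter(embeddings.values())))
    zero = [0.0] * dim
    word_vecs = [embeddings.get(tok, zero) for tok in tokens]

    rows: List[Vector] = []
    for i in range(len(word_vecs)):
        row: Vector = []
        for j in range(n):
            # Append the j-th word of the window, or padding past the end.
            row.extend(word_vecs[i + j] if i + j < len(word_vecs) else zero)
        rows.append(row)
    return rows


# Toy stand-in for trained Word2vec vectors (hypothetical values).
toy_embeddings = {
    "free": [1.0, 0.0],
    "gift": [0.0, 1.0],
}

forward = ngram_vectors(["free", "gift"], toy_embeddings)
backward = ngram_vectors(["gift", "free"], toy_embeddings)
print(forward)   # matrix that a CNN could take as input, one row per token
print(backward)
```

The two comments contain the same words, so an order-insensitive representation such as an average of their word vectors would be identical, yet the matrices above differ. That is the kind of local word-order information the abstract says plain Word2vec vectors miss.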
Keywords/Search Tags: spam comments on microblog, convolutional neural network, word vector, text classification