Research And Implementation Of Spam Filtering Method Based On Deep Learning

Posted on: 2022-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Fu
Full Text: PDF
GTID: 2518306602466854
Subject: Master of Engineering
Abstract/Summary:
With the rapid development of the Internet, a large amount of text data is generated and disseminated online. Some malicious users exploit the openness of social networking platforms to spread sensitive, pornographic, fraudulent, vulgar, and other spam content, which seriously harms the network environment. Although many spam filtering methods exist, Internet language is informal and loosely structured, and spammers replace keywords with variant words to evade censorship, so existing filters have limited effect. In addition, spam advertisements that divert users to marketing channels are flooding online social platforms and urgently need to be identified accurately. Spam filtering methods therefore need to be improved to further clean up the network environment.

This thesis briefly reviews the development of spam text filtering technology, studies and summarizes the variant-word characteristics of spam text on social platforms and the textual characteristics of marketing diversion advertisements, and analyzes why current spam text can evade filtering. Based on these characteristics, an improved text filtering method is proposed. To address the word segmentation errors caused by variant words, which make such text difficult to represent accurately, a convolutional neural network operating at character granularity is built so that segmentation errors no longer affect the input. Experiments verify that character-granularity vector input outperforms word vectors on the spam text filtering task. Character-granularity vectors make the text representation longer, and simply enlarging the convolution kernel leads to an exponential increase in model parameters, so this thesis uses dilated convolution to enlarge the receptive field of the convolution kernel while reducing model parameters and model complexity. An attention mechanism is added to give more weight to the key features of the text and to reduce the influence of normal text mixed into spam text on the classification result.

On this basis, knowledge distillation is used to transfer the knowledge of the BERT model to the CNN-DA model. BERT itself is unsuitable for the low-latency scenarios of social platforms because it has too many parameters and processes text too slowly. Experiments show that distillation increases the accuracy of the CNN-DA model by 2.4%, which is essentially on par with BERT, while processing is much faster than BERT. The filtering effect is thus greatly improved without reducing filtering speed.

Crawlers are used to collect text from social platforms, and a spam text dataset is built through regular-expression filtering and manual filtering. Using this dataset and the proposed filtering method, hyperparameters are tuned during training to find the best configuration, and comparative experiments are designed to verify the feasibility of the model. The results show that, when filtering accuracy and filtering speed are considered together, the constructed model performs best on the spam filtering task and has a clear advantage for spam filtering on social platforms with low-latency requirements. Finally, a spam filtering system is designed and implemented based on the proposed method, exposing an interface that provides text filtering services to platform users.
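
The dilated convolution idea described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the thesis's CNN-DA implementation: the function names, the kernel size of 3, and the dilation rates 1, 2, 4 are assumptions chosen for illustration, since the abstract does not state the actual configuration. The sketch applies a one-dimensional dilated convolution and computes the receptive field of a stack of such layers.

```python
def dilated_conv1d(x, weights, dilation=1):
    """Valid (no padding) 1D convolution with gaps of `dilation` between taps."""
    # Number of input positions spanned by a single output value.
    span = (len(weights) - 1) * dilation + 1
    return [
        sum(w * x[i + j * dilation] for j, w in enumerate(weights))
        for i in range(len(x) - span + 1)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 conv layers, one per dilation rate."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical settings: three layers, each with kernel size 3.
plain = receptive_field(3, [1, 1, 1])    # ordinary convolutions
dilated = receptive_field(3, [1, 2, 4])  # dilation doubles at each layer
```

Both stacks use the same number of weights per layer (three per input/output channel pair). Only the spacing between taps differs, which is why the dilated stack covers a longer stretch of characters without extra parameters.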
Keywords/Search Tags: Spam Filtering, Deep Learning, Variant Words, Dilated Convolution, Knowledge Distillation
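
Knowledge distillation, which the abstract uses to transfer BERT's knowledge to the CNN-DA model, is commonly implemented as a weighted sum of a soft-target term (student versus teacher distributions at a raised temperature) and a hard-label term. The sketch below follows that common formulation. The abstract does not specify the loss actually used, so the function names, the temperature, the weight `alpha`, and the example logits are all assumptions.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    # Soft-target term: KL divergence between the softened distributions.
    soft = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    # Hard-label term: ordinary cross-entropy at temperature 1.
    hard = -log_softmax(student_logits)[label]
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard


# Hypothetical logits for the two classes (spam, normal).
teacher = [3.0, -1.0]
student = [1.0, 0.5]
loss = distillation_loss(student, teacher, label=0)
```

The soft-target term is zero when the student reproduces the teacher's logits exactly, so minimizing it pulls the small model toward the large model's output distribution.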