Spam Text Classification Method Based On Deep Learning

Posted on: 2019-01-01, Degree: Master, Type: Thesis
Country: China, Candidate: Y T Li, Full Text: PDF
GTID: 2348330545491868, Subject: Engineering
Abstract/Summary:
With the rapid development of e-mail, it has become a carrier of commercial advertisements, malware and illegal files. On average, people receive far more spam than legitimate mail, which seriously affects daily life and network security. Accurately identifying spam has therefore become an urgent problem. Commonly used spam recognition methods currently fall into two types: source-based recognition and content-based recognition. Examples include whitelist and blacklist mechanisms, keyword matching, and the naive Bayes text classification model. As the volume and variety of e-mail grow, the keywords that characterize spam also change dramatically. All rule-based recognition methods require the feature library to be updated regularly, which costs a great deal of manual effort. Content-based recognition has been effective, but the traditional naive Bayes model lags behind deep learning models in text classification performance. This thesis applies deep learning to the text content of messages in order to identify spam. The specific research work and contributions are as follows. (1) The application of deep learning models to text classification is analyzed, and a deep learning model (the Conv-BiGRU model) based on a convolutional neural network and a recurrent neural network is proposed. The model extracts local features as well as features from the preceding and following words, combining the advantages of the two network types. Experiments show that the new model improves spam classification accuracy. (2) The Stacking model based on deep learning is improved. The probability output of the first-layer models is replaced by the output of the final fully connected layer of each deep learning model, which enriches the feature representation available to the second-layer model. A spam text classification system has also been completed. Compared with multiple other models, the Stacking model improves classification performance. (3) The Conv-BiGRU model and the deep-learning-based Stacking model are implemented. A number of comparison experiments, varying the model method, the embedding layer input, the model improvements and the deep learning model parameters, are carried out on a collected spam text data set of 670 thousand samples.
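The data flow described in contribution (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the weights are random and untrained, the layer sizes are tiny, and every name (`build_model`, `conv_bigru_features`, `predict_spam_probability`, and so on) is hypothetical. It only shows the assumed order of operations: a 1-D convolution extracts local features from word vectors, a forward and a backward GRU read the resulting sequence, and their final states are concatenated before a sigmoid output.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def conv1d_relu(seq, filters, width):
    """Slide each filter over `width` consecutive word vectors (local features)."""
    out = []
    for t in range(len(seq) - width + 1):
        window = [v for vec in seq[t:t + width] for v in vec]  # flatten the window
        out.append([max(0.0, z) for z in matvec(filters, window)])
    return out

def make_gru(in_dim, hidden, rng):
    """Weights in the order Wz, Uz, Wr, Ur, Wh, Uh (W: input, U: recurrent)."""
    return [rand_matrix(hidden, in_dim if i % 2 == 0 else hidden, rng)
            for i in range(6)]

def gru_step(params, x, h):
    """One GRU update (biases omitted; one common gate convention is assumed)."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]  # update gate
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, rh))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def run_gru(params, seq, hidden):
    """Run a GRU over a sequence and return its final hidden state."""
    h = [0.0] * hidden
    for x in seq:
        h = gru_step(params, x, h)
    return h

def build_model(embed_dim=4, width=3, n_filters=5, hidden=3, seed=0):
    """Hypothetical toy configuration; the sizes are illustrative only."""
    rng = random.Random(seed)
    return {
        "width": width,
        "hidden": hidden,
        "filters": rand_matrix(n_filters, width * embed_dim, rng),
        "fwd": make_gru(n_filters, hidden, rng),
        "bwd": make_gru(n_filters, hidden, rng),
        "out": [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden)],
    }

def conv_bigru_features(embedded, model):
    """Convolution for local features, then GRUs reading in both directions."""
    local = conv1d_relu(embedded, model["filters"], model["width"])
    fwd = run_gru(model["fwd"], local, model["hidden"])
    bwd = run_gru(model["bwd"], local[::-1], model["hidden"])
    return fwd + bwd  # concatenate forward and backward final states

def predict_spam_probability(embedded, model):
    """Sigmoid over a linear combination of the concatenated features."""
    feats = conv_bigru_features(embedded, model)
    return sigmoid(sum(w * f for w, f in zip(model["out"], feats)))

# Usage: a fake 7-token message with 4-dimensional word vectors.
rng = random.Random(1)
model = build_model()
sentence = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(7)]
features = conv_bigru_features(sentence, model)
prob = predict_spam_probability(sentence, model)
```

In this sketch, `features` is a vector rather than a single probability. As a loose analogy only, contribution (2) passes a vector of this kind (there, the output of each model's final fully connected layer) to the second-layer Stacking model.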
Keywords/Search Tags: spam, text classification, word embedding, deep learning, ensemble learning