Spam Text Classification Method Based On Deep Learning

Posted on:2019-01-01

Degree:Master

Type:Thesis

Country:China

Candidate:Y T Li

GTID:2348330545491868

Subject:Engineering

Abstract/Summary:

With the rapid development of e-mail, it has become a carrier of commercial advertisements, malware, and illegal files. On average, people receive far more spam than normal mail, which seriously affects their daily lives and network security. Accurately identifying spam has therefore become an urgent problem. Commonly used spam recognition methods currently fall into two types, those based on the mail source and those based on the mail content; examples include whitelist and blacklist mechanisms, keyword matching, and the naive Bayes text recognition model. As the number and variety of e-mails grow, the keywords found in spam also change dramatically. All rule-based recognition methods need their feature library updated regularly, which costs a lot of manpower. Content-based recognition has been effective, but the traditional naive Bayes model lags behind deep learning models in text classification performance. This thesis therefore applies deep learning to the text content of messages to recognize spam. The specific research work and contributions are:

(1) The application of deep learning models to text classification is analyzed, and a deep learning model (the Conv-BiGRU model) based on a convolutional neural network and a recurrent neural network is proposed. The model extracts local features as well as features from the preceding and following words, combining the advantages of convolutional and recurrent neural networks. Experiments show that the new model improves the accuracy of spam classification.

(2) The Stacking model based on deep learning is improved. Instead of the probability output of the first-layer models, the output of the final fully connected layer of each deep learning model is passed on, which enriches the feature representation available to the second-layer model. A spam text classification system has also been completed. Compared with multiple other models, the Stacking model improves classification performance.

(3) The Conv-BiGRU model and the deep-learning-based Stacking model are implemented. A number of comparison experiments, varying the model method, the embedding layer input, the model improvements, and the deep learning model parameters, are carried out on a spam text data set of 670 thousand samples.
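
The two ideas in the abstract can be illustrated with a minimal sketch. The abstract does not give the layer order, layer sizes, or training procedure, so everything below is an assumption made for illustration: an embedding layer, one convolution over word windows, a bidirectional GRU, and a fully connected layer, all with random untrained weights and with GRU biases omitted for brevity. The names `ConvBiGRU` and `stacking_features` are hypothetical. The forward pass returns both the spam probability and the fully connected layer output; the latter is what the improved Stacking scheme would hand to the second-layer model in place of the probability.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d_relu(x, kernels, bias):
    """x: (T, E) embeddings; kernels: (K, E, C). Returns (T-K+1, C) local features."""
    k = kernels.shape[0]
    windows = np.stack([x[t:t + k] for t in range(x.shape[0] - k + 1)])
    return np.maximum(0.0, np.einsum("tke,kec->tc", windows, kernels) + bias)

def init_gru(rng, in_dim, hidden):
    """Random GRU weights (biases omitted in this sketch)."""
    shapes = {"Wz": (hidden, in_dim), "Uz": (hidden, hidden),
              "Wr": (hidden, in_dim), "Ur": (hidden, hidden),
              "Wh": (hidden, in_dim), "Uh": (hidden, hidden)}
    return {name: rng.normal(0.0, 0.1, shape) for name, shape in shapes.items()}

def gru(seq, p):
    """Run a GRU over seq (T, C) and return the final hidden state (H,)."""
    h = np.zeros(p["Uz"].shape[0])
    for x in seq:
        z = sigmoid(p["Wz"] @ x + p["Uz"] @ h)            # update gate
        r = sigmoid(p["Wr"] @ x + p["Ur"] @ h)            # reset gate
        h_new = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h))  # candidate state
        h = (1.0 - z) * h + z * h_new
    return h

class ConvBiGRU:
    """Hypothetical layer layout: embedding -> conv -> BiGRU -> FC -> sigmoid."""

    def __init__(self, vocab=1000, emb=16, channels=8, kernel=3,
                 hidden=6, fc=4, seed=0):
        rng = np.random.default_rng(seed)
        self.emb = rng.normal(0.0, 0.1, (vocab, emb))
        self.kernels = rng.normal(0.0, 0.1, (kernel, emb, channels))
        self.conv_bias = np.zeros(channels)
        self.fwd = init_gru(rng, channels, hidden)
        self.bwd = init_gru(rng, channels, hidden)
        self.W_fc = rng.normal(0.0, 0.1, (fc, 2 * hidden))
        self.w_out = rng.normal(0.0, 0.1, fc)

    def forward(self, token_ids):
        x = self.emb[np.asarray(token_ids)]                   # (T, E)
        local = conv1d_relu(x, self.kernels, self.conv_bias)  # local n-gram features
        # Read the sequence in both directions and join the final states.
        h = np.concatenate([gru(local, self.fwd), gru(local[::-1], self.bwd)])
        fc_features = np.tanh(self.W_fc @ h)                  # final FC layer output
        p_spam = sigmoid(self.w_out @ fc_features)
        return p_spam, fc_features

def stacking_features(models, token_ids):
    """Second-layer input: concatenated FC outputs of the first-layer models."""
    return np.concatenate([m.forward(token_ids)[1] for m in models])

models = [ConvBiGRU(seed=s) for s in range(3)]  # stand-ins for trained base models
tokens = [5, 42, 7, 99, 3, 18]                  # made-up token ids for one message
p_spam, fc = models[0].forward(tokens)
meta_input = stacking_features(models, tokens)
print(fc.shape, meta_input.shape)
```

With three base models that each expose a 4-dimensional fully connected output, the second-layer model sees 12 features per message, whereas passing on only the probabilities would give it 3. In practice the base models would be trained first, and any standard classifier could serve as the second layer.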

Keywords/Search Tags:

spam, text classification, word embedding, deep learning, ensemble learning

Related items

1	Research On Text Sentiment Analysis Based On Improved Dictionary And Ensemble Learning
2	Combining Topic Model And Word Embedding For Short-Text Classification
3	Research And Implementation Of News Text Classification System Based On Deep Learning
4	Research On Multilingual Short Text Classification Method Based On Deep Learning
5	Research On Chinese Text Classification Based On Deep Learning
6	Spam Messages Based On Integrated Learning Multiple Classification Study
7	A Study Of Deep Learning-based Methods Of SMS(short Message Service) Spam Detection
8	Research On Short Text Classification Based On Deep Learning
9	Design And Implementation Of Long Text Classification Algorithm Based On Deep Neural Network
10	Deep Contextual Word Embedding In Natural Language Processing