
Spam Filtering Method Based On Improved Convolutional Neural Network

Posted on: 2022-04-03 | Degree: Master | Type: Thesis
Country: China | Candidate: D Song | Full Text: PDF
GTID: 2518306608976289 | Subject: Computer technology
Abstract/Summary:
In recent years, e-mail has become ubiquitous. While it facilitates communication, it has also flooded daily life with spam. Spam degrades the user experience and spreads harmful information, so filtering it is particularly important. Deep learning has achieved good results in text classification and can be applied to spam filtering. As a typical deep learning algorithm, the convolutional neural network (CNN) has great application value in text classification. Although the CNN has natural structural advantages for text feature extraction, it also has limitations. To improve classification accuracy, this thesis improves the CNN in several ways.

First, the preparatory work for spam filtering is carried out, including word segmentation, stop-word processing, and the representation of text features. In addition, several machine learning models are compared and their strengths and weaknesses are analyzed, providing a basis for the subsequent text feature extraction and experimental design. To address the deficiencies of CNNs in spam filtering, this thesis mainly does the following work.

To address the independence between email data and the high data dimensionality, the multi-size, multi-channel parallel convolution of the inception structure is integrated into the CNN model, and an inception-CNN network structure is proposed. The design of the convolution kernels in the Inception V1 unit is optimized: larger convolution kernels are decomposed into consecutive asymmetric convolution kernels to reduce the amount of computation, a 4*4 convolution kernel and a max pooling layer are then designed, and finally a softmax classifier computes the data category. In experiments on two email data sets, the inception-CNN model is compared with the CNN model and several machine learning classification models. Its classification evaluation metrics are better than those of the other models, indicating that the model has a better classification effect.

The traditional CNN model extracts local features and lacks perception of the overall semantics of a sentence. To address this, an LSTM model is used to learn long-term dependencies within the sentence, an attention mechanism assigns different weights to the words, and an LSTM-Attention-CNN hybrid model is proposed. The model first uses the LSTM to extract contextual information from the text and takes its time-series output vectors as feature vectors. These enter the attention layer, which computes the weight of each word. The convolutional layer then extracts features, the pooling layer reduces the dimensionality, and the softmax classifier computes the output category. Compared with several existing models, the model proposed in this thesis achieves better results on the mail classification task, and the results show that integrating the LSTM model and the attention mechanism can improve the classification results of the model.

The thesis contains 21 figures, 9 tables, and 69 references.
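The attention, convolution, and pooling steps of the hybrid model described above can be illustrated with a minimal pure-Python sketch. It is not the thesis implementation. The per-word state vectors stand in for LSTM outputs and are hard-coded toy numbers, the dot-product scoring against a `context` vector is an assumed scoring function (the abstract does not say how word weights are computed), and the names `attention_reweight` and `conv1d_maxpool` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_reweight(states, context):
    """Weight each per-word state vector by an attention weight.

    Assumption: the score of a word is the dot product of its state
    with a context vector (learned in a real model, fixed here).
    The sequence is kept (not summed) so a convolution can follow.
    """
    scores = [sum(h * c for h, c in zip(state, context)) for state in states]
    weights = softmax(scores)
    weighted = [[w * v for v in state] for w, state in zip(weights, states)]
    return weights, weighted


def conv1d_maxpool(seq, kernel):
    """Slide a kernel covering len(kernel) consecutive words over seq,
    then keep the largest response (global max pooling)."""
    k = len(kernel)
    responses = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        responses.append(sum(
            w * x
            for k_row, s_row in zip(kernel, window)
            for w, x in zip(k_row, s_row)
        ))
    return max(responses)


# Toy stand-ins for LSTM outputs: three words, two features each.
states = [[0.1, 0.3], [0.9, 0.8], [0.2, 0.1]]
context = [1.0, 1.0]          # hypothetical context vector
kernel = [[1.0, 0.0],         # hypothetical filter spanning two words
          [0.0, 1.0]]

weights, weighted = attention_reweight(states, context)
feature = conv1d_maxpool(weighted, kernel)
print(weights, feature)
```

In a full model, many such filters would each yield one pooled feature, and the resulting feature vector would be passed to the softmax classifier to produce the spam or non-spam category.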
Keywords/Search Tags:Deep learning, CNN, Inception structure, Attention mechanism, LSTM