Research On Spam Text Filtering Based On Deep Learning

Posted on:2020-05-20

Degree:Master

Type:Thesis

Country:China

Candidate:X Sun

Full Text:PDF

GTID:2428330578979397

Subject:Computer technology

Abstract/Summary:

With the rapid development of big data, cloud computing, and Internet of Things (IoT) technologies, applications on the Internet have become increasingly complex and diverse. The large volume of spam messages not only consumes substantial computing and communication resources but also has a negative impact on daily life. Spam text is one of the most important components of spam messages. This thesis studies spam text filtering algorithms and related technologies, and proposes new text filtering algorithms. The main work and contributions are as follows.

(1) A Recurrent Neural Network (RNN) applied to sentence classification cannot extract keyword features. To address this, we propose TC-LSTM, an algorithm that combines a convolutional neural network (CNN) with a Long Short-Term Memory (LSTM) network for spam text filtering. Because of its CNN component, TC-LSTM works well on spam text with obvious keyword features; because of its LSTM component, it is also effective on sentences that contain no significant keywords. Experiments show that TC-LSTM outperforms CNN and LSTM on spam text filtering. Experiments on different datasets show that the proposed method is more effective than other typical methods.

(2) We study how different ways of using word embeddings affect performance, and verify the results on spam text datasets. We compare three settings: pre-trained word vectors kept fixed during training; pre-trained word vectors fine-tuned during training; and randomly initialized word vectors trained jointly with the model. We run experiments on different spam text datasets and analyze the results to further improve the performance of TC-LSTM.

(3) We propose TC-LSTM-TFIDF, an algorithm that improves on TC-LSTM. It incorporates TF-IDF to assign a different weight to each word, which improves the performance of TC-LSTM. Because the algorithm accounts for each word's influence on the classification label, it extracts features better than previous work. Experiments show that the proposed method markedly improves on TC-LSTM and outperforms other typical methods.
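
The TF-IDF word weighting mentioned in contribution (3) can be illustrated with a minimal sketch. The abstract does not say which TF-IDF variant is used or where the weights enter the network, so everything here is an assumption: the smoothed IDF formula is one common choice, the function names `tfidf_weights` and `weight_embeddings` are hypothetical, and scaling each word vector by its weight before it reaches the model is only one plausible way to combine the two.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {token: weight} dict per tokenized document.

    Assumption: term frequency is count / document length, and IDF is the
    smoothed form ln((1 + N) / (1 + df)) + 1. The thesis may use another variant.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            tok: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[tok])) + 1.0)
            for tok, count in tf.items()
        })
    return weights


def weight_embeddings(doc, doc_weights, embeddings):
    """Scale each token's embedding vector by that token's TF-IDF weight.

    Hypothetical combination step: the scaled vectors would then be fed to
    the classifier in place of the raw word vectors.
    """
    return [[doc_weights[tok] * x for x in embeddings[tok]] for tok in doc]


# Toy corpus, invented for illustration only.
docs = [
    ["win", "free", "prize", "now"],
    ["meeting", "moved", "to", "now"],
    ["free", "prize", "claim", "free"],
]
weights = tfidf_weights(docs)

# Toy 2-dimensional embeddings, identical for every token.
embeddings = {tok: [1.0, 2.0] for doc in docs for tok in doc}
scaled = weight_embeddings(docs[0], weights[0], embeddings)
```

In this toy corpus, "win" appears in one document and "now" in two, so "win" receives the larger weight in the first message, and its embedding is scaled up accordingly relative to "now".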

Keywords/Search Tags:

Spam Text Filtering, Deep Learning, Model Combination, Text Classification, Natural Language Processing

Related items

1	Intelligent Device Text Classification Method Based On Natural Language Processing
2	Research On Text Classification Based On Deep Neural Network
3	Research On Internet Spam Identification Method
4	Research And Implementation Of Short Text Classification Model Based On Course Knowledge Points
5	Text Filtering Key Technologies
6	Research On Financial Text Classification Method Based On Deep Learning
7	Research On Deep Learning Methods For Text Classification Tasks
8	Research And Analysis Of Text Classification Theory Based On Deep Learning
9	Research On Text Classification Based On Natural Language Processing And Machine Learning
10	Research And Application Of Text Classification Based On Natural Language Processing