Research On Spam Short Message Filtering Algorithms Based On Incremental Multi-model Fusion

Posted on:2020-12-22

Degree:Master

Type:Thesis

Country:China

Candidate:Z H Wang

Full Text:PDF

GTID:2428330602458741

Subject:Software engineering

Abstract/Summary:

As communication technology continues to develop, communication is used in an ever wider range of scenarios. On the one hand, people enjoy the benefits of rapidly advancing wireless network technology. On the other hand, because regulation and supervision are lacking, the many "black industries" that have grown up around wireless communication have caused considerable harm; spam messages, for example, are a persistent nuisance in people's daily lives. To detect, recognize, and filter spam messages, this thesis studies spam message filtering based on text categorization.

First, the preprocessing and feature extraction techniques used in text categorization are introduced in detail. Then, the performance of the K-nearest-neighbor (KNN) algorithm and the Naive Bayes algorithm is compared and analyzed through experiments. Finally, two improvements are proposed to address a shortcoming of traditional text categorization algorithms: when new samples are added, a classifier trained only on earlier data recognizes them poorly, so the classification results degrade.

The first improvement is an incremental multi-model fusion method based on a scoring method, which is designed and implemented. It trains on newly added samples sequentially: each sample data set is used to train one sub-classifier, so different training sets yield multiple sub-classifiers. Each text message to be classified is then labeled according to the principle of "the minority obeys the majority": if most sub-classifiers output normal, the message is normal; otherwise, it is spam. This method avoids retraining the classifier, which saves time and resources, and it improves the classification results.

The second improvement is an incremental multi-model fusion method based on a learning method, which is also designed and implemented. It is likewise used to train on new samples. Unlike the first method, it uses a two-tier framework, that is, two tiers of classifiers. The primary classifier uses the Naive Bayes algorithm, and its output is the input of the secondary classifier, which uses the SVM algorithm. This two-tier structure solves the incremental problem and further improves spam message filtering results.
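
The voting scheme of the scoring-based fusion method can be sketched as follows. This is a minimal illustration in Python, not the thesis implementation: the class names `TinyNaiveBayes` and `IncrementalVotingFilter`, the whitespace tokenizer, the choice of Naive Bayes for every sub-classifier, and the toy messages are all assumptions made for the example. A tie is counted as spam, following the "otherwise" rule in the abstract.

```python
import math
from collections import Counter


class TinyNaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing over whitespace tokens."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_score(self, text, label):
        counts = self.word_counts[label]
        total = sum(counts.values()) + len(self.vocab)
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in text.lower().split():
            if word in self.vocab:  # words unseen in this batch are ignored
                score += math.log((counts[word] + 1) / total)
        return score

    def predict(self, text):
        return max(self.classes, key=lambda c: self.log_score(text, c))


class IncrementalVotingFilter:
    """One sub-classifier per batch; messages are labeled by majority vote."""

    def __init__(self):
        self.sub_classifiers = []

    def add_batch(self, texts, labels):
        # Only the new batch is trained; earlier sub-classifiers are untouched.
        self.sub_classifiers.append(TinyNaiveBayes().fit(texts, labels))

    def predict(self, text):
        votes = Counter(clf.predict(text) for clf in self.sub_classifiers)
        # Normal only if a strict majority says normal; otherwise spam.
        if votes["normal"] > len(self.sub_classifiers) / 2:
            return "normal"
        return "spam"


# Hypothetical toy batches, arriving one after another.
batches = [
    (["win a free prize now", "free cash offer click now",
      "are we meeting for lunch today", "see you at the office tomorrow"],
     ["spam", "spam", "normal", "normal"]),
    (["claim your free reward today", "cheap loan offer call now",
      "can you call me after class", "dinner at home tonight"],
     ["spam", "spam", "normal", "normal"]),
    (["urgent prize waiting click link", "free ringtone reply now",
      "happy birthday see you tonight", "meeting moved to tomorrow morning"],
     ["spam", "spam", "normal", "normal"]),
]

spam_filter = IncrementalVotingFilter()
for texts, labels in batches:
    spam_filter.add_batch(texts, labels)

print(spam_filter.predict("free prize click now"))
print(spam_filter.predict("see you at lunch tomorrow"))
```

Adding a batch trains exactly one new sub-classifier and leaves the existing ones as they are, which is where the saving over full retraining comes from.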
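
The two-tier structure of the learning-based method can be sketched in a similar way. The abstract does not say what the primary tier passes to the secondary tier, so this sketch assumes one Naive Bayes model per batch, whose spam log-odds form the feature vector for a linear SVM fitted by plain subgradient descent on the hinge loss. The function names, hyperparameters, held-out set, and toy messages are hypothetical.

```python
import math
from collections import Counter


def train_nb(texts, labels):
    """Train Naive Bayes on one batch; return a function giving spam log-odds.

    Assumes the batch contains both "spam" and "normal" messages.
    """
    counts = {"spam": Counter(), "normal": Counter()}
    docs = Counter(labels)
    for text, label in zip(texts, labels):
        counts[label].update(text.lower().split())
    vocab = set(counts["spam"]) | set(counts["normal"])
    denom = {c: sum(counts[c].values()) + len(vocab) for c in counts}

    def log_odds(text):
        score = math.log(docs["spam"] / docs["normal"])
        for word in text.lower().split():
            if word in vocab:  # words unseen in this batch carry no evidence
                p_spam = (counts["spam"][word] + 1) / denom["spam"]
                p_normal = (counts["normal"][word] + 1) / denom["normal"]
                score += math.log(p_spam / p_normal)
        return score

    return log_odds


def train_linear_svm(features, targets, epochs=200, lr=0.1, reg=0.01):
    """Fit w, b by subgradient descent on the L2-regularized hinge loss.

    Targets are +1 (spam) or -1 (normal).
    """
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, targets):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            for i in range(len(w)):
                grad = reg * w[i] - (y * x[i] if margin < 1 else 0.0)
                w[i] -= lr * grad
            if margin < 1:
                b += lr * y
    return w, b


# Primary tier: one Naive Bayes model per (hypothetical) training batch.
batches = [
    (["win a free prize now", "free cash offer click now",
      "are we meeting for lunch today", "see you at the office tomorrow"],
     ["spam", "spam", "normal", "normal"]),
    (["claim your free reward today", "cheap loan offer call now",
      "can you call me after class", "dinner at home tonight"],
     ["spam", "spam", "normal", "normal"]),
]
primary_models = [train_nb(texts, labels) for texts, labels in batches]


def primary_outputs(text):
    return [model(text) for model in primary_models]


# Secondary tier: an SVM trained on primary outputs for a held-out labeled set.
held_out = [
    ("urgent prize waiting click link", "spam"),
    ("free ringtone reply now", "spam"),
    ("happy birthday see you tonight", "normal"),
    ("meeting moved to tomorrow morning", "normal"),
]
features = [primary_outputs(text) for text, _ in held_out]
targets = [1 if label == "spam" else -1 for _, label in held_out]
w, b = train_linear_svm(features, targets)


def classify(text):
    score = sum(wi * xi for wi, xi in zip(w, primary_outputs(text))) + b
    return "spam" if score > 0 else "normal"


print(classify("free prize click now"))
print(classify("see you at lunch tomorrow"))
```

Under these assumptions, a new batch adds one primary model and only the small secondary classifier is refitted on the widened feature vector; the earlier primary models are not retrained.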

Keywords/Search Tags:

Text classification, Message Filtering, Naive Bayes, SVM, Multi-model fusion

Related items

1	Design And Implementation Of Text Classification System Based On K-neighborhood And Naive Bayesian
2	Research On Shielding Mechanism Of Short Message Spam And Its Application
3	Research In Filtering Of Short Message Service Based On Content Mining
4	Design And Implementation Of Short Message Classification System Based On Naive Bayesian
5	Research On Text Classification Algorithm Based On Naive Bayes Method
6	Text Categorization Based On Naive Bayes Method
7	The Study Of Chinese Text Categorization Based On Naïve Bayes
8	A Text Classifier About High Blood Pressure Based On Naive Bayes
9	Research On The Methods Of Chinese Text Classification Using Bayes And Language Model
10	The Research And Application Of Text Categorization Arithmetic In Spam Filtering