Research On Spam Short Message Filtering Algorithms Based On Incremental Multi-model Fusion

Posted on: 2020-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Wang
GTID: 2428330602458741
Subject: Software engineering
Abstract/Summary:
With the continuous development of communication technology, communication is used in an ever wider range of scenarios. On the one hand, people enjoy the benefits brought by science and technology, and wireless network technology has developed rapidly. On the other hand, owing to a lack of regulation and supervision, the many "black industries" surrounding wireless communication have caused considerable harm; spam messages, for example, are a persistent nuisance in people's lives. To detect, recognize, and filter spam messages, this thesis studies current spam message filtering based on text categorization technology.

First, the preprocessing and feature extraction techniques used in the text categorization process are introduced in detail. Then, the performance of the K-nearest neighbor (KNN) algorithm and the Naive Bayes algorithm is compared and analyzed through experiments. Finally, two improvements are proposed to address a shortcoming of traditional text categorization algorithms: when new samples are added, the classification results degrade because of the classifier's limited recognition ability.

The first improvement is an incremental multi-model fusion method based on a scoring method, which is designed and implemented. It trains on the newly added samples sequentially: each sample data set is trained to obtain a sub-classifier, so multiple sub-classifiers are obtained from different training sets. Each text message to be classified is then labeled according to the principle that "the minority obeys the majority": if the output of most sub-classifiers is normal, the message is normal; otherwise, it is a spam message. This method avoids retraining the classifier, reduces wasted time and resources, and improves the effect of text classification.

The second improvement is an incremental multi-model fusion method based on a learning method, which is also designed and implemented and is likewise used to train on new samples. Unlike the first method, it uses a two-tier framework of classifiers. The primary classifier uses the Naive Bayes algorithm, and its output is the input of the secondary classifier, which uses the SVM algorithm. This two-tier structure solves the incremental problem and further improves the filtering results for spam messages.
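The voting-based fusion described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say which algorithm the voting sub-classifiers use, so a small Naive Bayes model is assumed here, and the class names (`NaiveBayesSub`, `IncrementalVotingFilter`), the whitespace tokenization, and the example batches are all hypothetical. Real short messages, especially Chinese ones, would need proper word segmentation and feature extraction.

```python
import math
from collections import Counter

NORMAL, SPAM = "normal", "spam"


class NaiveBayesSub:
    """One sub-classifier: multinomial Naive Bayes with add-one smoothing.

    Assumption: the abstract does not name the sub-classifier algorithm
    used for voting; Naive Bayes is chosen here only for illustration.
    """

    def __init__(self, samples):
        # samples: list of (text, label) pairs from ONE training batch
        self.word_counts = {NORMAL: Counter(), SPAM: Counter()}
        self.doc_counts = Counter()
        for text, label in samples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(text.lower().split())
        self.vocab = set(self.word_counts[NORMAL]) | set(self.word_counts[SPAM])

    def _log_score(self, words, label):
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(self.word_counts[label].values())
        for word in words:
            if word in self.vocab:  # words unseen in this batch are ignored
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def predict(self, text):
        words = text.lower().split()
        spam_score = self._log_score(words, SPAM)
        normal_score = self._log_score(words, NORMAL)
        return SPAM if spam_score > normal_score else NORMAL


class IncrementalVotingFilter:
    """Keeps one sub-classifier per batch and fuses them by majority vote."""

    def __init__(self):
        self.sub_classifiers = []

    def add_batch(self, samples):
        # Only the new batch is trained; existing sub-classifiers are untouched.
        self.sub_classifiers.append(NaiveBayesSub(samples))

    def predict(self, text):
        if not self.sub_classifiers:
            raise ValueError("no sub-classifiers trained yet")
        votes = Counter(clf.predict(text) for clf in self.sub_classifiers)
        # As in the abstract: normal only if most sub-classifiers say normal.
        majority_normal = votes[NORMAL] > len(self.sub_classifiers) / 2
        return NORMAL if majority_normal else SPAM


# Hypothetical batches arriving over time.
batches = [
    [("win free prize now", SPAM), ("free cash offer", SPAM),
     ("meeting at noon tomorrow", NORMAL), ("see you at lunch", NORMAL)],
    [("claim your free prize", SPAM), ("cheap loan offer now", SPAM),
     ("are you home tonight", NORMAL), ("call me after the meeting", NORMAL)],
    [("urgent prize waiting claim now", SPAM), ("free ringtone offer", SPAM),
     ("lunch tomorrow at noon", NORMAL), ("thanks for the meeting notes", NORMAL)],
]

spam_filter = IncrementalVotingFilter()
for batch in batches:
    spam_filter.add_batch(batch)

print(spam_filter.predict("free prize offer"))           # spam
print(spam_filter.predict("meeting at lunch tomorrow"))  # normal
```

Because `add_batch` only fits a model on the incoming batch, the cost of absorbing new samples depends on the batch size rather than on all data seen so far, which is the saving the abstract attributes to avoiding retraining.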
Keywords/Search Tags: Text classification, Message filtering, Naive Bayes, SVM, Multi-model fusion