A Multi-level Framework For Filtering Spam Messages Based On Text Content

Posted on:2017-01-12

Degree:Master

Type:Thesis

Country:China

Candidate:J Mi

GTID:2308330503458928

Subject:Computer Science and Technology

Abstract/Summary:

As the form and textual features of short messages keep changing, filtering spam messages accurately and quickly has become urgent. Existing spam SMS filtering methods mainly include black and white lists, keyword matching, active reporting by users, and content-based filtering. Among them, content-based filtering responds more effectively to the constantly changing forms of messages and does not rely on any other information about the SMS. For text content, however, traditional filtering algorithms ignore the obvious textual characteristics of spam messages, which hurts the filter's performance. In addition, these methods offer no good solution to the sparse-vector problem caused by short message content.

In this thesis, we propose a new framework for building classifiers that filter out spam messages based on their text. The framework makes use of noise information that is normally discarded during pre-processing but may contribute greatly to classification. It abstracts this noise information as custom properties and uses them as the first feature set to filter typical spam messages. It then applies an LDA topic model to the training set to obtain the distribution between topics and texts and the distribution between topics and words, from which it finds more synonyms for the original keywords. In this way, the framework extends the feature set effectively and reduces the negative effect of sparse vectors on the classification results.

Finally, the thesis describes the experiments. The data sets are real messages collected from the public, which represent the varying proportions of spam and legal messages that users receive. We followed a careful experimental procedure and evaluated the new spam filter from three aspects, 'spam', 'legal' and 'weighted', so as to analyze the results from different angles.
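
The three evaluation views named above ('spam', 'legal' and 'weighted') can be illustrated with a small sketch. The abstract does not say which metrics were reported, so precision, recall and F1 are assumptions here, as is the support-weighted average used for the 'weighted' view. The function name `per_class_report` and the toy labels are hypothetical.

```python
from collections import Counter


def per_class_report(y_true, y_pred, labels=("spam", "legal")):
    """Return precision, recall and F1 per label plus a support-weighted average.

    Assumption: the 'weighted' view averages per-class scores by class support.
    """
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    # Avoid division by zero when no labelled examples are given.
    total = sum(support[label] for label in labels) or 1
    report["weighted"] = {
        metric: sum(report[label][metric] * support[label] for label in labels) / total
        for metric in ("precision", "recall", "f1")
    }
    return report


# Toy example: 2 spam and 4 legal messages, with one error in each direction.
y_true = ["spam", "spam", "legal", "legal", "legal", "legal"]
y_pred = ["spam", "legal", "legal", "legal", "legal", "spam"]
report = per_class_report(y_true, y_pred)
```

Reporting the classes separately matters when spam and legal messages are unevenly represented, because a single overall accuracy figure can hide poor performance on the smaller class.
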
We also investigated the effect of training-corpus size, the number of sub-classifiers, and feature-set size on the filter's performance. The results show that this filtering framework effectively improves the accuracy of filtering spam messages based on text content.
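
The two levels described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation. The specific custom properties (URL, long digit run, digit and symbol ratios), the rule in `looks_like_typical_spam`, and the hand-written `TOPIC_WORD` table are all hypothetical. In particular, `TOPIC_WORD` stands in for the topic-word distribution that a trained LDA model would provide.

```python
import re

URL_RE = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)
LONG_NUMBER_RE = re.compile(r"\d{7,}")


def custom_properties(raw):
    """Level 1: turn 'noise' in the raw, unprocessed message into features."""
    n = max(len(raw), 1)
    return {
        "has_url": bool(URL_RE.search(raw)),
        "has_long_number": bool(LONG_NUMBER_RE.search(raw)),
        "digit_ratio": sum(c.isdigit() for c in raw) / n,
        "symbol_ratio": sum(not c.isalnum() and not c.isspace() for c in raw) / n,
    }


def looks_like_typical_spam(props):
    """Hypothetical first-level rule; a real filter would learn this from data."""
    return props["has_url"] and props["has_long_number"]


# Hypothetical topic-word distribution standing in for a trained LDA model.
TOPIC_WORD = {
    "promo": {"free": 0.35, "prize": 0.25, "win": 0.22, "cash": 0.18},
    "chat": {"lunch": 0.40, "meeting": 0.35, "tomorrow": 0.25},
}


def extend_features(tokens, topic_word, top_n=2, min_prob=0.2):
    """Level 2: add the top words of every topic that a token belongs to."""
    extended = list(tokens)
    for dist in topic_word.values():
        if any(dist.get(tok, 0.0) >= min_prob for tok in tokens):
            top_words = sorted(dist, key=dist.get, reverse=True)[:top_n]
            for word in top_words:
                if word not in extended:
                    extended.append(word)
    return extended


spam_props = custom_properties("WIN cash now!!! call 13800001111 www.example.com")
ham_props = custom_properties("see you at lunch tomorrow")
extended = extend_features(["win", "now"], TOPIC_WORD)
```

Messages caught by the first level never reach the second. The remaining short messages get a denser feature vector because related topic words are appended to their original tokens, which is the idea behind reducing the sparse-vector effect.
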

Keywords/Search Tags:

spam message filtering, text classification, feature extension, classification algorithm

Related items

1	Spam Message Filtering System Based On MCNN And BiLSTM
2	Research On Shielding Mechanism Of Short Message Spam And Its Application
3	The Research Of Chinese Spam Filtering Technology Based On Naïve Bayes Classification Algorithm
4	Application Of Bayesian Classification In Spam SMS Filtering
5	Design And Implementation Of The Mobile Spam Filtering Software Based On Content
6	The Design And Implementation Of The Spam Message Interception System On Android Platform
7	Design And Implementation Of DTFS Algorithm For Spam Filtering Of University OA System
8	Filter Spam Messages Based On Text Classification Algorithm
9	Research On Key Techniques And Applications In Text Classification
10	Spam Messages Based On Integrated Learning Multiple Classification Study