Research On Massive Spam Message Filtering Technology In Distributed Architecture

Posted on: 2018-02-20  Degree: Master  Type: Thesis
Country: China  Candidate: C D Zhao  Full Text: PDF
GTID: 2348330515957444  Subject: Computer application technology
Abstract/Summary:
With rising mobile phone penetration and the near-universal use of phone numbers for registration on social networking and shopping sites, short messages (SMS) have become a widely used marketing tool for businesses. The resulting spam messages not only consume network resources and disrupt users' daily lives, but also pose great challenges to traditional spam message filtering. Filtering massive volumes of text requires a great deal of storage space and computing power, so how to filter large amounts of data effectively is an urgent problem. With the emergence of cloud computing and the Hadoop distributed platform, a new distributed parallel programming model has become available, which offers a new approach to spam message filtering. In essence, spam filtering classifies SMS texts into spam messages and ham (legitimate) messages. This thesis first studies SMS text preprocessing for classification, text feature selection, text categorization algorithms, and related technologies. After a comparative study of current spam filtering algorithms, the naive Bayesian spam filtering algorithm is chosen and combined with the massive data processing technology at the core of the Hadoop distributed platform, and a naive Bayesian spam SMS filtering method based on MapReduce is proposed.

The research work of this thesis covers the following aspects. Firstly, in the feature extraction stage, the feature selection algorithm is improved by combining information gain and the chi-square statistic (CHI), which reduces the dimension of the feature vector space and lowers computing time and storage requirements. Secondly, the naive Bayesian spam filtering algorithm is improved by introducing a threshold in the decision stage, which reduces the probability that a legitimate short message is judged as spam and improves classification accuracy. Thirdly, to address the efficiency of SMS filtering, a solution based on the Hadoop distributed framework is proposed. The parallel computing model MapReduce is used for text preprocessing, feature word selection, and the training and testing of the short message text classifier, and it shows clear advantages in handling large volumes of short messages. Finally, the experimental results show that the proposed MapReduce-based method for filtering massive spam in a distributed architecture improves the recall, precision, and efficiency of SMS filtering, and that filtering efficiency increases with cluster size.
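The thresholded decision step mentioned in the abstract can be illustrated with a small single-machine sketch. The abstract does not give the thesis's exact decision rule, threshold value, or smoothing scheme, so everything below is an assumption for illustration: a multinomial naive Bayes model with Laplace smoothing, a hypothetical `threshold` parameter on the posterior probability of spam, and made-up function names (`train`, `spam_probability`, `classify`). It does not reproduce the MapReduce implementation.

```python
import math
from collections import Counter


def train(messages):
    """Count words per class. `messages` is a list of (tokens, label) pairs,
    with label "spam" or "ham"; both classes are assumed to be present."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for tokens, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def spam_probability(tokens, model):
    """Posterior P(spam | tokens) under multinomial naive Bayes
    with Laplace (add-one) smoothing. Unknown words are ignored."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_score = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for tok in tokens:
            if tok in vocab:
                likelihood = (word_counts[label][tok] + 1) / (total_words + len(vocab))
                score += math.log(likelihood)
        log_score[label] = score
    # Normalise in log space to avoid underflow on long messages.
    top = max(log_score.values())
    spam = math.exp(log_score["spam"] - top)
    ham = math.exp(log_score["ham"] - top)
    return spam / (spam + ham)


def classify(tokens, model, threshold=0.9):
    """Label a message as spam only when the posterior reaches `threshold`
    (an assumed value); otherwise keep it as ham to limit false positives."""
    return "spam" if spam_probability(tokens, model) >= threshold else "ham"


model = train([
    (["win", "free", "prize"], "spam"),
    (["free", "cash", "now"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "at", "noon"], "ham"),
])
print(classify(["free", "prize"], model, threshold=0.5))
print(classify(["free", "prize"], model, threshold=0.9))
```

On this toy data the message `["free", "prize"]` has a spam posterior of 6/7, so it is labelled spam with a plain 0.5 cut-off but kept as ham with the stricter 0.9 threshold. This is the trade-off the abstract describes: fewer legitimate messages misjudged as spam, at the cost of letting borderline spam through.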
Keywords/Search Tags:Spam SMS, SMS Filter, Naive Bayesian, Hadoop, MapReduce