Research Of Junk SMS Filter System Based On Hadoop Platform

Posted on: 2017-10-07
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Song
Full Text: PDF
GTID: 2348330536476696
Subject: Communication and Information System
Abstract/Summary:
In recent years, as the number of mobile phone users has grown, SMS, as a core mobile service, has become one of the main tools of everyday communication. At the same time, a large volume of spam messages seriously affects mobile phone users' daily lives and the safety of their property, so classifying messages quickly and accurately remains an important problem. A study of existing spam filtering techniques reveals deficiencies: keyword-based filtering is simple to operate, but its precision is too low; content-based filtering is effective, but its implementation is complex.

The research work is as follows:

1. The Hadoop platform is selected to host the spam filter because of its high reliability, high scalability, and high fault tolerance.
2. Analysis and comparison of current SMS spam filtering algorithms shows that the Bayesian algorithm has a short classification time, so it is adopted as the main algorithm.
3. In the preprocessing stage of the SMS spam filter system, the parameters of the traditional TF-IDF function are modified to reduce the dimensionality of the feature vector. In the classification stage, each unknown message is assigned to the class for which the decision carries less risk.
4. Finally, the improved filter is deployed on the Hadoop platform, using the MapReduce model for programming and HBase for storage.

Research on the improved SMS spam filter system shows two changes. First, the speed-up ratio of the system on the Hadoop platform is 0.227 higher than that of the system on a single PC. Second, with the improved Bayesian algorithm combined with the TF-IDF algorithm, accuracy improves by about 30%, and the precision and recall rates also increase noticeably.
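The classification stage described above, in which an unknown message goes to the class with the less risky decision, can be illustrated with a minimum-risk Bayesian decision rule. The following is a minimal single-machine sketch and not the thesis's implementation. The function names, the toy training messages, and the loss values in `LOSS` are all assumptions made for illustration, since the abstract does not give the actual decision-making factor. TF-IDF feature selection and the MapReduce/HBase deployment are left out.

```python
import math
from collections import Counter

LABELS = ("spam", "ham")

def train(messages):
    """Count documents and words per class. messages: list of (tokens, label)."""
    doc_counts = Counter()
    word_counts = {label: Counter() for label in LABELS}
    vocab = set()
    for tokens, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab

def posteriors(tokens, model):
    """Naive Bayes posterior P(class | tokens) with Laplace smoothing."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in LABELS:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for token in tokens:
            if token in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    return {k: v / norm for k, v in exp_scores.items()}

# LOSS[decision][true_class]: hypothetical costs, chosen so that blocking a
# legitimate message is five times worse than letting a spam message through.
LOSS = {
    "spam": {"spam": 0.0, "ham": 5.0},
    "ham": {"spam": 1.0, "ham": 0.0},
}

def classify_min_risk(tokens, model, loss=LOSS):
    """Pick the decision with the smallest expected loss (conditional risk)."""
    post = posteriors(tokens, model)
    risk = {d: sum(loss[d][c] * post[c] for c in LABELS) for d in LABELS}
    return min(risk, key=risk.get), post, risk

# Toy training set (made up for this example).
TRAINING = [
    (["win", "free", "prize"], "spam"),
    (["free", "cash", "now"], "spam"),
    (["meet", "for", "lunch"], "ham"),
    (["see", "you", "at", "lunch"], "ham"),
]
model = train(TRAINING)

for text in (["free", "prize"], ["free"], ["lunch"]):
    decision, post, risk = classify_min_risk(text, model)
    print(text, decision, round(post["spam"], 3))
```

With these assumed losses, a message is filtered as spam only when its spam posterior exceeds 5/6. On the toy data, the message `["free"]` has a spam posterior of about 0.76, so a plain maximum-posterior rule would block it, while the minimum-risk rule keeps it as ham.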
Keywords/Search Tags:Spam SMS filtering, Hadoop platform, Bayesian algorithms, decision-making factor