Research And Implementation Of Spam Filtering System Based On Improved Naive Bayes Algorithm

Posted on: 2018-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: C L Cao
GTID: 2348330542472265
Subject: Computer Science and Technology
Abstract/Summary:
With the continuous development of networks, e-mail has become a common means of communication in modern life. Its wide use brings great convenience, but its hidden economic value can also be exploited, and spam has grown rapidly as a result. The huge volume of spam now has a significant economic and security impact on individuals, groups, countries, and society.

In the spam filtering field, Naive Bayes is one of the most popular algorithms. This thesis proposes SVM-NB, a Naive Bayes algorithm improved with a support vector machine (SVM). First, the SVM constructs an optimal separating hyperplane for the training set at the boundary between the two classes in the sample space. Then, for each sample, its class label is compared with that of its neighbor, and the result is used to reduce the sample space and to increase the independence between the classes. Finally, the Naive Bayes classification algorithm is used to classify the mail.

The main research contents of this thesis are as follows:

(1) Research on spam preprocessing techniques, including Chinese word segmentation, text representation, feature extraction, and feature selection. The thesis focuses on the support vector machine and Bayes classification algorithms and analyzes their deficiencies in text classification.

(2) Application of the new classification model to spam filtering, building on an in-depth study of the Naive Bayes classification model, the SVM classification model, and content-based classification algorithms. Text is segmented with the ICTCLAS Chinese word segmentation algorithm, features are ranked by relative strength using the information gain method, the SVM constructs the hyperplane from the resulting ordered queue, redundant vectors are pruned, and finally the NB algorithm classifies the text.

(3) Design and development of a spam filtering system. The system combines several filtering methods into a layered defense. The first layer uses blacklist/whitelist filtering to screen mail quickly at an early stage. Mail that the first layer cannot identify is passed to the second layer, which applies the new intelligent filtering method. The system can respond to user demands in a timely manner. Besides spam filtering, it provides auxiliary functions such as two-way filtering, gateway monitoring, short message warnings, and display of the origin, destination, and channel of junk mail.

Finally, simulation experiments test the performance of the new mail filtering model and the function of each module. The results show that the algorithm reduces the complexity of the sample space, quickly obtains the optimal classification feature subset, and effectively improves the speed and accuracy of spam classification.
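The trim-then-classify idea and the two-layer filter described above can be illustrated with a minimal sketch. It is not the thesis implementation, and it simplifies heavily: the SVM step is omitted, so every training sample (not only those near a hyperplane) is checked against its nearest neighbor and dropped when the labels differ. Whitespace tokenization stands in for ICTCLAS segmentation. All function names, the "spam"/"ham" labels, the addresses, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter

def distance(a, b):
    # Squared Euclidean distance between two bag-of-words Counters.
    return sum((a[k] - b[k]) ** 2 for k in set(a) | set(b))

def trim_samples(samples):
    # Simplified trim step (no SVM): keep a sample only if its nearest
    # neighbor in the training set carries the same class label.
    kept = []
    for i, (vec, label) in enumerate(samples):
        others = [s for j, s in enumerate(samples) if j != i]
        nearest = min(others, key=lambda s: distance(vec, s[0]))
        if nearest[1] == label:
            kept.append((vec, label))
    return kept

def train_nb(samples):
    # Collect per-class document counts and word counts (multinomial model).
    class_docs, word_counts, vocab = Counter(), {}, set()
    for vec, label in samples:
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(vec)
        vocab.update(vec)
    return class_docs, word_counts, vocab

def classify_nb(model, vec):
    # Pick the class with the highest log posterior, using Laplace smoothing.
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n_docs / total_docs)
        for word, n in vec.items():
            if word in vocab:  # words never seen in training are ignored
                score += n * math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def filter_mail(sender, text, model, blacklist, whitelist):
    # Layer 1: black/white list lookup on the sender address.
    if sender in blacklist:
        return "spam"
    if sender in whitelist:
        return "ham"
    # Layer 2: content-based classification for everything else.
    return classify_nb(model, Counter(text.lower().split()))

# Toy training set (hypothetical); the last sample is a mislabeled outlier.
training = [
    (Counter("win cash prize now".split()), "spam"),
    (Counter("cheap cash prize offer".split()), "spam"),
    (Counter("meeting agenda for monday".split()), "ham"),
    (Counter("meeting agenda for monday team".split()), "ham"),
    (Counter("meeting agenda monday prize".split()), "spam"),
]
kept = trim_samples(training)  # the outlier sits next to a "ham" sample
model = train_nb(kept)

print(len(kept))
print(filter_mail("unknown@example.com", "cash prize now", model, set(), set()))
```

In this toy run the last training sample is trimmed because its nearest neighbor is labeled "ham", so the Naive Bayes model is trained on the four remaining samples. A version closer to the thesis would restrict the neighbor check to samples near the SVM hyperplane.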
Keywords/Search Tags: Naive Bayes, SVM, Trim strategy, TSVM-NB