Research And Design Of Multiple Mail Filtering System Based On BP Neural Network

Posted on:2019-04-02

Degree:Master

Type:Thesis

Country:China

Candidate:Z K Wang

Full Text:PDF

GTID:2428330590478652

Subject:Computer technology

Abstract/Summary:

With the rapid development of the Internet, people use e-mail more and more frequently, and it has gradually become an important communication medium. However, as e-mail has become widespread, spam has proliferated and remains poorly controlled, to the point of affecting people's normal work and life. Existing spam filtering still has many shortcomings and cannot filter spam well. To address this, research on strengthening spam filtering technology is particularly important.

This study attempts to design a spam filtering system model based on a statistical method. Model training uses the BP neural network learning algorithm. In the experiments, data preprocessing and algorithm training on the public PU corpus produce a large number of models, from which models are then selected. Finally, the main model and multiple collaborative models of the spam filtering system are obtained by combining models. During filtering, each mail is divided into multiple data streams that enter the FC layer, and the results are output separately in the Output layer. A weight is then calculated according to each sub-model's false reporting rate (FALLOUT) to obtain the final judgment.

The preprocessing includes word frequency statistics based on Hadoop, dictionary dimensionality reduction based on an improved TF algorithm, and vector matrix generation for the vector space model (VSM). The word frequency statistics produce a feature word list for all mails, a feature word list for ham, a feature word list for spam, and a feature word list for each individual mail. In this study, the traditional TF algorithm is improved for data preprocessing. Word statistics are used to reduce the dimension of the corpus feature word set. The dimension is kept within 2000, and better experimental results are obtained. The sparse matrix in VSM form is generated by a Java program.

To select the main and auxiliary models, the data is partitioned into three subgroups, A, B and C. The subgroups are used to design the training schemes A+B_C, A+C_B and A_B+C. Finally, the main and auxiliary models are obtained by calculating the average accuracy of the model simulations. Model selection is a key part of this research. The experiment compares the models under different matching schemes, compares the optimal single model with a model trained by the SVM algorithm, and compares the optimal single model with the system combination model, verifying the performance of the system model step by step. At the end of the experiment, the performance of the system model was further tested and evaluated by calculating the recall rate, correct rate, F value, accuracy, AUC (Area Under Curve) value, model computation cost based on MACCs and FLOPS, and memory occupancy. The final conclusion of the experiment is that the odd-numbered optimal models are combined into one classifier. Through multiple filtering, the judgment accuracy and the generalization ability of the system can be improved, and false positive judgments of legitimate mail can be effectively reduced.
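The fallout-based weighting step described above can be illustrated with a minimal Python sketch. The abstract does not give the weighting formula, so the rule used here (weight proportional to 1 - fallout, normalized to sum to 1) is an assumption, as are the function names and the 0.5 decision threshold. Fallout itself is the standard false positive rate, FP / (FP + TN), with ham as the negative class.

```python
from typing import List, Sequence


def fallout(false_positives: int, true_negatives: int) -> float:
    """Standard fallout (false positive rate): FP / (FP + TN)."""
    total = false_positives + true_negatives
    return false_positives / total if total else 0.0


def fallout_weights(fallouts: Sequence[float]) -> List[float]:
    """ASSUMPTION: weight each sub-model by (1 - fallout), normalized to sum to 1.

    The thesis abstract only says weights depend on fallout; this rule is
    one plausible choice, not the author's confirmed formula.
    """
    raw = [1.0 - f for f in fallouts]
    total = sum(raw)
    if total == 0:
        # Every sub-model has fallout 1.0; fall back to equal weights.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def is_spam(spam_scores: Sequence[float], weights: Sequence[float],
            threshold: float = 0.5) -> bool:
    """Combine sub-model outputs (1.0 = spam, 0.0 = ham) by weighted sum.

    The 0.5 threshold is a hypothetical default.
    """
    combined = sum(w * s for w, s in zip(weights, spam_scores))
    return combined > threshold


# Three hypothetical sub-models with fallout 0.0, 0.5 and 0.5.
weights = fallout_weights([0.0, 0.5, 0.5])
verdict = is_spam([1.0, 1.0, 0.0], weights)
```

Under this assumed rule, a sub-model that rarely misjudges legitimate mail contributes more to the final decision, which matches the stated goal of reducing false positives on ham.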

Keywords/Search Tags:

Mail Filtering, VSM Formal Matrix, Primary And Secondary Multi-Filter Model, AUC Value, Performance Evaluation

Related items

1	Achievement And Design Of Teaching Quality Evaluation System Of Primary And Middle Schools Based On JavaEE And Fuzzy Theory
2	Research And Optimization Of High Performance Mail Security Gateway
3	The Design And Development Of Primary And Secondary Schools Teacher's Performance Management
4	Investigation And Analysis Of Education Performance Of Primary And Secondary School Website
5	Design And Implementation Of Examination Analysis System For Primary And Secondary School Based On B/S
6	Design And Implementation Of Primary And Secondary Education Resource Prototype System Based On Intelligent Personalized Recommendation
7	Design And Implementation Of Teacher Professional Development Evaluation System
8	Research On E-mail Personalized Filter System
9	Email Security, Filtering And Inspection Techniques Studied
10	The Design And Implementation Of A Mobile Phone Management System Of Mail With Filtering