Research And Implementation Of Spam Filtering System Based On Behavior Recognition

Posted on: 2020-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wen
Full Text: PDF
GTID: 2428330578477256
Subject: Engineering
Abstract/Summary:
E-mail has become one of the most popular communication applications because it delivers information conveniently and quickly. Although e-mail offers great convenience to network users, it also carries a serious hidden danger: spam. Spam usually arrives in large volumes and consumes a large share of network bandwidth, congesting communication channels so that many users cannot reach the destination network or cannot read and edit important legitimate mail. This wastes users' time and effort, leads to unreasonable use of network resources, and seriously damages the normal order and security of the Internet. How to remove large volumes of spam, and how to do so efficiently, has therefore become an urgent problem for network users and mail providers.

Spam removal technology and the related research are currently among the application fields of the Internet. An analysis of existing filtering technology, however, shows that problems remain: existing filtering methods are not accurate enough and misjudge mail frequently, while the techniques that do filter spam with high accuracy are time-consuming and may leak user information. To speed up the judgment and improve its accuracy, this thesis uses the characteristics of the mail header and, drawing on the properties of random forests, applies the random forest algorithm to mail filtering. This improves both the accuracy and the efficiency of deciding whether a message is spam.

The research and implementation of the behavior-based spam filtering system comprises the following:

1. Use the F-score method to identify the behaviors that matter most in judging a message to be spam. The common behavior characteristics of spam are derived from an analysis of a large number of spam messages. Principal component analysis is used to select specific, representative behavior characteristics. Finally, the random forest algorithm uses the selected optimal behavior characteristics to determine the likelihood that a message is spam.

2. Construct a random forest spam filtering model based on behavior recognition. This research examines a variety of spam filtering methods. After thoroughly analyzing and comparing them, the thesis settles on applying the random forest method, based on behavioral features, to spam filtering. A random forest spam filtering model based on behavior recognition was built using mature experimental environments and data currently used by the academic community. The thesis studies the basic structure of random forests, the steps of the algorithm's implementation, and the process of training on data.

3. System design and implementation. Building on existing spam filtering models and the random forest model designed above, the system carries out the analysis of requirements and functions and establishes the overall framework of the spam filtering system. The implemented system demonstrates a viable approach to fast and accurate spam filtering.
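As an illustration of the feature-ranking step in item 1, the following is a minimal Python sketch of F-score ranking. It assumes the F-score is the common two-class score used for feature selection (squared distance of each class mean from the overall mean, divided by the sum of the within-class sample variances); the abstract does not state the exact formula. The feature names (recipient_count, received_hops, subject_length) and the toy values are hypothetical and are not taken from the thesis.

```python
from statistics import mean, variance


def f_score(values, labels):
    """F-score of one feature for a two-class problem (1 = spam, 0 = legitimate).

    Assumed definition: between-class separation of the means divided by
    the sum of the within-class sample variances.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    overall = mean(values)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variance, n - 1 denominator
    if within == 0:
        # Constant within both classes: perfectly separating or useless.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(rows, labels, names):
    """Return (name, score) pairs sorted from most to least discriminative."""
    scores = [
        (name, f_score([row[i] for row in rows], labels))
        for i, name in enumerate(names)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical header-derived behavior features; values are made up.
names = ["recipient_count", "received_hops", "subject_length"]
rows = [
    [50, 2, 30], [45, 3, 12], [60, 2, 25], [55, 3, 40],  # spam
    [1, 3, 28], [2, 2, 15], [3, 3, 35], [1, 2, 22],      # legitimate
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

ranked = rank_features(rows, labels, names)
for name, score in ranked:
    print(f"{name}: {score:.4f}")
```

A higher score means the feature separates spam from legitimate mail more clearly. In a pipeline like the one described, the top-ranked features would be passed on to the later selection and random forest stages, which this sketch does not cover.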
Keywords/Search Tags: Spam, Random forest, Feature selection, Mail filtering