
The Application Of Text Classification In Spam Interception System

Posted on: 2015-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: F Zhang
GTID: 2348330542952504
Subject: Engineering

Abstract/Summary:
With the progress of Internet technology, Internet applications have gradually become an important means by which people communicate with one another and obtain information. E-mail technology originated in the 1970s and is now, without doubt, one of the most important tools for exchanging information. Its main advantages are real-time delivery, ease of use, and low cost. However, it has also brought a series of problems, chief among them the large amount of spam that arrives in users' mailboxes. According to a report released in July 2014 by the Internet Society of China, spam accounted for 38.2% of the e-mail that users received per week on average in 2014. Spam seriously interferes with people's normal communication and can even cause unpredictable economic losses, so research into anti-spam technology is imperative.

This thesis studies the application of text classification to an e-mail spam interception system. It first reviews earlier spam-blocking techniques, such as black/white-list filtering and rule-based filtering, and then introduces the content-based spam interception approach that is the subject of this research. Content-based spam blocking uses text categorization algorithms as its main technical means, with machine learning at its core: incoming mail is classified by a machine learning algorithm, messages that meet the criteria are accepted as legitimate, and the rest are handled as spam, which is how spam is intercepted. Concretely, a specific machine learning algorithm is selected first, and its classification procedure is then applied to e-mail. The algorithm adopted in this thesis is the Bayesian classifier, which is relatively mature, efficient, and gives good classification results.

The first part of the thesis introduces e-mail technologies and anti-spam techniques, together with background on text classification, including the vector space model (VSM) and the process of automatic text classification. The thesis then studies the application of text categorization in a spam interception system and designs the core of the system: the preprocessing, training, and classification modules are analyzed and designed in detail, and the key technologies they rely on are studied. Finally, each module of the system is implemented and experiments are carried out to evaluate the system's performance. Because, in practice, the system must follow the principle of never mistaking legitimate mail for spam, the experiments focus on how to choose the value of the parameter ?, and the optimal value of ? that satisfies the system's requirements is obtained experimentally.
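The abstract names a Bayes classifier and a tunable parameter that keeps legitimate mail from being flagged as spam, but gives no formulas. The following is a minimal sketch, assuming a multinomial naive Bayes model with Laplace smoothing and a cost-style decision rule in which a message is labelled spam only when P(spam|x) / P(legit|x) exceeds a threshold. Whether the thesis's parameter works this way is an assumption; the class `NaiveBayesSpamFilter`, its `threshold` argument, and the labels `"spam"`/`"ham"` are hypothetical. Input is assumed to be already tokenized (for Chinese mail this would be the output of word segmentation in the preprocessing module).

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Hypothetical multinomial naive Bayes spam filter (illustrative sketch)."""

    def __init__(self, threshold=1.0):
        # Assumed decision threshold (> 0): mail is labelled spam only when
        # P(spam|x) / P(ham|x) > threshold. Larger values make the filter
        # less likely to flag legitimate mail as spam.
        self.threshold = threshold
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, tokens, label):
        """Add one tokenized training message with label "spam" or "ham"."""
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def _log_score(self, tokens, label):
        # log P(label) + sum of log P(token | label), Laplace-smoothed.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        vocab_size = len(self.vocab)
        for token in tokens:
            if token not in self.vocab:
                continue  # words never seen in training are ignored
            count = self.word_counts[label][token]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def is_spam(self, tokens):
        """Compare the log posterior ratio with log(threshold)."""
        log_ratio = self._log_score(tokens, "spam") - self._log_score(tokens, "ham")
        return log_ratio > math.log(self.threshold)


def build_demo(threshold):
    """Train a toy filter on four hand-made messages (made-up example data)."""
    f = NaiveBayesSpamFilter(threshold=threshold)
    f.train(["free", "prize", "click", "now"], "spam")
    f.train(["win", "free", "money"], "spam")
    f.train(["meeting", "schedule", "tomorrow"], "ham")
    f.train(["project", "report", "tomorrow"], "ham")
    return f


lenient = build_demo(9.0)
strict = build_demo(99.0)
print(lenient.is_spam(["free", "prize", "now"]))  # True: posterior ratio ~10.1 > 9
print(strict.is_spam(["free", "prize", "now"]))   # False: ~10.1 < 99
print(lenient.is_spam(["meeting", "tomorrow"]))   # False
```

Raising the threshold trades missed spam for fewer false positives, which is the trade-off the abstract describes when it prioritizes never discarding legitimate mail. Both classes need at least one training message, otherwise the prior term takes the log of zero.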
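The abstract lists the vector space model (VSM) as background for text classification. In a VSM, each message becomes a vector of term weights over the vocabulary. As an illustration only, assuming TF-IDF weighting (tf × log(N / df)) and cosine similarity, neither of which the abstract specifies, a tokenized mail can be turned into a sparse vector as follows; the function names `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: tf * log(N / df)} dict."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [["free", "prize"], ["free", "money"], ["meeting", "report"]]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[2]))  # 0.0: the two messages share no terms
print(cosine(vecs[0], vecs[1]))  # small positive value: they share "free"
```

A term that appears in every document gets weight log(1) = 0, so very common words contribute nothing; this is one reason feature selection and stop-word removal are usually part of the preprocessing step.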
Keywords: Spam, Text classification, Feature selection, Chinese word segmentation