
Research On The Method Of Chinese Email Filtering Based On SVM

Posted on: 2009-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y Hou
Full Text: PDF
GTID: 2178360272463512
Subject: Computer application technology
Abstract/Summary:
With the spread of the Internet, email has gradually become one of the most important means of communication in people's work and daily life because it is fast and convenient. However, the spam problem is becoming increasingly serious. Spam not only spreads harmful information but also fills up mail server storage and harms the legitimate interests of individuals and enterprises. Although many spam filtering methods exist, the volume of spam keeps growing rather than shrinking, which shows that these methods are not yet effective enough. Spam filtering has therefore become a significant, practical problem of international concern.

Email filtering methods are increasingly shifting toward content-based automatic classification. There are two major kinds of automatic email filtering methods: rule-based and probability-based. These methods are easy to implement and filter reasonably well, but they cannot balance filtering speed against filtering precision. The support vector machine (SVM) is a learning machine that has attracted considerable attention in recent years. Grounded in statistical learning theory, SVM has been applied in many fields, such as speech processing, image retrieval, and text classification, and it can also avoid the "curse of dimensionality".
It has therefore been widely accepted as an effective machine learning method. This thesis studies an SVM-based email filtering method. The main work is as follows:

(1) Constructing an email filtering model from large-scale real datasets. By building the feature dictionary dynamically, the model can continuously enrich the dictionary and avoid many of the problems caused by an ill-suited feature dictionary.
(2) Representing email with the vector space model (VSM). To extract email features, the thesis segments Chinese email text with an approach that combines forward and backward segmentation. The proposed approach improves feature selection and the representation of feature weights.
(3) Optimizing the SVM filtering model with the Fisher linear discriminant, and constructing optimal SVM models based on the Gaussian kernel and the polynomial kernel.
(4) Validating the email filtering model on the CCERT email datasets. Comparisons of filtering effectiveness between SVM and other methods show that the SVM email filtering model greatly improves filtering effectiveness: the false alarm rate stays at 1% and the correct rejection rate reaches 98.5%. On some commonly used filtering metrics, the presented approach exceeds the figures published by NetEase Free Email (98%).

By applying the generally valid SVM method in combination with Chinese information processing techniques, the thesis obtains satisfactory results in Chinese email filtering. The results both advance theoretical research on email filtering and have practical application value.
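
The abstract does not say how the forward and backward segmentation passes are combined, so the following is only a minimal sketch of one common scheme, bidirectional maximum matching, and not the thesis's actual implementation. The function names, the toy dictionary, and the tie-breaking rule (prefer fewer tokens, then fewer single-character tokens, then the backward result) are all assumptions made for illustration.

```python
# Sketch of dictionary-based Chinese word segmentation that combines a
# forward and a backward maximum-matching pass. All names and the
# tie-breaking heuristic are illustrative assumptions, not from the thesis.

# Hypothetical toy dictionary; a real system would load a large lexicon.
TOY_DICTIONARY = {"研究", "研究生", "生命", "的", "起源"}


def forward_max_match(text, dictionary, max_len):
    """Scan left to right, always taking the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or word in dictionary:
                tokens.append(word)
                i += size
                break
    return tokens


def backward_max_match(text, dictionary, max_len):
    """Scan right to left, always taking the longest dictionary word."""
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            word = text[j - size:j]
            if size == 1 or word in dictionary:
                tokens.append(word)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


def bidirectional_segment(text, dictionary):
    """Run both passes and keep the one that looks less fragmented."""
    max_len = max((len(w) for w in dictionary), default=1)
    fwd = forward_max_match(text, dictionary, max_len)
    bwd = backward_max_match(text, dictionary, max_len)
    if fwd == bwd:
        return fwd

    def fragmentation(tokens):
        # Fewer tokens first, then fewer single-character tokens.
        return (len(tokens), sum(1 for t in tokens if len(t) == 1))

    # Assumed tie-break: prefer the backward result when scores are equal.
    return bwd if fragmentation(bwd) <= fragmentation(fwd) else fwd


if __name__ == "__main__":
    sentence = "研究生命的起源"
    print(forward_max_match(sentence, TOY_DICTIONARY, 3))
    # ['研究生', '命', '的', '起源']
    print(bidirectional_segment(sentence, TOY_DICTIONARY))
    # ['研究', '生命', '的', '起源']
```

On this example the forward pass greedily takes "研究生" and leaves a stray single character, while the backward pass recovers "生命"; comparing the two passes is what lets the combined approach pick the cleaner segmentation. The resulting tokens would then serve as candidate features for the VSM representation.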
Keywords/Search Tags: Support Vector Machine, Chinese Email, Filter, Model Selection, Dynamic Feature Dictionary