Study And Application On Chinese-Spam Filtering Technology

Posted on: 2010-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Li
Full Text: PDF
GTID: 2178360278967630
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet applications around the world, e-mail has become one of the most economical and fastest means of daily communication. However, some people send excessive amounts of spam for their own purposes, which not only wastes a great deal of network resources but also does considerable harm to society as a whole. Chinese spam accounts for a large proportion of this mail.

Spam filtering technology has progressed rapidly for purely English mail, but these filtering methods remain inefficient for Chinese mail and cannot meet users' needs.

Starting from an analysis of how the e-mail system works, the author studies text classification, mail encoding and decoding, Chinese word segmentation, and feature selection in depth. The DFR method is used for feature selection and gives better experimental results. The thesis examines several commonly used filtering algorithms, analyzes the problems they encounter with Chinese text, and, after weighing the advantages and disadvantages of each algorithm, proposes a comprehensive method for Chinese spam filtering. First, white-list filtering ensures that normal e-mail from white-listed senders is received. Second, a rule-based filter helps ensure that the error is zero. Third, a statistics-based filter tries to increase recall. Finally, the author designed and implemented the Chinese spam filtering system on Linux.
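The three-stage method described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the names `NaiveBayesFilter`, `classify`, `whitelist`, and `rules` are hypothetical, the second stage is assumed to be a set of hand-written rules that fire only on unambiguous spam, and the third stage is assumed to be a naive Bayes classifier (suggested by the keywords). Chinese word segmentation and DFR feature selection are omitted; the sketch assumes messages arrive as lists of already-segmented tokens.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Multinomial naive Bayes over pre-segmented tokens, with add-one smoothing."""

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, tokens, label):
        # label must be "spam" or "ham"
        self.token_counts[label].update(tokens)
        self.doc_counts[label] += 1

    def log_odds(self, tokens):
        """Return log P(spam | tokens) - log P(ham | tokens).

        Assumes at least one training message per class.
        """
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        totals = {c: sum(self.token_counts[c].values()) for c in ("spam", "ham")}
        # Class prior expressed as log odds.
        score = math.log(self.doc_counts["spam"]) - math.log(self.doc_counts["ham"])
        for tok in tokens:
            if tok not in vocab:
                continue  # unseen tokens carry no evidence in this sketch
            p_spam = (self.token_counts["spam"][tok] + 1) / (totals["spam"] + len(vocab))
            p_ham = (self.token_counts["ham"][tok] + 1) / (totals["ham"] + len(vocab))
            score += math.log(p_spam / p_ham)
        return score


def classify(sender, tokens, whitelist, rules, model, threshold=0.0):
    """Cascade: white list, then rules, then statistics (assumed ordering)."""
    # Stage 1: mail from white-listed senders is always delivered.
    if sender in whitelist:
        return "ham"
    # Stage 2: each rule is a predicate meant to fire only on unambiguous spam.
    if any(rule(tokens) for rule in rules):
        return "spam"
    # Stage 3: the statistical filter handles whatever the rules missed.
    return "spam" if model.log_odds(tokens) > threshold else "ham"
```

In a sketch like this, the ordering reflects the trade-off the abstract describes: the first two stages are deterministic and conservative, so they can be tuned to avoid misclassifying legitimate mail, while the statistical stage is the one that raises recall on spam the rules do not cover. The `threshold` parameter is an assumed tuning knob; raising it makes the third stage more cautious about labelling mail as spam.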
Keywords/Search Tags:Spam Filtering, Chinese Word Segmentation, Feature Selection, Bayes Classification, SpamAssassin