
Research And Implementation Of Content-Based Spam Filter Technology

Posted on: 2008-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2178360215958148
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, e-mail has become a primary means of modern communication. At the same time, spam (also called "junk mail") has become widespread online and causes trouble for a great many users. Preventing and controlling spam effectively is therefore both important and practical.

The thesis first surveys a considerable body of anti-spam literature and data from China and abroad, then analyzes and summarizes the existing anti-spam techniques. E-mail filtering is an important measure against spam, and at present it is mainly based on IP addresses, on rules, or on content.

The focus of this dissertation is e-mail filtering based on message content, that is, filtering mail by analyzing what it says. This is essentially a text categorization problem: the text of a message is preprocessed, and spam is then recognized by a text classifier. Both preprocessing and text categorization are studied in depth. For preprocessing, various Chinese word segmentation techniques are studied and compared, as are various methods of feature selection and extraction. For text categorization, many methods are studied and compared. Among them the Bayesian algorithm receives the most attention, and its methods and principles are investigated in detail.

A content-based spam processing system is designed and implemented. It uses the Forward Maximum Matching method for Chinese word segmentation, the Odds Ratio for feature selection and extraction, and the Bayesian algorithm to classify e-mail by content. The results show that applying the Bayesian algorithm to spam processing is an effective way to filter spam by its content characteristics.
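The three stages named above can be sketched in a few lines of Python. This is only an illustrative sketch and not the thesis's implementation: the function names, the toy dictionary and the four-message corpus are invented for the example, the add-one smoothing is an assumption, and ranking features by the absolute value of the log odds ratio is one possible selection rule among several.

```python
import math
from collections import Counter

def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position, take the longest
    dictionary word; fall back to a single character if none matches."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

def log_odds_ratio(term, spam_docs, ham_docs):
    """Log odds ratio of a term occurring in spam versus legitimate mail.
    Document frequencies are smoothed (assumption) to avoid division by zero."""
    p_spam = (sum(term in d for d in spam_docs) + 1) / (len(spam_docs) + 2)
    p_ham = (sum(term in d for d in ham_docs) + 1) / (len(ham_docs) + 2)
    return math.log(p_spam * (1 - p_ham) / ((1 - p_spam) * p_ham))

def select_features(spam_docs, ham_docs, k):
    """Keep the k terms with the largest absolute log odds ratio (assumption)."""
    vocab = {t for d in spam_docs + ham_docs for t in d}
    ranked = sorted(vocab, reverse=True,
                    key=lambda t: abs(log_odds_ratio(t, spam_docs, ham_docs)))
    return set(ranked[:k])

def train_naive_bayes(spam_docs, ham_docs, features):
    """Multinomial naive Bayes with add-one smoothing over the selected features."""
    n_docs = len(spam_docs) + len(ham_docs)
    model = {}
    for label, docs in (("spam", spam_docs), ("ham", ham_docs)):
        counts = Counter(t for d in docs for t in d if t in features)
        total = sum(counts.values())
        model[label] = {
            "prior": math.log(len(docs) / n_docs),
            "likelihood": {t: math.log((counts[t] + 1) / (total + len(features)))
                           for t in features},
        }
    return model

def classify(tokens, model):
    """Return the label with the highest log posterior; unknown tokens are ignored."""
    scores = {label: m["prior"] + sum(m["likelihood"][t]
                                      for t in tokens if t in m["likelihood"])
              for label, m in model.items()}
    return max(scores, key=scores.get)

# Toy data, invented for illustration only.
dictionary = {"免费", "发票", "优惠", "会议", "通知", "明天"}
spam_docs = [fmm_segment(s, dictionary) for s in ("免费发票", "免费优惠")]
ham_docs = [fmm_segment(s, dictionary) for s in ("会议通知", "明天会议")]

features = select_features(spam_docs, ham_docs, k=2)
model = train_naive_bayes(spam_docs, ham_docs, features)
spam_label = classify(fmm_segment("免费发票", dictionary), model)
ham_label = classify(fmm_segment("明天会议", dictionary), model)
```

On this toy corpus the two selected features are the terms that occur in every message of one class and in none of the other, so the classifier separates the two example messages. A real system would need a full segmentation dictionary and a much larger labeled corpus.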
Keywords/Search Tags: Spam Filtering, Chinese Word Segmentation, Feature Selection and Extraction, Bayesian Categorization Algorithm