
Research On Spam Filtering System Based On Maximum Entropy Model

Posted on: 2007-09-08
Degree: Master
Type: Thesis
Country: China
Candidate: G T Si
GTID: 2178360185978163
Subject: Computer software and theory
Abstract/Summary:
E-mail has become a major means of communication in daily work and life, but the flood of spam has seriously hampered its use, so distinguishing spam from legitimate mail is an important task.

Much research on spam filtering has been carried out in recent years. Measures such as blacklists, whitelists, and hand-written rules are widely used. However, because the characteristics of spam keep changing, such rules are difficult to maintain, and rule-based filtering performs unsatisfactorily in practice, so these measures have clear limitations. With the development of machine learning, text categorization, and information filtering, analysis of e-mail content has become an active research topic in spam filtering. Content-based filtering learns the features of spam automatically and can therefore achieve good accuracy. This thesis first presents the background of the research and then analyzes and compares the commonly used content-based spam filtering techniques. Next, e-mail pre-processing is performed, and an XML representation of e-mail is introduced to provide a uniform structure for developing e-mail-oriented applications.

The maximum entropy model is a mature statistical model whose computation is independent of any particular task, which makes it simple, general, and portable. It has therefore been widely used in natural language processing in recent years.

The main task of this thesis is to apply the maximum entropy model to spam filtering, and a basic framework for a spam filtering system based on maximum entropy is proposed. Considering the semi-structured nature of e-mail, features are extracted from e-mail...
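The abstract mentions an XML representation of e-mail that gives applications a uniform structure, but it does not give the schema. As an illustration only, the sketch below uses Python's standard `email` and `xml.etree.ElementTree` modules to turn a raw message into XML. The function `email_to_xml`, the element names (`email`, `header`, `body`, `text`), and the choice of header fields are hypothetical, not the thesis's actual format.

```python
import email
import xml.etree.ElementTree as ET
from email import policy


def email_to_xml(raw):
    # Parse with the modern policy so parts expose get_content().
    msg = email.message_from_string(raw, policy=policy.default)
    root = ET.Element("email")  # hypothetical schema: <email><header/><body/></email>

    # Structured part: selected header fields become child elements.
    header = ET.SubElement(root, "header")
    for name in ("From", "To", "Subject", "Date"):
        value = msg.get(name)
        if value is not None:
            ET.SubElement(header, name.lower()).text = str(value)

    # Free-text part: every text/plain part, including nested MIME parts.
    body = ET.SubElement(root, "body")
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            ET.SubElement(body, "text").text = part.get_content().strip()

    return ET.tostring(root, encoding="unicode")


raw_message = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Win a prize\n"
    "\n"
    "Claim your free prize now.\n"
)
print(email_to_xml(raw_message))
```

Keeping header fields and body text in separate elements preserves the semi-structured nature of e-mail, so a later feature-extraction step could, for example, weight subject words differently from body words.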
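The abstract does not specify the feature set, the training algorithm, or the smoothing method used. The following is a minimal Python sketch of a conditional maximum entropy spam classifier, p(y|x) = exp(Σᵢ λᵢ fᵢ(x, y)) / Z(x), under stated assumptions: binary (word, class) indicator features, batch gradient ascent instead of GIS/IIS, and a Gaussian prior as the smoothing technique. The class `MaxEntFilter`, its parameters (`sigma2`, `lr`, `epochs`), and the tokenizer are hypothetical.

```python
import math
import re
from collections import defaultdict

LABELS = ("ham", "spam")


def tokenize(text):
    # Hypothetical pre-processing: lowercase word tokens, duplicates removed.
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def features(tokens, label):
    # Binary feature functions f_i(x, y): one per (word, class) pair,
    # firing when the word occurs in message x and the candidate class is y.
    return [(w, label) for w in tokens]


class MaxEntFilter:
    def __init__(self, sigma2=1.0, lr=0.5, epochs=50):
        self.w = defaultdict(float)  # lambda_i, one weight per feature
        self.sigma2 = sigma2         # Gaussian prior variance (assumed smoothing)
        self.lr = lr                 # gradient-ascent step size
        self.epochs = epochs

    def _prob_tokens(self, tokens):
        # p(y|x) = exp(sum_i lambda_i f_i(x, y)) / Z(x)
        scores = {y: sum(self.w.get(f, 0.0) for f in features(tokens, y))
                  for y in LABELS}
        top = max(scores.values())  # subtract max for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def prob(self, text):
        return self._prob_tokens(tokenize(text))

    def train(self, data):
        docs = [(tokenize(text), y) for text, y in data]
        n = len(docs)
        for _ in range(self.epochs):
            grad = defaultdict(float)
            for tokens, y in docs:
                # Observed feature counts ...
                for f in features(tokens, y):
                    grad[f] += 1.0
                # ... minus their expectation under the current model.
                p = self._prob_tokens(tokens)
                for yy in LABELS:
                    for f in features(tokens, yy):
                        grad[f] -= p[yy]
            # Gaussian prior: gradient of -lambda^2 / (2 * sigma^2).
            for f, lam in self.w.items():
                grad[f] -= lam / self.sigma2
            # Averaged gradient-ascent step on the penalized log-likelihood.
            for f, g in grad.items():
                self.w[f] += self.lr * g / n

    def classify(self, text):
        p = self.prob(text)
        return max(p, key=p.get)


training = [
    ("win money now, free prize", "spam"),
    ("free cash prize, claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]
clf = MaxEntFilter()
clf.train(training)
print(clf.classify("claim your free prize"))
```

Gradient ascent is used here only for brevity; the classic GIS and IIS algorithms fit the same exponential-form model. The Gaussian prior keeps weights of rare words from growing without bound, which is one common way to smooth a maximum entropy model.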
Keywords: spam filtering, pre-processing, maximum entropy, feature extraction, smoothing techniques, Outlook add-in