
A Research Of Spam Filtering Based On Text Mining

Posted on: 2011-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: Z Xu
Full Text: PDF
GTID: 2178360308465542
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, email has become one of the most popular Internet applications, serving as an efficient and economical means of modern communication. Email has brought great convenience to daily life, but it has also brought an annoying byproduct: spam. Bulk email that is useless or even harmful not only consumes a great deal of network resources as it spreads, but also seriously threatens users' information security. Spam is therefore a difficult problem faced all over the world.

In the real world, knowledge exists not only in the form of structured data but, more often, in the form of unstructured or semi-structured data. Text data mining emerged to meet this need. Applying text data mining to spam filtering helps protect users from the annoyance of spam, so it is of great practical significance.

This thesis first introduces the history and definition of spam, points out its hazards, discusses the measures adopted to address the problem, and then introduces the SMTP, POP3, IMAP and MIME protocols. Chapter 3 expounds the most commonly used spam filtering techniques, including those based on role differentiation, content, access and behavior. It also introduces emerging filtering techniques such as predictive sender profiling and IP reputation.

The contributions of this thesis are as follows:

1. This thesis summarizes the popular methods of spam filtering. To evade filtering, spammers use new tricks that render many simple filters ineffective. Accordingly, a filter must capture the main characteristics of present-day spam and target them specifically.

2. This thesis presents a new spam filtering method based on a discriminative model. For feature selection, the innovation is the introduction of a definition of mutual information difference. In the mail classification process, a gradient descent algorithm is used to update the feature weights. The spam filtering model is built on this basis. Experiments demonstrate that the spam filtering method based on the discriminative model achieves good results.

3. This thesis presents an extraction method based on the features of text regions. Because edges in color images are easily affected by interference, the thesis adopts the Color Roberts operator and morphological methods and designs a feature extraction strategy for text regions. Experiments show that this method achieves good results for image spam filtering. In addition, the analytic hierarchy process (AHP) is introduced into the classification algorithm, which offers a new approach to solving the decision problem.
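
The second contribution can be illustrated with a minimal Python sketch. The abstract does not give the thesis's definition of mutual information difference, so the sketch assumes one plausible reading: the absolute difference between a term's pointwise mutual information with the spam class and with the legitimate (ham) class. The discriminative model is assumed to be logistic regression over binary term features, trained by stochastic gradient descent. All function names, the smoothing constant, the learning rate and the toy messages are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def mi_difference_scores(docs, labels, smoothing=1.0):
    """Score terms by |PMI(term, spam) - PMI(term, ham)| (assumed definition)."""
    n_spam = sum(labels)
    n_ham = len(labels) - n_spam
    spam_df, ham_df = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        # Count document frequency: each term at most once per message.
        (spam_df if label == 1 else ham_df).update(set(tokens))
    scores = {}
    for term in set(spam_df) | set(ham_df):
        p_spam = (spam_df[term] + smoothing) / (n_spam + 2 * smoothing)
        p_ham = (ham_df[term] + smoothing) / (n_ham + 2 * smoothing)
        # PMI(t, c) = log P(t|c) - log P(t); log P(t) cancels in the difference.
        scores[term] = abs(math.log(p_spam) - math.log(p_ham))
    return scores


def select_features(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def spam_probability(tokens, weights, bias):
    """P(spam | message) under the logistic model with binary term features."""
    z = bias + sum(weights[t] for t in set(tokens) if t in weights)
    return sigmoid(z)


def train(docs, labels, vocab, lr=0.5, epochs=50):
    """Update feature weights by stochastic gradient descent on the log loss."""
    weights = {t: 0.0 for t in vocab}
    bias = 0.0
    for _ in range(epochs):
        for tokens, label in zip(docs, labels):
            err = label - spam_probability(tokens, weights, bias)
            bias += lr * err
            for t in set(tokens):
                if t in weights:
                    weights[t] += lr * err  # feature value is 1 when present
    return weights, bias


# Toy corpus (hypothetical): label 1 = spam, 0 = legitimate mail.
docs = [
    ["free", "money", "now"],
    ["win", "free", "prize"],
    ["cheap", "money", "offer"],
    ["meeting", "at", "noon"],
    ["project", "report", "attached"],
    ["lunch", "meeting", "tomorrow"],
]
labels = [1, 1, 1, 0, 0, 0]

scores = mi_difference_scores(docs, labels)
selected = select_features(scores, k=3)
weights, bias = train(docs, labels, selected)
p_spam_msg = spam_probability(["free", "money"], weights, bias)
p_ham_msg = spam_probability(["meeting", "tomorrow"], weights, bias)
```

Taking the absolute value keeps terms that strongly indicate legitimate mail as well as terms that indicate spam, since both help a discriminative classifier separate the two classes. A real filter would use a far larger corpus and would tune `k`, `lr` and `epochs` on held-out data.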
Keywords/Search Tags: Spam filtering, discriminative model, AHP, feature selection