The Filtering Technique Of Junk-emails Based On Text Mining

Posted on: 2007-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: X M Wang
Full Text: PDF
GTID: 2178360182482235
Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, email has become a primary means of communication because it is fast and convenient. Every day, however, many junk emails arrive in our mailboxes. This thesis presents an email classification and filtering method based on text mining. The method not only filters spam automatically but also applies to e-government and e-business, where email is used heavily. There it can classify and forward email automatically, which reduces the workload of the mail delivery system.

The system consists of the following modules: email collection and preprocessing, Chinese word segmentation, feature extraction, and email classification and filtering. The thesis discusses the main functions and algorithms of each module together with their Java source code.

The thesis is composed of seven chapters:

Chapter 1: Discusses the particular characteristics of email classification and states the main research task.

Chapter 2: Describes the overall design of the automatic email classification and forwarding system, including the main modules and the function of each.

Chapter 3: Presents the collection and preprocessing of email, with particular attention to the JavaMail API. HTML parsing, the most important technique in this stage, is also discussed.

Chapter 4: Describes Chinese word segmentation. After analyzing a range of Chinese word segmentation algorithms, it discusses how to use the maximum matching algorithm in the segmenter.

Chapter 5: Compares a range of feature selection algorithms and summarizes their advantages and disadvantages. It proposes an algorithm called Advanced Mutual Information to perform feature extraction.

Chapter 6: Compares a range of email classification methods. It proposes the Naive Bayes machine learning method, discusses the algorithm for categorizing email with it, and then shows how to implement such a classifier.

Chapter 7: Summarizes the contributions and shortcomings of the project and outlines expectations for future research.
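As a rough illustration of two of the steps named above, the following Java sketch pairs forward maximum matching segmentation with a multinomial Naive Bayes classifier using add-one smoothing. It is not the thesis's code: the class name `SpamFilterSketch`, its method names, the `maxWordLength` parameter, and the choice of the forward variant of maximum matching and of add-one smoothing are all assumptions made for this example. The feature selection step (Advanced Mutual Information) is omitted because the abstract does not define it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from the thesis.
class SpamFilterSketch {

    // Forward maximum matching: at each position, take the longest dictionary
    // word that starts there; if none matches, emit a single character.
    // Works on UTF-16 code units, which is adequate for common Chinese text.
    static List<String> segment(String text, Set<String> dictionary, int maxWordLength) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxWordLength);
            // Shrink the candidate until it is in the dictionary or one char long.
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    // Training statistics for multinomial Naive Bayes.
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    private final Map<String, Integer> totalTerms = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Record one labeled message (already segmented into tokens).
    void train(String label, List<String> tokens) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = termCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            totalTerms.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Return the label maximizing log P(label) + sum of log P(token | label),
    // with add-one (Laplace) smoothing so unseen tokens do not zero the score.
    String classify(List<String> tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = termCounts.get(label);
            int total = totalTerms.getOrDefault(label, 0);
            for (String token : tokens) {
                int count = counts.getOrDefault(token, 0);
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

With a dictionary containing "免费" and "发票", `segment("免费发票", dictionary, 4)` yields the two tokens "免费" and "发票". Those tokens can then be passed to `train` with a label such as "spam" or "ham", and later to `classify`. Working in log space avoids numeric underflow when a message contains many tokens.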
Keywords/Search Tags: text mining, email classification, Chinese word segmentation, feature extraction