| With the rapid development of the Internet, email has become an essential part of people's daily lives and plays an increasingly important role. At the same time, junk emails such as advertisements, friend-making solicitations, and product promotions have been overrunning the Internet. These junk emails not only waste network resources but also cost users considerable time to deal with. Therefore, filtering junk email effectively is very important. This thesis first introduces the vector space model. Because Chinese sentences lack natural separators, such as spaces, between words, Chinese word segmentation must be performed before junk email filtering; we therefore introduce several current Chinese word segmentation techniques. Second, this thesis analyzes several email filtering algorithms. Typical classification algorithms, including Bayesian and other algorithms, are described in detail, together with their advantages and shortcomings. Current classification algorithms based on active learning need training samples to build a classification model. If the training set does not cover some category of junk email, emails in that category cannot be filtered effectively. To solve these problems, the classification procedure is improved and a novel junk email filtering algorithm based on an incremental learning system is proposed. Following the principle of naïve Bayes (NB), the user's manual operations are used to revise the classification model. The experimental results show that the new algorithm performs better than the traditional naïve Bayesian algorithm. Finally, a simple prototype system has been built on the new algorithm using the Messaging Application Programming Interface (MAPI). The prototype system includes an email receiving sub-system and a user operation sub-system. The functions of the email receiving sub-system include user login, email receiving, and automatic email classification. 
The user operation sub-system provides email management; email operations such as reading, replying, and receiving; and Chinese word management. The prototype system can automatically classify newly arriving emails and update the classification model based on the user's manual operations. |
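The abstract does not say which segmentation technique or term weighting the thesis adopts. As a minimal sketch, assuming forward maximum matching (one common dictionary-based segmentation method) and raw term-frequency weights for the vector space model, an email body could be turned into a feature vector as follows. The dictionary, the `max_len` limit, and the function names `fmm_segment`, `tf_vector`, and `cosine` are hypothetical.

```python
import math
from collections import Counter


def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching (assumed technique): at each position take the
    longest dictionary word, falling back to a single character."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it until a dictionary hit
        # is found or only one character remains.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def tf_vector(words, vocabulary):
    """Vector space model with raw term-frequency weights (an assumption;
    TF-IDF is a common alternative)."""
    counts = Counter(words)
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine similarity between two term vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

For example, `fmm_segment("免费产品促销广告", {"免费", "产品", "促销", "广告"})` returns `['免费', '产品', '促销', '广告']`. Greedy forward matching can mis-segment ambiguous strings; backward matching or statistical segmenters are the usual remedies.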
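The idea of revising a naïve Bayes model from the user's manual operations can be illustrated with a multinomial model that stores raw counts. A newly received or manually reclassified email then changes the model by simple additions and subtractions instead of full retraining. This is a hedged sketch, not the thesis's algorithm: the class name `IncrementalNB`, the assumption that each automatically classified email is immediately folded into the model, the skipping of unseen words, and Laplace smoothing are all illustrative choices.

```python
import math
from collections import Counter, defaultdict


class IncrementalNB:
    """Hypothetical multinomial naive Bayes that is updated one email at a time."""

    def __init__(self, classes=("spam", "ham")):
        self.classes = list(classes)
        self.doc_counts = Counter()              # number of emails per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.total_words = Counter()             # total word tokens per class
        self.vocabulary = set()                  # every word seen so far

    def learn(self, words, label, weight=1):
        """Add (weight=1) or remove (weight=-1) one email's counts."""
        self.doc_counts[label] += weight
        for w in words:
            self.word_counts[label][w] += weight
            self.total_words[label] += weight
            self.vocabulary.add(w)

    def log_posterior(self, words, label):
        """Unnormalised log P(label | words) with Laplace (add-one) smoothing."""
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (total_docs + len(self.classes)))
        denom = self.total_words[label] + len(self.vocabulary)
        for w in words:
            if w in self.vocabulary:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def classify(self, words):
        return max(self.classes, key=lambda c: self.log_posterior(words, c))

    def receive(self, words):
        """Assumption: an automatically classified email is learned with its predicted label."""
        label = self.classify(words)
        self.learn(words, label)
        return label

    def correct(self, words, wrong_label, right_label):
        """The user manually moves an email: undo the wrong label, learn the right one."""
        self.learn(words, wrong_label, weight=-1)
        self.learn(words, right_label)
```

Because only counts change, a correction costs time proportional to the length of the email, and a junk category missing from the initial training set can be learned from the user's corrections alone.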