
The Research And Implementation Of Chinese Mail Classification System

Posted on: 2006-07-28 | Degree: Master | Type: Thesis
Country: China | Candidate: Z J Zhou
GTID: 2178360155967464 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of office automation systems, more and more people use e-mail to communicate with each other. Because they must handle an ever-growing volume of mail, users need an automatic e-mail classification system. Meanwhile, driven by research in machine learning and data mining, content-based text categorization has developed rapidly. Researchers have studied mail filtering for many years and have obtained many useful results. However, because the criteria for categorizing mail change dynamically, and because traditional text classification algorithms are inefficient and imprecise, few mail categorization systems have been put into practical use. After studying Chinese mail corpora, this thesis therefore proposes a self-adaptive, practical mail classification system that integrates a general classification algorithm with user-defined rules.

First, the thesis analyzes the characteristics of English mail corpora, proposes a method for constructing a Chinese mail corpus from real-world mail, and builds a practical, normative Chinese mail corpus. Next, it analyzes and compares several representative text classification algorithms and selects Winnow, an algorithm with low time and space complexity that suits online learning and has been used for mail filtering, as its object of study. After improving the standard Winnow algorithm, the thesis shows experimentally that the improved algorithm can be applied to mail classification with high efficiency and good accuracy. Finally, the thesis analyzes the self-adaptation problem that is ubiquitous in mail classification systems and proposes a trigger-based incremental learning method that solves it effectively.
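The abstract does not describe the thesis's improved Winnow variant, so the following is only a minimal sketch of the standard Winnow multiplicative update (Littlestone's algorithm, in its promotion/demotion form) applied to mail. Assumptions not taken from the thesis: each mail is already segmented into a set of Chinese word tokens, features are binary, one Winnow model is kept per folder (one-vs-rest), and the threshold 4.0 and factors `alpha=2.0`, `beta=0.5` are illustrative values. The class names `Winnow` and `MailWinnow` are hypothetical.

```python
from typing import Dict, Iterable, Set


class Winnow:
    """Binary Winnow classifier with multiplicative promotion/demotion.

    A document is a set of (already segmented) tokens; every feature
    starts with weight 1.0 the first time it is seen.
    """

    def __init__(self, threshold: float = 4.0, alpha: float = 2.0, beta: float = 0.5) -> None:
        if alpha <= 1.0 or not 0.0 < beta < 1.0:
            raise ValueError("require alpha > 1 and 0 < beta < 1")
        self.threshold = threshold
        self.alpha = alpha  # promotion factor, used after a false negative
        self.beta = beta    # demotion factor, used after a false positive
        self.weights: Dict[str, float] = {}

    def score(self, features: Set[str]) -> float:
        # Sum of the weights of the features present in the mail.
        return sum(self.weights.get(f, 1.0) for f in features)

    def predict(self, features: Set[str]) -> bool:
        return self.score(features) > self.threshold

    def update(self, features: Set[str], label: bool) -> bool:
        """One online step: weights change only when the prediction is wrong.

        Returns True if a mistake (and therefore an update) occurred."""
        if self.predict(features) == label:
            return False
        factor = self.alpha if label else self.beta
        for f in features:
            self.weights[f] = self.weights.get(f, 1.0) * factor
        return True


class MailWinnow:
    """One-vs-rest wrapper: one Winnow model per mail folder (an assumption)."""

    def __init__(self, categories: Iterable[str], **kwargs: float) -> None:
        self.models = {c: Winnow(**kwargs) for c in categories}

    def learn(self, features: Set[str], category: str) -> None:
        # Every folder model sees the mail: positive for its own folder, negative otherwise.
        for c, model in self.models.items():
            model.update(features, c == category)

    def classify(self, features: Set[str]) -> str:
        # Pick the folder whose model gives the highest raw score.
        return max(self.models, key=lambda c: self.models[c].score(features))
```

Because an update touches only the features present in the current mail, each learning step costs time proportional to the number of tokens in that mail, and only the weights of features actually seen are stored. This is the low time and space cost that makes Winnow attractive for online mail filtering.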
The thesis then reports test results for self-adaptive ability, obtained with an implementation called the ZHHZ mail classifier. The results show that the Winnow algorithm can be used to build an efficient and reliable mail classification system, and that, by setting up rules and adding self-adaptive ability to the Winnow algorithm, the system can keep up with changes in the user's classification criteria.
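The abstract names the ingredients (user rules, a Winnow model, trigger-based incremental learning) but not their exact form, so the following is a hypothetical sketch of how they might fit together. Assumptions not taken from the thesis: a mail is a dict with `from`, `subject` and `body` keys; rules are predicates checked in order before the learned model; the trigger fires once `trigger_size` user corrections have accumulated; `learner` is any object with `learn(features, category)` and `classify(features)` methods (for example a one-vs-rest Winnow); and `tokenize` stands in for Chinese word segmentation. All class and parameter names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol, Set, Tuple

Mail = Dict[str, str]  # hypothetical mail format: keys "from", "subject", "body"


class OnlineLearner(Protocol):
    def learn(self, features: Set[str], category: str) -> None: ...
    def classify(self, features: Set[str]) -> str: ...


@dataclass
class Rule:
    """A user-defined rule: if the predicate matches the mail, file it in `category`."""
    predicate: Callable[[Mail], bool]
    category: str


class AdaptiveMailClassifier:
    """Rules first, learned model second; user corrections are buffered and
    replayed into the model when the (assumed) trigger condition fires."""

    def __init__(self, learner: OnlineLearner, tokenize: Callable[[str], Set[str]],
                 trigger_size: int = 5) -> None:
        self.learner = learner
        self.tokenize = tokenize
        self.trigger_size = trigger_size  # assumed trigger: N pending corrections
        self.rules: List[Rule] = []
        self.pending: List[Tuple[Set[str], str]] = []

    def _features(self, mail: Mail) -> Set[str]:
        return self.tokenize(mail["subject"] + " " + mail["body"])

    def add_rule(self, rule: Rule) -> None:
        self.rules.append(rule)

    def classify(self, mail: Mail) -> str:
        # Explicit user rules take priority over the learned model.
        for rule in self.rules:
            if rule.predicate(mail):
                return rule.category
        return self.learner.classify(self._features(mail))

    def feedback(self, mail: Mail, correct_category: str) -> bool:
        """Record a user correction; returns True if it fired the trigger."""
        self.pending.append((self._features(mail), correct_category))
        if len(self.pending) >= self.trigger_size:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        # Incremental update: replay only the buffered corrections, no full retrain.
        for features, category in self.pending:
            self.learner.learn(features, category)
        self.pending.clear()
```

Checking rules first lets an explicit user preference take effect immediately, while the buffered replay lets the model catch up with a changed criterion without retraining on the whole corpus.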
Keywords/Search Tags: Winnow, Chinese Mail Categorization, Chinese Mail Corpus, Regulation, Self-adaptive