Email Classification Based on an Improved Vector Space Model

Posted on: 2008-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: L Liao
GTID: 2208360215485054
Subject: Computer application technology
Abstract/Summary:
Email has become an essential communication tool, but users spend a great deal of time organizing their messages, which makes automatic email classification an important research topic. Current approaches fall into two groups: rule-based methods and statistical methods. This thesis focuses on the statistical approach.

The thesis examines the two stages required for email classification, the training stage and the classification stage, together with the techniques commonly used in them: email representation, feature selection and extraction, and classification. Among these, it concentrates on email representation, and in particular on the vector space model (VSM).

Email classification often uses the VSM to represent messages. The model expresses each email as a vector, so that computation is carried out on the vector rather than directly on the email's content. As a result, methods from pattern recognition and other fields can be applied to natural language processing, and emails become objects that can be operated on and compared numerically. However, the VSM ignores the structure of an email, which reduces classification precision.

To address this shortcoming, a new method for calculating word weights is proposed. It borrows the idea of using a glue measure to extract n-grams: it takes the paragraph as the unit, treats the email's content as an n-gram whose "words" are the paragraphs, and combines the logical relations between paragraphs to calculate each word's weight. The method preserves the order of the email's content and reflects the email's structure. It retains the advantages of the vector space model while improving classification precision.

The experimental results show that, compared with the traditional vector space model, the modified algorithm improves both the accuracy and the performance of classification.
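
As a point of reference for the traditional VSM described in the abstract, the following minimal Python sketch represents emails as TF-IDF weighted term vectors and labels a new email by cosine similarity to its nearest training email. It is not the thesis's modified weighting method, whose details the abstract does not give. The TF-IDF formula, the whitespace tokenizer, the nearest-neighbour rule, the sample data, and all names (`tokenize`, `build_idf`, `vectorize`, `cosine`, `classify`, `TRAINING`) are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical toy training set: (email text, label).
TRAINING = [
    ("project meeting agenda deadline", "work"),
    ("quarterly report review meeting", "work"),
    ("discount offer buy now", "promo"),
    ("limited offer free shipping discount", "promo"),
]


def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_idf(token_lists):
    """Inverse document frequency, using log(N / df) + 1 (one common variant)."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}


def vectorize(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in training are dropped."""
    term_freq = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in term_freq.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(email, training, idf):
    """Return the label of the most similar training email (1-nearest neighbour)."""
    query = vectorize(tokenize(email), idf)
    best_label, best_sim = None, -1.0
    for text, label in training:
        sim = cosine(query, vectorize(tokenize(text), idf))
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label


IDF = build_idf([tokenize(text) for text, _ in TRAINING])
```

Because each vector records only term weights, two emails containing the same words in a different order map to the same vector. This is the loss of structure that the thesis sets out to address with paragraph-aware weights.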
Keywords/Search Tags: natural language processing, email classification, vector space model, glue measure