Research On Email Classification Using Concept Vector Space Model Based On WordNet

Posted on: 2009-07-20
Degree: Master
Type: Thesis
Country: China
Candidate: C Ceng
Full Text: PDF
GTID: 2178360272491479
Subject: Computer software and theory
Abstract/Summary:
Email has become an efficient and popular means of communication as the number of Internet users has increased. Email often reflects current social issues and public sentiment, but its proliferation also makes it harder for people to organize and retrieve information. While users enjoy the convenience of email, its volume and spread have become a growing source of interference. If email could be classified automatically, people could reach the content relevant to them accurately and quickly, which would greatly improve efficiency and reduce the cost in manpower, money, and material resources. Email classification has therefore become a new research topic.

Existing email classification techniques can be broadly divided into statistics-based, connection-based, and rule-based classifiers. Naive Bayes, KNN, and SVM are statistics-based methods; neural networks are a connection-based method; decision trees are a rule-based method. However, they share a common problem: they do not consider the semantic relationships between words, so texts are often represented in a high-dimensional vector space, which greatly reduces classification performance.

To solve these problems, this paper presents a new approach to feature selection. The approach uses WordNet to describe a text email with a concept vector space model. First, during training, high-level category information is extracted by replacing terms with their WordNet synonym sets (synsets) and by taking into account the hypernym-hyponym relations between synsets. Second, a threshold-determination method is designed; by changing the threshold value, different requirements for recall and precision can be met. Finally, the category of a text email is determined with a simple vector classification method, which improves its effectiveness.

This approach can improve the accuracy and effectiveness of text email classification, and the result saves time and cost when searching for useful information.
Keywords/Search Tags:Email Classification, WordNet, Concept Vector, VSM
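
The concept vector idea summarized in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the tiny `SYNSET_OF` and `HYPERNYM_OF` tables are hypothetical stand-ins for WordNet lookups, and the `hypernym_weight` value, the per-category summed vectors, and the cosine similarity are assumptions chosen only to show how synonym replacement, hypernym propagation, and a decision threshold could fit together.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for WordNet (not real WordNet IDs).
SYNSET_OF = {            # term -> concept (synonym set)
    "car": "AUTO", "automobile": "AUTO",
    "truck": "TRUCK",
    "invoice": "BILL", "bill": "BILL",
    "payment": "PAYMENT",
}
HYPERNYM_OF = {          # concept -> direct hypernym concept
    "AUTO": "VEHICLE", "TRUCK": "VEHICLE",
    "BILL": "FINANCE_DOC", "PAYMENT": "TRANSACTION",
}

def concept_vector(tokens, hypernym_weight=0.5):
    """Map terms to concepts; give each concept's hypernym a smaller weight."""
    vec = Counter()
    for tok in tokens:
        concept = SYNSET_OF.get(tok.lower())
        if concept is None:
            continue  # terms outside the toy lexicon are ignored
        vec[concept] += 1.0
        parent = HYPERNYM_OF.get(concept)
        if parent is not None:
            vec[parent] += hypernym_weight
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train(labeled_emails):
    """Sum the concept vectors of all training emails in each category."""
    category_vectors = {}
    for label, tokens in labeled_emails:
        category_vectors.setdefault(label, Counter()).update(concept_vector(tokens))
    return category_vectors

def classify(tokens, category_vectors, threshold=0.3):
    """Return the most similar category, or None if no score reaches the threshold."""
    vec = concept_vector(tokens)
    best_label, best_score = None, 0.0
    for label, cat_vec in category_vectors.items():
        score = cosine(vec, cat_vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None

model = train([
    ("transport", ["car", "truck"]),
    ("finance", ["invoice", "payment"]),
])
print(classify(["automobile"], model))                 # synonym of a training term
print(classify(["automobile"], model, threshold=0.9))  # stricter threshold
```

In this sketch, raising `threshold` makes the classifier reject more borderline emails (favoring precision), while lowering it accepts more of them (favoring recall), which mirrors the trade-off the abstract attributes to the threshold value.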