Study On The Authorship Mining For Chinese E-mail Documents Based On SVM

Posted on: 2005-06-21   Degree: Master   Type: Thesis
Country: China   Candidate: J B Ma   Full Text: PDF
GTID: 2168360122495682   Subject: Agricultural mechanization project
Abstract/Summary:
With the rapid growth of computer and information technology, and especially the spread of the Internet, e-mail has become a convenient and economical form of communication. Unfortunately, e-mail misuse is common on the Internet: junk mail, fraudulent mail, threatening mail, antisocial mail and so on. In such mail, the sender usually attempts to hide his true identity in order to avoid detection. The sender's address can be forged or routed through an anonymous mail server, or the sender's name may have been modified, so it is difficult to determine who really sent an e-mail. Identifying the original author of an illegitimate e-mail, and thereby providing evidence for computer forensics, is therefore an effective way to curb illegitimate e-mail. In this paper, based on an analysis of various data mining techniques, we propose a method that automatically identifies or classifies the authorship of anonymous e-mail. We extract various e-mail document features, including linguistic features, header information and structural characteristics, and use the support vector machine algorithm to attribute e-mail messages to authors from a predefined list. Considerable progress has been made on both the classification algorithm and the feature extraction strategy. Experiments on a limited number of e-mail documents gave satisfactory results, which shows that identifying the authorship of e-mail is feasible. However, the classification precision is still far from the standard required for computer forensics, and further research is needed.
Keywords/Search Tags: Authorship identification, E-mail, Support Vector Machine, Computer forensic, Data Mining
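
The abstract names three feature families (linguistic features, header information, structural characteristics) but does not list the individual features. The following is a minimal Python sketch of what such a feature extraction step might look like. The function name `extract_features`, the punctuation set and the five features shown are all illustrative assumptions, not the feature set used in the thesis.

```python
from email import message_from_string

# Hypothetical punctuation set (ASCII plus common Chinese marks); an assumption,
# not taken from the thesis.
PUNCTUATION = set(",.!?;:，。！？；：、")


def extract_features(raw_message):
    """Map one raw e-mail to a dict of illustrative numeric features."""
    msg = message_from_string(raw_message)
    body = msg.get_payload()
    if not isinstance(body, str):
        # Multipart messages are ignored in this sketch.
        body = ""
    lines = body.splitlines()
    non_blank = [ln for ln in lines if ln.strip()]
    chars = [c for c in body if not c.isspace()]
    n_chars = len(chars) or 1  # avoid division by zero on empty bodies
    return {
        # Header information
        "has_subject": 1.0 if msg["Subject"] else 0.0,
        # Structural characteristics
        "n_lines": float(len(lines)),
        "blank_line_ratio": (len(lines) - len(non_blank)) / (len(lines) or 1),
        # Linguistic (character-level) features
        "punct_ratio": sum(c in PUNCTUATION for c in chars) / n_chars,
        "digit_ratio": sum(c.isdigit() for c in chars) / n_chars,
    }


sample = "From: a@example.com\nSubject: hello\n\nHi,\n\nSee you at 10.\n"
features = extract_features(sample)
```

Each message becomes a fixed-length numeric vector. Vectors of this kind, labeled with their known authors, are what a support vector machine classifier would be trained on; the training step itself is not shown here.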