With the rapid growth of computer and information technology, and especially the increasing popularity of the Internet, e-mail has become a convenient and economical form of communication. Unfortunately, e-mail misuse is common on the Internet, in the form of junk mail, fraudulent mail, threatening mail, antisocial mail and the like. In such messages the sender typically attempts to hide his or her true identity in order to avoid detection: the sender's address can be forged, the message can be routed through an anonymous mail server, or the sender's name may be altered. It is therefore difficult to establish the true source of an e-mail, and identifying the original author of an illegitimate e-mail and providing evidence for computer forensics is undoubtedly an effective way to curb such misuse. In this paper, after analyzing various data mining techniques, we propose a method for automatically identifying or classifying the authorship of anonymous e-mail. We extract various e-mail document features, including linguistic features, header information and structural characteristics, and use the support vector machine (SVM) algorithm to classify e-mail messages, attributing each one to an author from a predefined list. Considerable progress has been made on the classification algorithm and the feature extraction strategy. Experiments on a limited number of e-mail documents gave satisfactory results, indicating that automatic identification of e-mail authorship is feasible. However, the classification precision still falls far short of computer forensics standards, and further research is needed.
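The abstract does not specify the exact features or the SVM training procedure, so the following is only a minimal sketch under stated assumptions. The function `extract_features` and the class `EmailAuthorSVM` are hypothetical names; the six features are illustrative stand-ins for the linguistic and structural features described (header features are omitted), and the classifier is a one-vs-rest linear SVM trained with Pegasos-style subgradient descent, which is not necessarily the solver the authors used. The toy messages are invented and are not the paper's data.

```python
import random
import re
from statistics import mean, pstdev


def extract_features(email: str) -> list[float]:
    """Hypothetical linguistic and structural features (header features omitted)."""
    lines = email.splitlines()
    words = re.findall(r"[A-Za-z']+", email)
    sentences = [s for s in re.split(r"[.!?]+", email) if s.strip()]
    n_words = max(len(words), 1)
    return [
        sum(len(w) for w in words) / n_words,                   # mean word length
        len(words) / max(len(sentences), 1),                    # words per sentence
        sum(c.isupper() for c in email) / max(len(email), 1),   # uppercase ratio
        email.count(",") / n_words,                             # commas per word
        float(len(lines)),                                      # line count (structural)
        float(any(not line.strip() for line in lines)),         # blank-line paragraphs (structural)
    ]


class EmailAuthorSVM:
    """One-vs-rest linear SVM trained with Pegasos-style subgradient descent."""

    def __init__(self, lam: float = 0.01, epochs: int = 100, seed: int = 0):
        self.lam, self.epochs, self.seed = lam, epochs, seed

    def _scale(self, feats: list[float]) -> list[float]:
        # Standardize with training-set statistics, then append a constant bias input.
        return [(f - m) / s for f, m, s in zip(feats, self.means, self.stds)] + [1.0]

    def _train_binary(self, X: list[list[float]], y: list[float]) -> list[float]:
        rng = random.Random(self.seed)
        w, t, order = [0.0] * len(X[0]), 0, list(range(len(X)))
        for _ in range(self.epochs):
            rng.shuffle(order)
            for i in order:
                t += 1
                eta = 1.0 / (self.lam * t)
                margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
                w = [(1.0 - eta * self.lam) * wj for wj in w]  # L2 regularization shrink
                if margin < 1.0:  # hinge loss is active: step along its subgradient
                    w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
        return w

    def fit(self, emails: list[str], authors: list[str]) -> "EmailAuthorSVM":
        raw = [extract_features(e) for e in emails]
        columns = list(zip(*raw))
        self.means = [mean(c) for c in columns]
        self.stds = [pstdev(c) or 1.0 for c in columns]  # guard against constant features
        X = [self._scale(f) for f in raw]
        # One binary "this author vs. everyone else" classifier per known author.
        self.weights = {
            a: self._train_binary(X, [1.0 if b == a else -1.0 for b in authors])
            for a in sorted(set(authors))
        }
        return self

    def predict(self, email: str) -> str:
        x = self._scale(extract_features(email))
        scores = {a: sum(wj * xj for wj, xj in zip(w, x)) for a, w in self.weights.items()}
        return max(scores, key=scores.get)  # author with the largest decision value


# Invented toy messages, for illustration only.
train_emails = [
    "Dear team,\n\nI would appreciate, if possible, a thorough consideration of the attached documentation before tomorrow's meeting.\n\nRegards",
    "Dear colleagues,\n\nFollowing yesterday's discussion, I have summarised the outstanding considerations, which require careful evaluation.\n\nRegards",
    "Dear all,\n\nPlease review the preliminary recommendations, particularly the organisational implications, before Thursday.\n\nRegards",
    "SEND IT NOW! NO EXCUSES! I WANT IT!",
    "WHERE IS MY MONEY? PAY NOW! LAST WARNING!",
    "DO IT TODAY! OR ELSE! YOU HEAR ME?",
]
train_authors = ["alice"] * 3 + ["bob"] * 3

model = EmailAuthorSVM().fit(train_emails, train_authors)
print(model.predict("GIVE IT BACK NOW! HURRY UP!"))  # bob
```

One-vs-rest is one common way to extend a binary SVM to a predefined list of authors: each author gets a separate classifier, and the message is attributed to the author whose classifier returns the largest decision value. A practical system would more likely rely on an established solver (for example scikit-learn's `LinearSVC`) and would add the header-based features mentioned above.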