
Study On The Authorship Identification For Chinese E-mail Documents Based On The Literary Style

Posted on: 2006-07-30 | Degree: Master | Type: Thesis
Country: China | Candidate: S H Chang
GTID: 2168360155452287 | Subject: Agricultural mechanization project
Abstract/Summary:
With the rapid growth of the Internet, e-mail has become a convenient and economical means of communication and one of the most important forms of communication. While e-mail makes life and work more convenient, e-mail misuse, such as viruses, junk mail, and even fraudulent mail, is unfortunately common on the Internet, and the damage it causes is increasing. Countering it requires identifying the true author of an e-mail so that measures can be taken. In such mails, however, the sender usually tries to hide his or her true identity to avoid detection: the sender's address can be forged and routed through an anonymous mail server, or the sender's name can be altered. This makes the true identity behind an e-mail difficult to establish. It is therefore necessary to develop a method that identifies the original author of an illegitimate e-mail and provides evidence for computer forensics in investigating criminal responsibility, which is an effective way to curb e-mail misuse.

This paper first introduces the current state of research in this field and reviews existing techniques and methods. After a detailed analysis of the literary style of Chinese e-mails, it studies methods for extracting the features used to identify e-mail authorship and presents a new feature pattern and a new importance-weight formula. It then introduces a method in which e-mail authorship is identified by classifying e-mails according to the author's literary style with a support vector machine (SVM) classifier. A new algorithm for judging the authorship of new e-mails is also studied, with the aim of avoiding misclassification. To validate the accuracy and feasibility of the proposed algorithm and method, an experimental platform was developed, and experiments were carried out on it from several angles: different topics, feature combinations of various sizes, algorithms, kernel functions, and parameters.
The experimental results show that the methods proposed in the paper are correct and feasible, and represent a significant step toward practical application.
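The abstract mentions a new feature pattern and importance-weight formula without stating them, and the F-test is listed among the keywords. The following Python sketch is one plausible reading under those assumptions, not the thesis's actual method: it extracts a few hypothetical stylometric features (average sentence length plus function-word and punctuation rates; the word and punctuation lists are illustrative assumptions) and weights each feature by its one-way ANOVA F statistic across authors, normalized so the weights sum to 1.

```python
import re
from statistics import mean

# Hypothetical style markers; the thesis's real feature pattern is not given in the abstract.
FUNCTION_WORDS = ["的", "了", "是", "在", "我", "你", "就", "也", "吧", "呢"]
PUNCTUATION = "，。！？、；：…"


def style_features(text):
    """Return a fixed-length vector of simple stylometric features for one e-mail body."""
    chars = [c for c in text if not c.isspace()]
    n = max(len(chars), 1)
    sentences = [s.strip() for s in re.split(r"[。！？]", text) if s.strip()]
    avg_sentence_len = mean(len(s) for s in sentences) if sentences else 0.0
    features = [avg_sentence_len]
    features += [text.count(w) / n for w in FUNCTION_WORDS]  # function-word rates
    features += [text.count(p) / n for p in PUNCTUATION]     # punctuation rates
    return features


def f_scores(X, y):
    """One-way ANOVA F statistic of each feature (column of X) across author labels y.

    Assumes at least two authors and more samples than authors.
    """
    classes = sorted(set(y))
    k, n = len(classes), len(X)
    scores = []
    for j in range(len(X[0])):
        grand_mean = mean(row[j] for row in X)
        groups = [[X[i][j] for i in range(n) if y[i] == c] for c in classes]
        ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
        ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
        ms_between = ss_between / (k - 1)
        # The tiny epsilon avoids division by zero for features constant within every author.
        ms_within = ss_within / (n - k) + 1e-12
        scores.append(ms_between / ms_within)
    return scores


def feature_weights(X, y):
    """Normalize F scores into importance weights that sum to 1 (hypothetical formula)."""
    scores = f_scores(X, y)
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]
```

Multiplying each feature by its weight before training a classifier lets features that separate authors well dominate, while a feature that is identical across all authors receives weight 0.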
Keywords/Search Tags: E-mail, Literary Style, Authorship Identification, Support Vector Machine, F-test
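The SVM classification step can be sketched as a one-vs-rest linear SVM trained with the Pegasos sub-gradient method, written in plain Python so it needs no external library. The abstract does not describe the new authorship-judging algorithm; as an assumption, this sketch returns None (rejects) when no author's decision value exceeds a threshold, one common way to avoid forcing a wrong label onto an e-mail from an unknown author. The thesis also compared kernel functions, but only a linear kernel is shown here, and all function names and parameters (`lam`, `epochs`, `threshold`) are hypothetical.

```python
import random


def train_linear_svm(X, labels, lam=0.01, epochs=200, seed=0):
    """Pegasos training of a binary linear SVM; labels must be +1 or -1.

    A bias is handled by appending a constant 1.0 feature (so it is also regularized).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, labels)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)  # Pegasos learning-rate schedule
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink: regularization step
            if margin < 1:  # hinge loss active: move toward the correct side
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def decision(w, x):
    """Signed decision value of a trained model for feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def train_one_vs_rest(X, y, **kwargs):
    """Train one binary SVM per author: that author (+1) versus all others (-1)."""
    return {a: train_linear_svm(X, [1 if t == a else -1 for t in y], **kwargs)
            for a in sorted(set(y))}


def identify_author(models, x, threshold=0.0):
    """Return the best-scoring author, or None if no model clears the threshold."""
    scores = {a: decision(w, x) for a, w in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None
```

In practice the input vectors would be the weighted style features of each e-mail, and the rejection threshold would be tuned on held-out e-mails from known and unknown authors.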