Study On The Author's Gender Identification For Chinese E-mail Documents Based On SVM

Posted on: 2008-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: J Yang
Full Text: PDF
GTID: 2178360215481773
Subject: Computer application technology
Abstract/Summary:
With the rapid development of science and technology, people increasingly use networks to exchange information. E-mail has become a convenient and economical form of communication. At the same time, unfortunately, e-mail misuse is common on the Internet, for example viruses, junk mail, and even fraudulent mail, and the damage it causes is increasing. This makes it necessary to detect the true author of an e-mail and to take countermeasures. In such mail, the sender always attempts to hide his or her true identity in order to avoid detection: the sender's address can be forged or routed through an anonymous mail server, or the sender's name may have been modified. It is therefore difficult to establish the real identity behind an e-mail and to identify the original author of an illegitimate e-mail. It is necessary to study methods for identifying the original author's identity characteristics and so provide evidence for computer forensics in investigating criminal responsibility. The true identity of a mail author is made up of identity characteristics such as the author's sex and age; by determining these characteristics, the original author can be found.

Research on e-mail authorship identification has been under way for several years, and some results have been reported overseas. Research on identifying the true gender of an e-mail's author, however, is a new topic. The author's true gender is important for establishing the identity of the e-mail author. This dissertation studies the problems above on the basis of domestic and overseas research. It first introduces the current state of the field and reviews existing techniques and methods.
After analyzing the linguistic characteristics related to an author's gender, the dissertation studies methods for extracting the features used to identify the gender of an e-mail author and presents a new feature pattern, which is used with a support vector machine (SVM) classification algorithm to distinguish the e-mail author's gender automatically. Building on the analysis of existing techniques and methods, it introduces a method of e-mail authorship identification in which the author's gender is identified by classifying e-mails with a support vector machine classifier. To validate the accuracy and feasibility of the proposed algorithm and method, an experimental platform was developed, and experiments were carried out on it with different topics and parameters from different aspects. The experimental results show that the methods proposed in the dissertation are correct and feasible and that all expected goals were achieved. However, the classification precision is still far from the standards required for computer forensics, and further research is needed.
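
The pipeline the abstract describes (extract stylistic features from an e-mail, then classify with a support vector machine) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the dissertation: the two features in `extract_features`, the plain subgradient-descent trainer standing in for whatever SVM implementation the author used, and the synthetic, already-centred training vectors whose labels +1 and -1 merely stand in for the two gender classes.

```python
import re


def extract_features(text):
    """Map an e-mail body to two hypothetical style features.

    These features are illustrative only; the dissertation's actual
    feature pattern is not given in the abstract.
    """
    n_chars = max(len(text), 1)
    # Split on Chinese and Western sentence-ending punctuation.
    sentences = [s for s in re.split(r"[。！？.!?]", text) if s.strip()]
    exclaim_rate = sum(text.count(c) for c in "!！") / n_chars
    mean_sentence_len = (
        sum(len(s.strip()) for s in sentences) / max(len(sentences), 1)
    )
    return [exclaim_rate, mean_sentence_len]


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=50):
    """Train a linear SVM by subgradient descent on the L2-regularised
    hinge loss. Labels in y must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            margin = y_i * (sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b)
            # Regularisation step: shrink the weights slightly.
            w = [(1.0 - eta * lam) * w_j for w_j in w]
            # Hinge-loss step: only samples inside the margin push the boundary.
            if margin < 1.0:
                w = [w_j + eta * y_i * x_j for w_j, x_j in zip(w, x_i)]
                b += eta * y_i
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Synthetic, already-centred feature vectors (not real e-mail data).
X_train = [[2.0, 2.0], [2.0, 3.0], [3.0, 2.0],
           [-2.0, -2.0], [-2.0, -3.0], [-3.0, -2.0]]
y_train = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X_train, y_train)
print([predict(w, b, x) for x in X_train])
```

In a real experiment the training vectors would come from `extract_features` applied to labelled e-mails, scaled to comparable ranges, and the classifier would be evaluated on held-out messages rather than on its own training data.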
Keywords/Search Tags:E-mail, Language and Gender, Authorship Identification, Support Vector Machine, Computer Forensic