Research On Chinese Email Authorship Identification System

Posted on: 2008-04-30  Degree: Master  Type: Thesis
Country: China  Candidate: W Li  Full Text: PDF
GTID: 2178360215981773  Subject: Computer application technology
Abstract/Summary:
With the development of network information technology, email has become an essential means of communication in everyday work and life. Alongside this convenience, however, email has brought a series of new problems, such as spam and virus-carrying mail, which cause serious harm. Because the senders of such mail usually try to conceal their true identity to evade inspection, research on email authorship identification is needed. Email authorship identification is the basis of evidence collection and provides technical support for turning email into electronic evidence, so this research is of considerable importance.

Applying machine learning to Chinese email authorship identification has been a frontier topic in China in recent years and has already produced useful theoretical and experimental results. However, on the one hand, earlier work was limited to the support vector machine, applied to the basic two-class classification problem, and other methods have not been tried. On the other hand, there has been no dedicated application system for Chinese email authorship identification, and the field remains weak both in research maturity and in turning research results into practice.

Building on a survey of work in China and abroad, on email feature selection and representation, and on experiments with support vector machine classification, this thesis makes the following contributions:

Firstly, it proposes applying KNN and artificial neural networks to Chinese email authorship identification, compares these two methods with the support vector machine, and shows experimentally that the support vector machine performs best of the three.

Secondly, to address the multi-class (multi-author) problem in Chinese email identification, and based on further study of multi-class support vector machine methods, it proposes applying a new binary tree support vector machine to multi-author Chinese email identification. The new method is compared with traditional multi-class methods in both effectiveness and efficiency, and experiments show that it strikes a balance between the two.

Finally, building on these two results, a research-oriented system dedicated to Chinese email authorship identification was designed and developed. The thesis presents the system architecture and the implementation process, and describes the technical difficulties and specifics in detail. The system implements a series of functions: email extraction, email sample selection, feature selection and extraction, email author identification, output of results and performance, and flexible addition of tools.

The system advances research on Chinese email authorship identification and is a key step in turning theoretical and experimental results into a working implementation. It can also serve as a reference for related fields such as text classification and web page classification.
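The binary tree approach to multi-author identification named in the abstract can be illustrated with a minimal sketch. This is not the thesis's method: the abstract does not say how the tree is built or which features are used. The sketch assumes that authors are split into two halves by sorted name at each node, uses a nearest-centroid rule as a stand-in for the binary SVM, and uses made-up two-dimensional feature vectors. All names (CentroidBinary, build_tree, predict, author_a and so on) are hypothetical.

```python
def centroid(vectors):
    # Component-wise mean of a list of equal-length feature vectors.
    return [sum(col) / len(col) for col in zip(*vectors)]


def sq_dist(a, b):
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CentroidBinary:
    """Stand-in for a binary SVM: picks the group with the nearer centroid."""

    def __init__(self, left_vectors, right_vectors):
        self.left = centroid(left_vectors)
        self.right = centroid(right_vectors)

    def goes_left(self, x):
        return sq_dist(x, self.left) <= sq_dist(x, self.right)


def build_tree(data):
    """data maps author name -> list of feature vectors.

    Returns either an author name (leaf) or a tuple
    (binary classifier, left subtree, right subtree).
    """
    authors = sorted(data)
    if len(authors) == 1:
        return authors[0]
    mid = len(authors) // 2  # assumed split rule: halve the sorted author list
    left = {a: data[a] for a in authors[:mid]}
    right = {a: data[a] for a in authors[mid:]}
    clf = CentroidBinary(
        [v for vs in left.values() for v in vs],
        [v for vs in right.values() for v in vs],
    )
    return (clf, build_tree(left), build_tree(right))


def predict(tree, x):
    # Walk from the root to a leaf, evaluating one binary classifier per level.
    while not isinstance(tree, str):
        clf, left, right = tree
        tree = left if clf.goes_left(x) else right
    return tree


def count_classifiers(tree):
    # Number of internal nodes, i.e. binary classifiers that were trained.
    if isinstance(tree, str):
        return 0
    _, left, right = tree
    return 1 + count_classifiers(left) + count_classifiers(right)


# Made-up training data: four authors, two features per email.
training = {
    "author_a": [[0.0, 0.0], [0.4, 0.2]],
    "author_b": [[0.0, 5.0], [0.2, 5.2]],
    "author_c": [[10.0, 0.0], [10.2, 0.4]],
    "author_d": [[10.0, 5.0], [9.8, 5.2]],
}
tree = build_tree(training)
print(predict(tree, [9.5, 4.6]))
print(count_classifiers(tree))
```

With N authors, a tree of this shape trains N - 1 binary classifiers and evaluates only one per level at prediction time, whereas a one-against-one scheme trains N(N - 1)/2 classifiers. That difference is one plausible reading of the balance between effectiveness and efficiency that the abstract reports.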
Keywords/Search Tags: Email, System Architecture, SVM, Neural Network, KNN, Multi-Class Classification