The Research Of E-mail Classification Based On Sense-group

Posted on: 2013-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z T Li
Full Text: PDF
GTID: 2248330374957082
Subject: Computer application technology
Abstract/Summary:
Spam filtering is an important field of research in the Internet era. A prerequisite of spam filtering is e-mail classification, which depends heavily on the user's subjective judgment, so classifying e-mails by their content has attracted growing attention from researchers. Text classification technology is the most important way to classify e-mail by content.

At present, Chinese text classification techniques mostly follow English ones, which extract words as classification features. The classification accuracy for Chinese text is relatively poor because these techniques do not take into account characteristics of the Chinese language such as the syntactic and semantic relationships between words.

In this paper, by considering the linguistic characteristics of Chinese, a text classification algorithm based on sense-groups is proposed and applied to e-mail classification. The main contributions of the paper are as follows:

(1) A brief introduction to e-mail and text classification is given, covering the text classification process, the key technologies, and the principles of classification algorithms. The characteristics of the Chinese language, the difficulties of processing it, and the current state of e-mail categorization based on text classification are examined in depth.

(2) Dependency grammar can express the grammatical relations between words, but current Chinese dependency parsing does not disambiguate semantic structure, which results in dependency errors. To handle this problem, dependency parsing with semantic information is introduced, which improves the accuracy of dependency analysis.

(3) A compound-sentence recognition algorithm based on dependency parsing and CRF is proposed. It improves recognition for both tagged and untagged sentences and performs better in the experiments.

(4) A text classification algorithm based on sense-groups is put forward. It first extracts sense-groups from the parsing results to represent the text, then uses the proposed compound-sentence recognition algorithm to identify the relation category, which determines the compound-sentence weight. This weight is combined with the tf.idf.IG value to form the final weight, and finally an SVM is used for classification. The experiments demonstrate the effectiveness of the proposed algorithm.

(5) The text classification algorithm based on sense-groups is applied to e-mail classification, with a feedback adjustment added to suit the characteristics of e-mail and to account for the e-mail user's subjectivity. The experimental results show that the sense-group-based method performs well, with precision and recall both exceeding 96%.

In this paper, a large number of experiments were carried out to verify the proposed algorithm and to show that it performs well in e-mail classification. Finally, the main results of the research are summarized, and several unresolved issues are put forward for future research.
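The abstract does not define the tf.idf.IG weight, so the following is only a minimal sketch of one common reading: the product of term frequency, inverse document frequency, and the information gain of a feature with respect to the class labels. The function names, the toy sense-group strings, and the choice of logarithms are assumptions made for illustration and are not taken from the thesis. The compound-sentence weight and the SVM step are not shown.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(feature, docs, labels):
    """Information gain of the binary split 'feature present / absent'."""
    with_f = Counter(l for d, l in zip(docs, labels) if feature in d)
    without_f = Counter(l for d, l in zip(docs, labels) if feature not in d)
    n = len(docs)
    n_with = sum(with_f.values())
    n_without = n - n_with
    h_all = entropy(list(Counter(labels).values()))
    h_cond = ((n_with / n) * entropy(list(with_f.values()))
              + (n_without / n) * entropy(list(without_f.values())))
    return h_all - h_cond


def tf_idf_ig(docs, labels):
    """Return one {feature: tf * idf * IG} dict per document.

    docs   -- list of documents, each a list of feature strings
    labels -- class label of each document
    """
    n = len(docs)
    vocab = {f for d in docs for f in d}
    df = {f: sum(1 for d in docs if f in d) for f in vocab}
    ig = {f: information_gain(f, docs, labels) for f in vocab}
    weights = []
    for d in docs:
        counts = Counter(d)
        weights.append({
            f: (c / len(d)) * math.log(n / df[f]) * ig[f]
            for f, c in counts.items()
        })
    return weights


# Hypothetical toy corpus: each e-mail is already reduced to feature strings.
docs = [
    ["free_offer", "click_link"],
    ["free_offer", "win_prize"],
    ["meeting_agenda", "project_report"],
    ["project_report", "click_link"],
]
labels = ["spam", "spam", "ham", "ham"]
weights = tf_idf_ig(docs, labels)
```

In this toy corpus, "click_link" appears equally often in both classes, so its information gain, and therefore its weight, is zero, while "free_offer" separates the classes perfectly and keeps a positive weight.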
Keywords/Search Tags: email classification, sense-group, dependency parsing, compound sentences relationship recognition, Conditional Random Field