
Research On Sensitive Email Classification Based On BERT Model

Posted on: 2022-08-25 | Degree: Master | Type: Thesis
Country: China | Candidate: P Q Du
GTID: 2518306326483454 | Subject: Master of Engineering
Abstract/Summary:
Email has become an essential communication medium in people's daily life and work. While email brings convenience, its overall security situation is not optimistic: mailboxes are subject to attacks and database leaks, which expose emails containing large amounts of sensitive information and can seriously affect society, enterprises and, above all, the security of personal sensitive information. By classifying and studying sensitive emails, this paper aims to identify the emails that contain sensitive information among large volumes of heterogeneous email, draw users' attention to highly sensitive messages, and provide early warnings to individuals and enterprises. Relatively little research has so far addressed the identification and detection of sensitive information in emails, and commonly used methods cannot identify such information accurately. In this paper, I improve the BERT model and propose BiGRU-Att, a sensitive email classification method. The specific research work and contributions are as follows.

(1) The BERT model is improved to produce high-quality dense word vectors for email text in place of sparse word vectors. Experiments compare it with other distributed word-vector representations for text representation, and the results show that the improved BERT model enhances the feature representation of email text and yields word vectors better suited to the email classification task in this paper.

(2) To address the sparseness of sensitive email text data, a translation-based data augmentation method is used to expand the sensitive email text. This increases the diversity of the text data and the size of the training set, makes the dataset more balanced, and helps the BERT model learn more of the semantic information in sensitive email text during the fine-tuning phase.

(3) Emails are classified into sensitive and non-sensitive emails with the BiGRU-Att model, which merges two ordinary unidirectional GRU networks into a bidirectional GRU architecture and introduces an attention mechanism to weight the features extracted by the BiGRU. Finally, Softmax normalizes the resulting features to give the sensitivity tendency of the email text. Experiments show that the proposed method effectively improves the accuracy of sensitive email classification.
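Contribution (2) relies on translation-based augmentation, which is commonly implemented as back-translation: a text is translated into a pivot language and back, giving a paraphrase that keeps the original label. The sketch assumes that reading. The abstract does not name the translation system, the pivot languages or any filtering rules, so `translate`, the language codes, `toy_translate` and its paraphrase table are hypothetical stand-ins, not the thesis's setup.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: (text, src_lang, tgt_lang) -> translated text.
# A real machine-translation system would be plugged in here.
Translator = Callable[[str, str, str], str]
Sample = Tuple[str, str]  # (email text, label)


def back_translate(text: str, src: str, pivot: str, translate: Translator) -> str:
    """Translate text into the pivot language and back, producing a paraphrase."""
    return translate(translate(text, src, pivot), pivot, src)


def augment_with_back_translation(
    samples: Sequence[Sample],
    target_label: str,
    src: str,
    pivots: Sequence[str],
    translate: Translator,
) -> List[Sample]:
    """Add back-translated copies of every sample with target_label, skipping duplicates."""
    seen = {text for text, _ in samples}
    augmented = list(samples)
    for text, label in samples:
        if label != target_label:
            continue  # only the minority (sensitive) class is expanded
        for pivot in pivots:
            paraphrase = back_translate(text, src, pivot, translate)
            if paraphrase not in seen:  # identical round trips add no diversity
                seen.add(paraphrase)
                augmented.append((paraphrase, label))
    return augmented


# Hypothetical toy translator: the return trip from each pivot swaps in a synonym,
# mimicking the paraphrasing a real MT round trip tends to produce.
PARAPHRASES: Dict[str, Dict[str, str]] = {
    "fr": {"password": "passcode"},
    "de": {"password": "login key"},
}


def toy_translate(text: str, src: str, tgt: str) -> str:
    for old, new in PARAPHRASES.get(src, {}).items():
        text = text.replace(old, new)
    return text


samples = [("your password is 1234", "sensitive"), ("lunch at noon", "non-sensitive")]
augmented = augment_with_back_translation(
    samples, "sensitive", "en", ["fr", "de", "ja"], toy_translate
)
```

Here the "ja" round trip returns the text unchanged and is dropped as a duplicate, so the sensitive class grows from one example to three while the non-sensitive class stays at one. Expanding only the minority class is one way augmentation can make the dataset more balanced, as contribution (2) describes.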
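The data flow of the BiGRU-Att classifier in contribution (3) can be illustrated with a minimal forward pass in plain Python. One GRU reads the token vectors left to right, a second reads them right to left, and their hidden states are concatenated. An attention layer then turns per-token scores into weights that sum to 1, and Softmax over a linear layer applied to the weighted sum gives class probabilities. This is a sketch under stated assumptions: the weights are random and untrained, the dimensions are toy values, random vectors stand in for the improved BERT embeddings, and the attention score is the simplified form u · tanh(h_t). The thesis's actual layer sizes, attention formulation and training procedure are not given in the abstract, and the class `BiGRUAttention` is hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vs: Vector) -> Vector:
    """Element-wise sum of equal-length vectors."""
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRU:
    """Unidirectional GRU (Cho et al., 2014): update gate z, reset gate r, candidate n."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.hidden_dim = hidden_dim
        self.W = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrn"}
        self.b = {g: [0.0] * hidden_dim for g in "zrn"}

    def step(self, x: Vector, h: Vector) -> Vector:
        z = [sigmoid(v) for v in add(matvec(self.W["z"], x), matvec(self.U["z"], h), self.b["z"])]
        r = [sigmoid(v) for v in add(matvec(self.W["r"], x), matvec(self.U["r"], h), self.b["r"])]
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate masks the previous state
        n = [math.tanh(v) for v in add(matvec(self.W["n"], x), matvec(self.U["n"], rh), self.b["n"])]
        # Update gate interpolates between the previous state and the candidate.
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]

    def run(self, xs: List[Vector]) -> List[Vector]:
        h = [0.0] * self.hidden_dim
        outputs = []
        for x in xs:
            h = self.step(x, h)
            outputs.append(h)
        return outputs


class BiGRUAttention:
    """Hypothetical BiGRU-Att classifier (untrained, random weights)."""

    def __init__(self, input_dim: int, hidden_dim: int, num_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.fwd = GRU(input_dim, hidden_dim, rng)  # reads tokens left to right
        self.bwd = GRU(input_dim, hidden_dim, rng)  # reads tokens right to left
        self.u = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim)]  # attention context vector
        self.W_out = rand_matrix(num_classes, 2 * hidden_dim, rng)
        self.b_out = [0.0] * num_classes

    def forward(self, xs: List[Vector]) -> Tuple[Vector, Vector]:
        forward_states = self.fwd.run(xs)
        backward_states = self.bwd.run(xs[::-1])[::-1]  # realign to original token order
        hs = [hf + hb for hf, hb in zip(forward_states, backward_states)]  # concatenate: 2*hidden
        # Simplified attention score per token, u . tanh(h_t), normalised over tokens.
        scores = [sum(ui * math.tanh(hi) for ui, hi in zip(self.u, h)) for h in hs]
        alpha = softmax(scores)
        context = [sum(a * h[i] for a, h in zip(alpha, hs)) for i in range(len(hs[0]))]
        probs = softmax(add(matvec(self.W_out, context), self.b_out))
        return probs, alpha


# Usage: 5 random 4-d vectors stand in for the BERT token embeddings of one email.
model = BiGRUAttention(input_dim=4, hidden_dim=3, num_classes=2, seed=42)
rng = random.Random(1)
email = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(5)]
probs, weights = model.forward(email)  # assumed order: [P(non-sensitive), P(sensitive)]
```

In practice the weights would be learned by backpropagation in a deep-learning framework. Plain Python is used here only so the bidirectional pass, the attention weighting and the final Softmax are visible without dependencies.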
Keywords/Search Tags: BERT Model, Word Vector, Sensitive Mail Classification, BiGRU-Att