Research And Implementation Of Bait Document Generation Based On LeakGAN

Posted on: 2020-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: S S Pang
GTID: 2428330575495008
Subject: Computer technology
Abstract/Summary:
As a cyber-attack carrier commonly used by hackers, a bait document consists of two main parts: the document content and an embedded virus or Trojan. Whether the content can induce the victim to act on the document is the key to a bait-document attack. There are generally two ways to construct the content. The first is to forge the bait document by hand, which is time-consuming and laborious but produces documents that are highly deceptive and attractive, with a high attack success rate. The second is to assemble the bait document from related network resources, which is convenient and fast but produces documents that are less deceptive and attractive, with a low attack success rate. A highly attractive and deceptive bait document not only compromises the victim but also spreads widely among victims, expanding the scope of the attack. So far, research on rapidly building such high-quality bait documents has yet to make progress.

This thesis proposes a method for automatically generating Chinese bait documents based on the LeakGAN model, addressing the problem of bait-document construction. LeakGAN, a recent long-text generation model, combines hierarchical reinforcement learning with generative adversarial networks. It performs well on Chinese short texts and English long texts, but its application to Chinese long texts has not been studied. In addition, the existing LeakGAN model leaves room for improvement in two respects: convergence speed and the quality of the generated text. This thesis analyzes and optimizes LeakGAN's network structure and verifies the effectiveness of the optimized structure through experiments.

Convergence speed: LeakGAN's generator is based on Long Short-Term Memory (LSTM) networks, which suffer from internal covariate shift during training, so the deeper the network, the slower it converges. Therefore, during training, batch normalization is applied to the inputs of the activation functions to fix their mean and variance and keep the input distribution consistent. Experimental results show that, compared with the original model, the optimized LeakGAN converges 11.50% faster on the EMNLP2017 WMT English dataset and 6.61% faster on the Sogou sports news Chinese dataset.

Text generation quality: LeakGAN's discriminator is based on a convolutional neural network (CNN). At the pooling layer it extracts high-dimensional feature information and feeds it to the generator to guide text generation toward high-quality output. However, this feature information is not filtered further, and it carries too little grammatical-structure and semantic information to guide the generator effectively. This thesis therefore adds a self-attention mechanism before and after the discriminator's convolution layer to select key semantic and grammatical information from the input data and from the feature maps extracted by convolution. Because self-attention can flexibly capture both global and local dependencies, it enriches the grammatical and semantic content of the feedback, yielding higher-quality feature information that guides the generator toward better text. Experimental results show that the optimized LeakGAN achieves best BLEU-4 scores of 0.684 on the EMNLP2017 WMT English dataset and 0.481 on the Sogou sports news Chinese dataset.

In summary, this thesis optimizes the structure of the LeakGAN model, and analysis of generated examples shows that text produced by the optimized LeakGAN meets the quality requirements for constructing bait documents, providing a solution to the problem of bait-document construction.
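The batch-normalization step in the abstract (normalizing the inputs of the LSTM activation functions across a mini-batch) can be illustrated with a minimal pure-Python sketch. It assumes a single-unit LSTM, hypothetical gate weights in `weights`, and normalization of each gate's full pre-activation (input term + recurrent term + bias) before its sigmoid or tanh. The abstract does not say exactly which terms the thesis normalizes or how the learnable scale and shift are set, so this is an illustration of the technique, not the thesis's implementation.

```python
import math
from typing import Dict, List, Tuple


def batch_norm(xs: List[float], gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> List[float]:
    """Normalize one feature across a mini-batch to zero mean and unit
    variance, then apply the learnable scale (gamma) and shift (beta)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # biased batch variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bn_lstm_step(x: List[float], h: List[float], c: List[float],
                 w: Dict[str, Tuple[float, float, float]]
                 ) -> Tuple[List[float], List[float]]:
    """One time step of a single-unit LSTM over a mini-batch.

    Assumption (not from the thesis): w maps each gate name ("i", "f", "o",
    "g") to a hypothetical (input weight, recurrent weight, bias) triple, and
    batch normalization is applied to each gate's full pre-activation.
    """
    pre = {gate: [wx * xb + wh * hb + b for xb, hb in zip(x, h)]
           for gate, (wx, wh, b) in w.items()}
    # Fix the mean/variance of every activation-function input across the batch.
    i = [sigmoid(z) for z in batch_norm(pre["i"])]
    f = [sigmoid(z) for z in batch_norm(pre["f"])]
    o = [sigmoid(z) for z in batch_norm(pre["o"])]
    g = [math.tanh(z) for z in batch_norm(pre["g"])]
    c_new = [fb * cb + ib * gb for fb, cb, ib, gb in zip(f, c, i, g)]
    h_new = [ob * math.tanh(cb) for ob, cb in zip(o, c_new)]
    return h_new, c_new


# Hypothetical weights for a batch of three scalar inputs.
weights = {"i": (0.5, 0.1, 0.0), "f": (0.3, -0.2, 0.0),
           "o": (0.7, 0.4, 0.0), "g": (1.0, 0.5, 0.0)}
h1, c1 = bn_lstm_step([0.1, 0.5, -0.3], [0.0] * 3, [0.0] * 3, weights)
```

At inference time a batch-normalized network normally replaces the batch statistics with running averages collected during training; that bookkeeping is omitted from this sketch.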
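Scaled dot-product self-attention, one common form of the mechanism the abstract places before and after the discriminator's convolution layer, can be sketched in pure Python. The projection matrices `wq`, `wk`, `wv` and their sizes are hypothetical, and the abstract does not specify the attention variant, the number of heads, or whether a residual connection is used. Applied before convolution, `x` would be the token-embedding matrix; applied after, it would be the sequence of convolutional feature vectors.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Scaled dot-product self-attention over a sequence.

    x: seq_len x d_model rows (e.g. token embeddings before the convolution,
    or convolutional feature vectors after it). wq/wk/wv are hypothetical
    d_model x d_k (or d_v) projection matrices.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(wk[0]))
    scores = matmul(q, transpose(k))  # every position scored against every other
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)  # each output row is a weighted mix of value rows
```

Because every position attends to every other, the weights can capture long-range (global) relations as well as neighbouring (local) ones. In SAGAN-style layers the attention output is often added back to `x` through a learned residual weight; whether the thesis does this is not stated.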
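BLEU-4, the metric behind the reported scores of 0.684 and 0.481, takes the geometric mean of clipped 1- to 4-gram precisions and multiplies it by a brevity penalty. A minimal, unsmoothed sentence-level version follows. How the thesis chose its reference sets and aggregated scores over generated samples is not stated in the abstract, so this sketch illustrates the formula only and is not expected to reproduce those numbers.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate: List[str], references: List[List[str]]) -> float:
    """Unsmoothed sentence-level BLEU-4 with uniform weights (0.25 each)."""
    log_precision = 0.0
    for n in range(1, 5):
        cand = ngrams(candidate, n)
        total = sum(cand.values())
        if total == 0:
            return 0.0  # candidate too short to contain any n-gram of this order
        max_ref: Counter = Counter()
        for ref in references:
            max_ref |= ngrams(ref, n)  # keep the max count of each n-gram over references
        clipped = sum(min(count, max_ref[g]) for g, count in cand.items())
        if clipped == 0:
            return 0.0  # log(0) is undefined; unsmoothed BLEU is zero here
        log_precision += 0.25 * math.log(clipped / total)
    c = len(candidate)
    # Effective reference length: the one closest to c (ties go to the shorter).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision)
```

For example, the candidate "the cat sat on the mat" against the single reference "the cat sat on a mat" has clipped precisions 5/6, 3/5, 2/4 and 1/3 with no brevity penalty, giving (1/12)^(1/4), about 0.537. Evaluation toolkits for text GANs commonly add smoothing and use many test sentences as references; the abstract does not say which convention the thesis follows.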
Keywords/Search Tags: Generative Adversarial Networks, Self-attention Mechanism, Batch Normalization, LeakGAN, Bait Document