A Study On Generation-based Text Steganography

Posted on: 2022-06-15, Degree: Master, Type: Thesis
Country: China, Candidate: R Yang, Full Text: PDF
GTID: 2518306323966899, Subject: Cyberspace security
Abstract/Summary:
Text steganography hides secret information in text so that a third-party monitor cannot detect that a secret transmission is taking place. Current mainstream methods fall into three categories: format-based, transformation-based, and generation-based. Format-based methods offer good imperceptibility and high hiding capacity but cannot resist re-editing attacks. Transformation-based methods offer good imperceptibility but a low embedding rate. Generation-based methods embed secret information through the choice of candidate words during text generation. They can produce grammatically correct and semantically coherent text with good imperceptibility, and they also achieve a higher embedding rate.

Generation-based methods still have several shortcomings. 1) In normal generation, the text generation model selects the candidate word with the highest probability at each word position, whereas steganographic generation may select low-probability words in order to embed secret information. This creates differences between steganographic text and normal text and reduces imperceptibility. 2) Existing embedding algorithms do not fully consider the probability distribution of the candidate words at each position, which affects both text generation quality and embedding rate. 3) Existing methods embed secret information only in the word selection step of text generation, which restricts further improvement of the embedding rate. In response to these issues, this dissertation focuses on generation-based text steganography and carries out the following research.

First, a text steganography method using sampling-based generation and arithmetic coding (AC) is proposed. Baseline steganography methods based on fixed-length coding (FLC) and variable-length coding (VLC) are designed and implemented, and a control strategy based on Kullback-Leibler divergence (KLD) is adopted to control the embedding rate. A steganography method based on a sampling-based story generation model and arithmetic coding is then proposed. Experiments show that, in sampling-based steganographic text generation, arithmetic coding outperforms both fixed-length coding and variable-length coding. At an embedding rate of 1.45 bits/word, arithmetic coding achieves the ideal steganographic imperceptibility, and a subjective experiment shows that the steganography process does not degrade the quality of the generated text.

Second, a text steganography method based on discrete latent variables is proposed. This method hides secret information by combining a text generation method based on discrete latent variables with embedding algorithms. The generation method first selects a discrete latent variable according to the input context and then generates the text conditioned on that latent variable. Compared with the sampling-based method above, the latent-variable method embeds secret information not only at each word position during text generation but also in the selection of the discrete latent variable. Experiments show that, compared with the sampling-based method, the method based on discrete latent variables increases the embedding rate while preserving resistance to steganalysis and the subjective quality of the steganographic text.
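The fixed-length coding (FLC) baseline named in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's implementation, so everything here is an assumption made for illustration: `toy_model` is a hypothetical stand-in for a language model, the function names are invented, and the scheme shown is the common textbook form of FLC in which each word position carries `k` bits by choosing among the top `2^k` candidates.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a language model (an assumption, not from the thesis):
# it maps the words generated so far to a distribution over the next word.
Model = Callable[[Tuple[str, ...]], Dict[str, float]]


def toy_model(context: Tuple[str, ...]) -> Dict[str, float]:
    # A fixed toy distribution; a real model would depend on the context.
    return {"the": 0.4, "a": 0.3, "one": 0.2, "some": 0.1}


def top_candidates(dist: Dict[str, float], k: int) -> List[str]:
    # Rank words by descending probability (ties broken alphabetically)
    # and keep the top 2**k, so that each candidate maps to one k-bit code.
    ranked = sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))
    if len(ranked) < 2 ** k:
        raise ValueError("vocabulary too small for k bits per word")
    return [word for word, _ in ranked[: 2 ** k]]


def embed(bits: str, model: Model, k: int) -> List[str]:
    # Consume k secret bits per position; the bit chunk is the candidate index.
    if len(bits) % k != 0:
        raise ValueError("bit string length must be a multiple of k")
    words: List[str] = []
    for i in range(0, len(bits), k):
        candidates = top_candidates(model(tuple(words)), k)
        words.append(candidates[int(bits[i : i + k], 2)])
    return words


def extract(words: List[str], model: Model, k: int) -> str:
    # The receiver runs the same model and reads back each word's index.
    chunks: List[str] = []
    for i, word in enumerate(words):
        candidates = top_candidates(model(tuple(words[:i])), k)
        chunks.append(format(candidates.index(word), f"0{k}b"))
    return "".join(chunks)


stego_words = embed("0110", toy_model, k=2)
recovered = extract(stego_words, toy_model, k=2)
```

In this sketch every candidate in the top `2^k` is chosen with equal likelihood when the secret bits are random, regardless of the model's probabilities. That is the mismatch the abstract's second shortcoming points to, and the reason coding schemes that follow the distribution more closely are of interest.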
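The abstract states that a KLD-based strategy controls the embedding rate but does not describe it. The following sketch shows one plausible reading, offered purely as an assumption: at each position, compare the distribution that FLC induces under random secret bits (uniform over the top `2^k` words) with the model's own distribution, and keep the largest `k` whose divergence stays below a threshold. The function names and the threshold parameter `max_kld` are hypothetical.

```python
import math
from typing import Dict


def kl_divergence(q: Dict[str, float], p: Dict[str, float]) -> float:
    # KL(q || p) in bits; requires p[w] > 0 wherever q[w] > 0.
    return sum(qw * math.log2(qw / p[w]) for w, qw in q.items() if qw > 0)


def flc_stego_distribution(dist: Dict[str, float], k: int) -> Dict[str, float]:
    # With uniformly random secret bits, FLC picks each of the top 2**k
    # words with probability 2**-k (illustrative assumption).
    ranked = sorted(dist.items(), key=lambda kv: (-kv[1], kv[0]))[: 2 ** k]
    return {word: 1.0 / 2 ** k for word, _ in ranked}


def choose_bits(dist: Dict[str, float], max_kld: float, max_k: int) -> int:
    # Return the largest k <= max_k whose induced distribution stays within
    # max_kld of the model distribution. A result of 0 means "embed nothing
    # at this position and sample from the model as usual".
    best = 0
    for k in range(1, max_k + 1):
        if 2 ** k > len(dist):
            break
        if kl_divergence(flc_stego_distribution(dist, k), dist) <= max_kld:
            best = k
    return best


example = {"the": 0.4, "a": 0.3, "one": 0.2, "some": 0.1}
bits_loose = choose_bits(example, max_kld=0.2, max_k=2)
bits_strict = choose_bits(example, max_kld=0.1, max_k=2)
```

The divergence is not necessarily monotonic in `k`, which is why the loop checks every value instead of stopping at the first failure. In the example distribution, two bits per word fit the model better than one.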
Keywords/Search Tags: text steganography, sampling-based generation, embedding algorithm, fixed-length coding, variable-length coding, arithmetic coding, discrete latent variable