Research On Linguistic Steganalysis Based On Word Embedding

Posted on:2019-04-29

Degree:Master

Type:Thesis

Country:China

Candidate:J M Yu

Full Text:PDF

GTID:2428330572495097

Subject:Software engineering

Abstract/Summary:

Information hiding is an important research topic in information security. It is widely used for covert communication, secure storage and transmission of confidential information, and digital media copyright protection. However, it can also be misused by malicious parties, for example to steal confidential documents or to plan and carry out criminal activities, causing incalculable damage to the country, society, and individuals. It is therefore necessary to study steganalysis in order to detect and intercept illegal information that is transmitted covertly with information hiding techniques, and such research is of great significance for maintaining information security.

This thesis studies linguistic steganalysis in depth, targeting two weaknesses of current steganalysis methods: the lack of semantic information and low generalization ability. By exploiting the rich deep semantic information in word vectors, normal text and stego text are analyzed so that stego text can be recognized. The main research results are as follows.

First, to detect linguistic steganography based on synonym substitution, a novel linguistic steganalysis method based on word embedding is proposed to improve secret message detection. With the continuous Skip-gram language model, each synonym and its context words are represented as high-dimensional semantic word embeddings, which capture semantic similarities effectively. The context fitness characterizes the suitability of a synonym by its semantic similarities with its context words, and TF-IDF is used to measure the importance of each context word. An SVM is trained on the two features and then applied to the steganalysis task. The experimental results show that the proposed method improves on the average accuracies of two existing methods by at least 3.51%. More importantly, the performance can be further improved by training the word embeddings on a corpus consistent with the stego texts.

Second, a linguistic steganalysis method based on word embedding and a convolutional neural network is proposed. With the continuous Skip-gram language model, all the synonyms in the text and the words in their context windows are converted into vector matrices, which serve as the input to the convolutional neural network. Linguistic steganalysis features are then learned automatically through three kinds of convolutional kernels, and a max-over-time pooling operation is applied over the feature maps to handle variable sentence lengths. This method effectively improves the generalization ability and detection performance of linguistic steganalysis; the average detection accuracy reached 98.18%.
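
The context fitness feature and the max-over-time pooling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's exact formulas, so the TF-IDF-weighted average of cosine similarities used here is an assumption, as are the function names, the toy 3-dimensional vectors, and the hard-coded weights. In practice the vectors would come from a trained Skip-gram model and the weights from a real TF-IDF computation.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def context_fitness(synonym_vec, context_vecs, context_weights):
    """Weighted mean of cosine similarities between a synonym and its context words.

    Assumption: the weights play the role of TF-IDF scores of the context
    words; the thesis may combine similarity and TF-IDF differently.
    """
    sims = np.array([cosine(synonym_vec, c) for c in context_vecs])
    weights = np.asarray(context_weights, dtype=float)
    return float(np.sum(weights * sims) / np.sum(weights))


def max_over_time(feature_map):
    """Max-over-time pooling over a (positions, filters) feature map.

    Keeps the largest activation of each filter, so the output length
    depends only on the number of filters, not on the sentence length.
    """
    return np.asarray(feature_map).max(axis=0)


# Toy context window: two context words with hypothetical TF-IDF weights.
context_vecs = np.array([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
tfidf_weights = [2.0, 1.0]

# A synonym aligned with its context versus one that fits it poorly.
fit_original = context_fitness(np.array([1.0, 0.0, 0.0]), context_vecs, tfidf_weights)
fit_substitute = context_fitness(np.array([0.0, 1.0, 0.0]), context_vecs, tfidf_weights)

# Feature maps from sentences of different lengths pool to the same shape.
short_map = np.array([[0.1, 0.5], [0.9, 0.2], [0.3, 0.4]])
long_map = np.random.default_rng(0).random((7, 2))
pooled_short = max_over_time(short_map)
pooled_long = max_over_time(long_map)
```

In this toy setting the well-fitting synonym receives a higher fitness score than the poorly fitting one, which is the kind of signal a classifier such as an SVM could use to separate normal text from stego text. The pooling example shows why sentences of different lengths can feed the same classifier layer.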

Keywords/Search Tags:

Natural Language, Information Hiding, Word Vector, Synonym Substitution, Steganalysis, Convolutional Neural Network

Related items

1	Research On Text Information Hiding Based On Lossless Compression
2	Research On Reversible Natural Language Watermarking Based On Synonym Substitution
3	Research And Implementation Of Natural Language Information Hiding Algorithm Based On Abstract Embedding Unit
4	The Research On Natural Language Information Hiding Based On Synonymy Substitution
5	The Research On Natural Language Information Hiding Technology Based On Steganography Coding
6	Research And Implementation Of Information Hiding System For Instant Message Based On Synonymy Substitution
7	Research On Text Steganography Based On Word Frequency Distribution
8	A Feature Space Optimized Algorithm Based On Word Embeddings For Synonym Expansion
9	Research On Digital Image Steganalysis In Network Environment
10	Research On Image Steganalysis Method Based On Convolutional Neural Network