Research On Linguistic Steganalysis Based On Word Embedding

Posted on: 2019-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: J M Yu
GTID: 2428330572495097
Subject: Software engineering
Abstract/Summary:
Information hiding is an active research topic in information security. It is widely used for covert communication, secure storage and transmission of confidential information, and digital media copyright protection. However, it can also be misused by malicious parties, for example to steal confidential documents or to plan and carry out criminal activities, and such misuse can cause incalculable damage to the country, society, and individuals. Research on steganalysis, which detects and intercepts illegal information transmitted covertly through information hiding, is therefore of great significance for maintaining information security.

This thesis studies linguistic steganalysis in depth and addresses two weaknesses of current steganalysis methods: the lack of semantic information and low generalization ability. It uses the rich deep semantic information carried by word vectors to analyze normal text and stego text and thereby recognize stego text. The main research results are as follows.

First, a linguistic steganalysis method based on word embedding is proposed to detect linguistic steganography based on synonym substitution. Using the continuous Skip-gram language model, each synonym and its context words are represented as high-dimensional semantic word embeddings that capture semantic similarity effectively. A context fitness feature characterizes how well a synonym suits its position by its semantic similarity to the context words, and TF-IDF is used to measure the importance of each context word. An SVM is trained on these two features and then applied to the steganalysis task. Experimental results show that the proposed method improves on the average accuracies of two existing methods by at least 3.51%. More importantly, performance can be improved further by training the word embeddings on a corpus consistent with the stego texts.

Second, a linguistic steganalysis method based on word embedding and a convolutional neural network is proposed. Using the continuous Skip-gram language model, every synonym in the text and the words in its context window are converted into vector matrices, which serve as the input to the convolutional neural network. Steganalysis features are then learned automatically through three kinds of convolutional kernels, and a max-over-time pooling operation is applied over each feature map, which handles variable sentence lengths. This method effectively improves the generalization ability and detection performance of linguistic steganalysis. The average detection accuracy reached 98.18%.
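
As an illustration only, the two core ideas in the abstract can be sketched in Python. The abstract does not give exact formulas, so the weighted-mean form of context fitness, the toy two-dimensional vectors, the hand-picked weights standing in for TF-IDF, and the function names `cosine`, `context_fitness`, and `conv_max_over_time` are all assumptions rather than the thesis's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def context_fitness(synonym, context_words, embeddings, weights):
    """Assumed form: weighted mean of the cosine similarities between a
    synonym and its context words; `weights` stands in for TF-IDF scores."""
    if synonym not in embeddings:
        return 0.0
    weighted_sum = 0.0
    weight_total = 0.0
    for word in context_words:
        if word not in embeddings:
            continue  # skip out-of-vocabulary context words
        w = weights.get(word, 0.0)
        weighted_sum += w * cosine(embeddings[synonym], embeddings[word])
        weight_total += w
    return weighted_sum / weight_total if weight_total > 0.0 else 0.0


def conv_max_over_time(vectors, kernel):
    """Slide one kernel (a list of rows, one per word in the window) over a
    sequence of word vectors and keep the largest response. Bias and
    non-linearity are omitted to keep the sketch short."""
    width = len(kernel)
    responses = []
    for start in range(len(vectors) - width + 1):
        window = vectors[start:start + width]
        responses.append(sum(
            k * x
            for k_row, x_row in zip(kernel, window)
            for k, x in zip(k_row, x_row)
        ))
    # One value per kernel, regardless of how long the input sequence is.
    return max(responses) if responses else 0.0


# Toy data (hypothetical): 2-dimensional "embeddings" and hand-picked weights.
embeddings = {
    "big": [1.0, 0.0],
    "house": [1.0, 0.0],
    "idea": [0.0, 1.0],
}
weights = {"house": 2.0, "idea": 1.0}

fitness = context_fitness("big", ["house", "idea"], embeddings, weights)

kernel = [[1.0, 1.0], [1.0, 1.0]]  # window of two words
short_text = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
long_text = short_text + [[0.0, 0.0]]
pooled_short = conv_max_over_time(short_text, kernel)
pooled_long = conv_max_over_time(long_text, kernel)
```

In the first method, features of this kind would then be passed to an SVM classifier; that step is left out here. The pooling function returns a single value per kernel for both the three-word and the four-word input, which is the property that lets the network accept sentences of different lengths.
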
Keywords/Search Tags: Natural Language, Information Hiding, Word Vector, Synonym Substitution, Steganalysis, Convolutional Neural Network