
Research On Character Coding Based Text Steganography And Its Attack Methods

Posted on: 2010-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: X X Zhao
GTID: 2178360302959850
Subject: Information security
Abstract/Summary:
Information hiding, also known as steganography, is an information security technology that hides secret information in a public carrier and delivers it by transmitting that carrier. Research on detection and restoration in steganography is therefore significant. In contrast to steganography, detection judges whether a carrier file contains hidden information according to some algorithm, and restoration recovers the original secret information from the public carrier. Research on detection and restoration helps to develop safer hiding algorithms and is thus useful for improving network security.

This dissertation studies the detection and restoration of text hiding, in particular comma substitution in mixed texts. First, the background of steganography is introduced. Then the hiding algorithm of comma substitution in mixed texts is explained in detail. Finally, the corresponding detection and restoration algorithms are proposed.

For the detection of comma substitution in mixed texts, we use statistical methods to decide whether a text contains hidden information, based on the traits that change during the steganography process. First, assuming that the secret random information is uniformly distributed, we convert the detection of commas into the detection of combinations of a comma and its adjacent characters, according to the text context. Second, a statistical feature vector is designed. Finally, a support vector machine (SVM) classifies the input feature vectors to distinguish normal texts from stego texts. Experiments show that the algorithm has a high detection rate at a low embedding rate: the detection rate reaches 96% when the embedding rate is 20%.

For the restoration of comma substitution in mixed texts, the change in comma coding before and after steganography alters the relationship between a comma and its adjacent characters, so we again combine the comma with its adjacent characters. First, a window function is defined whose value is the number of differently coded words before and after a comma; the number before the comma is the pre-window length, and the number after the comma is the post-window length. Then a suitable suspect function is chosen according to the relationship between window length and hiding probability; this function is the logistic regression function. Finally, a threshold is set based on empirical values. If the function value is greater than the threshold, the comma is judged to have been substituted; if it is smaller than the threshold, the comma is judged to be unchanged. In the experiments, we analyze how the restoration rate varies with the window length, the threshold, and the parameters of the logistic regression function. Experiments show that under various embedding rates the algorithm has a high restoration rate, which can reach 90%.

The main contributions and innovations of this dissertation include:

(1) Targeting the embedding algorithm of coding substitution in mixed texts, we have designed a new detection algorithm that uses an SVM to classify the input feature vectors and so distinguish normal texts from stego texts. This algorithm can efficiently detect whether there is hidden information in mixed texts at a low embedding rate, and it lays a foundation for subsequent research on blind detection algorithms.

(2) Targeting the same embedding algorithm, we have designed a restoration algorithm that uses the logistic regression function as the suspect function. Rather than only detecting the existence of secret information, this algorithm restores every single embedded bit, and under various embedding rates its restoration rate reaches as high as 90%.

(3) Combined with other detection algorithms, the detection and restoration algorithms in this dissertation can effectively help to fight network crime, and they can encourage the study of new, more robust embedding algorithms.
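
The restoration procedure summarized in the abstract (window lengths around a comma, a logistic suspect function, and an empirical threshold) can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes that the substitution swaps the half-width comma (U+002C) with the full-width comma (U+FF0C), that a character's coding class can be approximated by whether it is ASCII, and that the window length counts nearby non-space characters whose coding class differs from the comma's. The names `window_lengths`, `suspect_score`, and `restore`, and all weights and thresholds, are hypothetical.

```python
import math

HALF_COMMA = ","        # U+002C, single-byte comma (assumed cover/stego symbol)
FULL_COMMA = "\uff0c"   # U+FF0C, full-width comma (assumed cover/stego symbol)


def is_wide(ch: str) -> bool:
    """Assumption: any non-ASCII character counts as double-byte coded."""
    return ord(ch) > 0x7F


def count_mismatch(chars: str, comma_wide: bool) -> int:
    """Count non-space characters whose coding class differs from the comma's."""
    return sum(1 for c in chars if not c.isspace() and is_wide(c) != comma_wide)


def window_lengths(text: str, i: int, span: int = 3) -> tuple[int, int]:
    """Return (pre-window length, post-window length) for the comma at text[i].

    Only `span` characters on each side are inspected (hypothetical choice).
    """
    comma_wide = is_wide(text[i])
    pre = count_mismatch(text[max(0, i - span):i], comma_wide)
    post = count_mismatch(text[i + 1:i + 1 + span], comma_wide)
    return pre, post


def suspect_score(pre: int, post: int,
                  w_pre: float = 1.0, w_post: float = 1.0,
                  bias: float = -3.0) -> float:
    """Logistic function of the window lengths; the weights are illustrative."""
    z = w_pre * pre + w_post * post + bias
    return 1.0 / (1.0 + math.exp(-z))


def restore(text: str, threshold: float = 0.5, span: int = 3):
    """Swap back every comma whose suspect score exceeds the threshold."""
    out = list(text)
    flagged = []
    for i, ch in enumerate(text):
        if ch in (HALF_COMMA, FULL_COMMA):
            pre, post = window_lengths(text, i, span)
            if suspect_score(pre, post) > threshold:
                out[i] = FULL_COMMA if ch == HALF_COMMA else HALF_COMMA
                flagged.append(i)
    return "".join(out), flagged
```

For example, `restore("今天天气很好,我们去公园")` flags the half-width comma surrounded by Chinese characters and swaps it back, while `restore("Hello, world")` leaves the text unchanged. In practice the weights and threshold would be fitted or tuned on labeled data, as the abstract's parameter analysis suggests.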
Keywords/Search Tags: Steganography, Text Hiding, Character Coding, Feature Vector, SVM, Window Function, Logistic Regression Model