Font Size: a A A

On Non-sequential Context Modeling With Application To Executable Data Compression

Posted on:2009-01-12Degree:MasterType:Thesis
Country:ChinaCandidate:W R DaiFull Text:PDF
GTID:2178360242476854Subject:Communication and Information System
Abstract/Summary:PDF Full Text Request
In the last two decades, data compression techniques have developed rapidly, with extensive applications to the fields, such as text, speech, image, video and executable compression. The process of data compression can be strictly split into two steps: modeling and encoding. Nowadays, since the code length is rather close to the Shannon bound, modeling is playing a critically significant role in the enhancement of compression efficiency. Up to now, mass literature is concentrated on context modeling, and consequently, several classic context modeling methods have been put forward.With the development of network and portable device, executable compression,as a branch of data compression, turns to be significant in the applications such as online distribution of program binaries and storage of driver file in portable device. However, although classic context model algorithms, which consider sequential context models only, perform efficiently in text compression, their performance in executable compression is not satisfactory enough.In this dissertation, we first review the sequential modeling algorithms and their redundancy estimation. And then we generalize the definition of non-sequential contexts and context models by relaxing the limitations on sequential contexts from suffix of predicted subsequence to any permutation of symbols in the subsequence. On such basis, we put forward a method for non-sequential context modeling. There are three main points in it: 1) we establish a series of context weighting trees for non-sequential contexts by introducing the model tree, and we can make weighted estimation according to this series of context weighting trees. 2) The upper bound of model redundancy in non-sequential context modeling is developed according to the weighted estimation, and it is compared with the one in sequential context modeling for the advantage in estimation of source data with non-sequential correlation. 3) Whether training sequence exists or not, the model selection methods are proposed in non-sequential context modeling: a fast method, where predicted symbols are utilized for selection of models, is proposed when there is no training sequence available and when training sequence does exist, an optimal set of models are found by greedy algorithm under the minimum description length (MDL) evaluation.In this dissertation, both sequential models and non-sequential models are considered in executable compression, and the predictions from such models are mixed for a weighted probabilistic estimation. By concluding the sequential and non-sequential correlation in executable files, especially IA-32 instruction set, we apply our non-sequential context modeling method to the assembly instructions collaborating with the correlation in it and customize an optimal set of context models by the described greedy algorithm under MDL evaluation according to the training sequence. Moreover, the executable compression framework is proposed, which comprises of preprocessing which deals with unique correlation in assembly instructions and the syntax parsing of IA-32 instruction, estimation by the selected models (sequential or non-sequential), weighting by a family of normalized least mean square (LMS) algorithms considering Lp-norm of input probability estimation and asymptotically gaining the optimal weighted probability estimation.
Keywords/Search Tags:non-sequential context modeling, executable data compression, context weighting, model redundancy
PDF Full Text Request
Related items