
Research On Data Fragment Type Identification Based On Content

Posted on: 2015-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: T T Xu
GTID: 2268330428465070
Subject: Computer application technology
Abstract/Summary:
Identifying file types quickly is a fundamental problem in computing. Data fragment type identification comes up frequently in digital forensics, data recovery, reverse engineering and other fields. However, traditional methods based on file extensions and magic numbers often fail because the meta information is damaged or missing. File fragment type identification is therefore an active and difficult problem. This paper explores content-based file fragment type identification, and in particular its key technology, data fragment feature extraction. The main work of this paper is as follows.

First, a data fragment type identification method based on grayscale images is proposed. The byte string of a file fragment is reshaped into a matrix and viewed as a grayscale image by treating each byte value in the fragment as a grayscale pixel value. The GIST descriptor, which has performed very well in computer vision, is used to extract features from the grayscale images generated from file fragments. Classical classification algorithms then identify the fragment type from the GIST features. Experiments show that identification accuracy improves to a certain degree compared with methods based on Normalised Compression Distance and NLP.

Second, a data fragment type identification method based on the frequency domain and 1-grams is proposed. To improve identification accuracy, each data fragment is first transformed into the frequency domain with the discrete cosine transform. The DC coefficient and a small number of AC coefficients are extracted as features of the fragment. The 1-gram distribution feature is then extracted from the byte frequency distribution. Classical classification algorithms use these two features to identify the data fragment type. Experiments show that identification precision improves significantly, by 10%-20%, compared with the methods based on grayscale images, Normalised Compression Distance and NLP.

In summary, this paper explores content-based data fragment type identification, and in particular its key technology, data fragment feature extraction. Two methods, one based on grayscale images and the other based on the frequency domain and 1-grams, are proposed and verified by experiments. This work is helpful for research on file carving and on automatic reverse analysis of file fragments of unknown type.
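
The feature extraction steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not give the matrix width, the number of AC coefficients, or whether the transform is one- or two-dimensional, so the row width of 32, the default of 8 coefficients, the unnormalised one-dimensional DCT-II, and all function names here are assumptions chosen for illustration. The GIST descriptor and the classifier are omitted.

```python
import math
from collections import Counter


def fragment_to_matrix(fragment: bytes, width: int = 32) -> list:
    """Reshape a fragment into rows of `width` byte values (grayscale 0-255).

    Assumption: trailing bytes that do not fill a complete row are dropped.
    """
    rows = len(fragment) // width
    return [list(fragment[r * width:(r + 1) * width]) for r in range(rows)]


def byte_frequency(fragment: bytes) -> list:
    """1-gram feature: relative frequency of each of the 256 byte values."""
    if not fragment:
        return [0.0] * 256
    counts = Counter(fragment)
    total = len(fragment)
    return [counts.get(value, 0) / total for value in range(256)]


def dct_features(fragment: bytes, n_coeffs: int = 8) -> list:
    """First `n_coeffs` unnormalised 1-D DCT-II coefficients of the bytes.

    Index 0 is the DC coefficient; the remaining entries are the
    lowest-frequency AC coefficients.
    """
    n = len(fragment)
    if n == 0:
        return [0.0] * n_coeffs
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(fragment))
        for k in range(n_coeffs)
    ]


def combined_features(fragment: bytes, n_coeffs: int = 8) -> list:
    """Concatenate the DCT coefficients and the byte frequency distribution."""
    return dct_features(fragment, n_coeffs) + byte_frequency(fragment)
```

A vector returned by `combined_features` could be passed to any classical classifier. In practice the DCT coefficients and the byte frequencies differ greatly in scale, so some normalisation before classification would likely be needed.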
Keywords/Search Tags: Data Fragment, Type Identification, Grayscale Image, Discrete Cosine Transform, 1-gram