Research On Key Techniques Of Image File Carving

Posted on: 2018-02-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Wu
GTID: 1368330566998419
Subject: Computer Science and Technology
Abstract/Summary:
To combat increasingly sophisticated computer-related crime, our country has introduced a range of procedural-law policies that give electronic evidence legal standing. To evade digital forensic investigation, however, suspects erase electronic evidence by hiding it, deleting it, or formatting the storage medium. Investigators therefore need file recovery techniques to restore the erased evidence. Traditional file recovery methods rely on file system structures and do not work when those structures are absent. File carving was developed to overcome this problem: it is a forensic technique that recovers files based solely on file structure and content, without file system metadata, and it has consequently attracted considerable attention in the digital forensics field. Since images are among the most common digital file formats, this thesis focuses on image file carving techniques in support of digital forensic investigations. The main contributions of this dissertation are as follows.

(1) To address the high computational complexity of graph-based reassembly algorithms for fragmented image files, we propose a pruning algorithm that uses file format constraints to reduce the size of the search space and to improve the speed and accuracy of existing graph-based reassembly algorithms. Graph-based reassembly algorithms target the scenario in which the image file data is complete but may be fragmented. They cast the reassembly problem as a K-vertex disjoint path problem in a complete directed graph, which is NP-complete. Taking the BMP file type as an example, we use the padding-byte constraint as the pruning condition and exclude most impossible paths in the complete directed graph. Using eight classical graph-based reassembly algorithms as examples, we show that the proposed method prunes more than 98% of the impossible edges, improves accuracy by 32% to 55%, and reduces the run time to between 1/6 and 1/428 of the original run time.

(2) To address the problem of weight calculation in graph-based reassembly algorithms for fragmented image files, we optimize existing weight algorithms in three respects. The three optimizations solve different problems in the weight calculation process and complement each other to jointly improve the performance of reassembly algorithms. First, we prove theoretically that the classical MED algorithm cannot effectively determine whether two data clusters are adjacent, and we present a solution, namely replacing the set of predicted pixels. Experimental results show that the accuracy of the optimized MED algorithm improves by at least 39.10%. Second, because the classical SoD and ED algorithms are sensitive to mutated pixels, we introduce an operator that is robust to data mutation and improves the accuracy of both algorithms. Experimental results show that the accuracy of the improved SoD algorithm increases by at least 3.98% and that of the improved ED algorithm by at least 2.95%. Finally, observing that garbage data in image file data clusters strongly affects the performance of reassembly algorithms, we counter its negative effect by locating the garbage data and excluding it from the weight calculation. Taking the SoD algorithm as an example, experimental results show that the improved SoD algorithm improves by at least 1.79%.

(3) To address the problem of estimating the image width when the header cluster of a JPEG file is damaged or lost, we propose an image width estimation algorithm with higher accuracy. First, we conduct a comprehensive comparison of existing image width estimation methods on two public datasets. Experimental results show that the best pixel-based methods always outperform the best methods based on quantized DCT coefficients. Second, to preserve the good performance of the pixel-based methods when the correct quantization tables are unavailable, we analyse how replacing the correct quantization tables with the standard ones affects the pixel-based methods. Experimental results confirm that the replacement has only a small effect on their performance, and the best of them still outperform the best methods based on quantized DCT coefficients. Together, these two conclusions indicate that it may be sufficient for future work to focus on pixel-based methods. Finally, we propose a novel pixel-based method. The basic idea is to find pairs of MCUs that are adjacent in the vertical direction, derive candidate image widths from them, and select the most frequently occurring candidate as the estimated image width. Experimental results show that the proposed method usually performs best when most MCUs of an image have been recovered.

(4) Existing quality assessment for reassembled image files produces only binary (two-valued) results, which cannot effectively measure the quality of a reassembled image file. To address this problem, we present a novel quality assessment algorithm that produces continuous scores. The proposed method assigns each reassembled image file a score between 0 and 1, where a higher score indicates higher quality. The basic idea is to decompose the quality assessment of a reassembled image file into calculating the contributions of its data clusters, with the contribution of each data cluster computed according to two pre-defined rules. We validate the proposed method with an extensive subjective study involving 588 reassembled image files, each scored by 29 subjective observers. The consistency between objective and subjective scores is measured quantitatively by the metrics CC, SROCC, OR and RMSE. The subjective experimental results also show that existing image quality assessment methods do not perform well at assessing the quality of reassembled image files.

(5) To address the problem of assessing the quality of the region-of-interest of reassembled image files, we present a quality assessment algorithm for that region. First, we obtain the region-of-interest in the original images using the proposed figure-ground segmentation algorithm based on sparse representation. Second, we transfer the segmentation results from the original file to the reassembled image and score its quality by calculating the contribution of the connected regions of the reassembled image. Experimental results show that the quality assessments produced by the proposed algorithm agree with human subjective judgements.

In summary, this thesis proposes methods for several issues in image file carving: reassembling fragmented image files, estimating JPEG image widths, and assessing the quality of reassembled image files. The experimental results show that the proposed methods perform well.
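As an illustration of the padding-byte constraint mentioned in contribution (1), the following Python sketch shows how such a check could prune an edge between two clusters. It is not the thesis's algorithm. It assumes that BMP rows are padded to a multiple of 4 bytes, that the padding bytes are written as zero (a common convention, assumed here), and that the image width, bit depth and the candidate cluster's offset within the pixel array are already known. The function names are hypothetical.

```python
def row_stride(width_px, bits_per_pixel):
    """Bytes per BMP row, rounded up to a multiple of 4."""
    return ((width_px * bits_per_pixel + 31) // 32) * 4


def padding_offsets(width_px, bits_per_pixel, start, length):
    """Yield offsets inside a cluster that fall on row padding.

    start  -- assumed offset of the cluster's first byte in the pixel array
    length -- cluster size in bytes
    """
    stride = row_stride(width_px, bits_per_pixel)
    used = (width_px * bits_per_pixel + 7) // 8  # bytes carrying pixel data
    for i in range(length):
        if (start + i) % stride >= used:
            yield i


def can_follow(cluster, start, width_px, bits_per_pixel):
    """Keep an edge only if every expected padding byte is zero (assumption)."""
    return all(cluster[i] == 0
               for i in padding_offsets(width_px, bits_per_pixel, start, len(cluster)))


# Toy data: 3-pixel-wide, 24-bit rows give 9 data bytes plus 3 padding bytes.
row = bytes([1] * 9 + [0] * 3)
data = row * 4
print(can_follow(data[16:32], 16, 3, 24))  # correctly aligned cluster: True
print(can_follow(data[20:36], 16, 3, 24))  # misaligned cluster: False
```

A candidate cluster that fails the check cannot directly follow the current cluster, so the corresponding edge can be dropped before any weight is computed.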
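The voting idea in contribution (3) can be sketched in the same spirit. The Python example below is a simplified reading of the abstract, not the thesis's method: MCUs are represented as small 2D lists of pixel values in decoding order, the vertical neighbour of an MCU is taken to be the later MCU whose top row best matches its bottom row, and the matching cost (sum of absolute differences) and the `max_width` search bound are assumptions introduced for illustration.

```python
from collections import Counter


def boundary_cost(upper, lower):
    """Sum of absolute differences between upper's bottom row and lower's top row."""
    return sum(abs(a - b) for a, b in zip(upper[-1], lower[0]))


def estimate_width_in_mcus(mcus, max_width):
    """Vote for the most frequent distance between vertically matching MCUs.

    mcus      -- list of MCUs in decoding order, each a 2D list of pixel values
    max_width -- assumed upper bound on the image width, in MCUs
    """
    votes = Counter()
    for i in range(len(mcus)):
        candidates = range(i + 1, min(i + max_width, len(mcus) - 1) + 1)
        if not candidates:
            continue
        # The best-matching later MCU suggests a candidate width of (j - i).
        best = min(candidates, key=lambda j: boundary_cost(mcus[i], mcus[j]))
        votes[best - i] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy image: 5 MCUs wide, 6 MCU rows, smooth vertically, distinct per column.
W, H = 5, 6
mcus = [[[c * 50 + 2 * r] * 2, [c * 50 + 2 * r + 1] * 2]
        for r in range(H) for c in range(W)]
print(estimate_width_in_mcus(mcus, max_width=8))
```

MCUs in the last row have no true vertical neighbour and cast incorrect votes, which is why the majority vote, rather than any single pair, determines the estimate.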
Keywords/Search Tags:file carving, file reassembly, weight calculation, image width estimation, quality assessment, figure-ground segmentation