
Research And Application On File Fragments Identification And Reassembly Technology

Posted on: 2017-11-10
Degree: Master
Type: Thesis
Country: China
Candidate: J Guo
Full Text: PDF
GTID: 2348330503493049
Subject: Software engineering
Abstract/Summary:
With the development of science and technology in the information age, data has become increasingly important. People depend heavily on computers and intelligent devices, which bring great convenience to work and life. Traditional data recovery techniques can recover data that users deleted by mistake. However, when the file system or its metadata is damaged or lost, these techniques cannot recover the data, which can result in incalculable losses. File carving does not rely on the file system of the original disk image. Instead, it recovers data directly from the raw, unstructured binary data stream (i.e., the original disk image). It overcomes shortcomings of traditional data recovery, such as the inability to recover files whose fragments are stored non-contiguously, and has therefore received great attention. In this thesis, we propose a file carving algorithm applicable to multiple file types. It extracts features based on information entropy, byte frequency distribution, and mean byte value, uses a support vector machine (SVM) as the classifier to classify file fragments, and reassembles the fragments with a reassembly algorithm based on the logical sequence of disk clusters or on properties of the data files. We demonstrate the approach by recovering Word documents and JPEG images. Our contributions are as follows. First, we propose a file fragment classification algorithm based on content features. The algorithm calculates the entropy range of files of the target type using the principle of information entropy and extracts the corresponding set of candidate fragments with an entropy feature extraction algorithm.
Then, based on binary classification and the 1-gram method, together with byte frequency distribution and mean byte value, we further classify the extracted fragments using a supervised learning algorithm based on a support vector machine. Second, we ran simulation experiments to verify the content-feature-based file fragment classification algorithm. The results show that the proposed method is feasible and effective. Third, we propose a file carving algorithm based on content features. Building on the fragment classification algorithm, it recombines the target file fragments, determines how they connect to one another, and recovers the files using the reassembly algorithm based on the logical sequence of disk clusters or on properties of the data files. Finally, to verify the feasibility of the algorithm, we use the publicly released DFRWS 2006 disk image as experimental data and attempt to recover Word documents and JPEG images. A comparison with the results of the Foremost and PhotoRec tools shows that the algorithm can recover files from an unstructured disk image, which demonstrates its feasibility and effectiveness.
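The three content features named in the abstract (information entropy, byte frequency distribution, and mean byte value) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the feature-vector layout, and the `low`/`high` entropy bounds are assumptions made here for illustration, and the abstract does not state the fragment size or the entropy ranges actually used.

```python
import math
from collections import Counter
from typing import List


def byte_frequency_distribution(fragment: bytes) -> List[float]:
    """1-gram feature: relative frequency of each of the 256 byte values."""
    if not fragment:
        raise ValueError("fragment must not be empty")
    counts = Counter(fragment)  # iterating over bytes yields ints 0..255
    total = len(fragment)
    return [counts.get(value, 0) / total for value in range(256)]


def shannon_entropy(bfd: List[float]) -> float:
    """Shannon entropy in bits per byte, computed from a frequency distribution."""
    return -sum(p * math.log2(p) for p in bfd if p > 0)


def mean_byte_value(fragment: bytes) -> float:
    """Arithmetic mean of the byte values in the fragment."""
    if not fragment:
        raise ValueError("fragment must not be empty")
    return sum(fragment) / len(fragment)


def extract_features(fragment: bytes) -> List[float]:
    """Assumed layout: [entropy, mean byte value, 256 byte frequencies]."""
    bfd = byte_frequency_distribution(fragment)
    return [shannon_entropy(bfd), mean_byte_value(fragment)] + bfd


def in_entropy_range(fragment: bytes, low: float, high: float) -> bool:
    """Entropy pre-filter: keep fragments whose entropy lies in [low, high].

    The bounds are hypothetical parameters; they would have to be estimated
    from sample files of the target type.
    """
    entropy = shannon_entropy(byte_frequency_distribution(fragment))
    return low <= entropy <= high
```

Feature vectors of this kind could then be passed to any SVM implementation for the binary classification step. Compressed data such as JPEG image content tends to have entropy close to 8 bits per byte, which is why an entropy range is a plausible first screen before the finer byte-frequency comparison.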
Keywords/Search Tags:data recovery, file carving, file fragments, content features