
Research On File Fragment Type Identification Technology Based On Deep Learning

Posted on: 2024-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2558307061968649
Subject: Electronic information
Abstract/Summary:
As society becomes increasingly digitized, more and more terminal devices and network devices appear in daily life. These devices record a large amount of user information, which can be used for electronic data forensics and network security maintenance. However, when analyzing hard drives or memory during device forensics and network monitoring, it is inevitable to encounter incomplete or maliciously damaged file fragments. If these fragments can be efficiently carved into partial or complete files suitable for content analysis, the efficiency of device forensics and network monitoring can be improved considerably. An important prerequisite for efficient carving is correctly identifying the type of each file fragment. Most existing file fragment type identification (FFTI) methods still suffer from two problems: 1) the number of fragment types they can identify is limited, and 2) identification accuracy is low for high-entropy and compound file fragments. To alleviate these problems, this thesis uses deep learning and proposes the following two FFTI methods from different perspectives.

(1) A file fragment type identification method based on a multi-stream CNN (Convolutional Neural Network). The network has three different branches. In each branch, a trainable embedding layer converts the sparse binary file fragment into a compact real-valued representation, two consecutive 1-D convolutional modules learn a higher-level representation of the fragment, and a global average pooling layer converts the feature map into a vector. Finally, the outputs of all branches are concatenated and fed into a fully-connected (FC) layer for classification. The parameters of the convolutional layers differ across the 1-D convolutional modules, so the diversity of file fragments can be represented better, which alleviates the problems of coarse recognition granularity and poor identification of complex files. Experiments show that this method achieves 66.5% and 78.6% accuracy on 512-byte and 4096-byte fragments, respectively (taking scenario 1 of the FFT-75 dataset as an example).

(2) A file fragment type identification method based on CNN-LSTM. This method first uses a trainable embedding layer to convert the sparse binary file fragment into a compact real-valued representation. Successive 1-D convolutional modules then learn a higher-level representation of the fragment. Next, the resulting features are fed into an LSTM (Long Short-Term Memory) network to learn the correlations between bytes. Finally, the last LSTM output for each example is passed to successive FC layers for classification. Experimental results show that taking the correlation between bytes into consideration makes it easier to distinguish similar types of file fragments. This method achieves 67.9% and 79.6% accuracy on 512-byte and 4096-byte fragments, respectively.

Experimental results on the FFT-75 dataset show that the two FFTI algorithms proposed in this thesis achieve higher average recognition accuracy than existing methods (such as FiFTy) at both 512 bytes and 4096 bytes, which verifies the superiority of the proposed algorithms. The results of this thesis can therefore provide a theoretical basis and technical support for further research on file fragment type identification, and can also promote the adoption of this technology in fields such as digital forensics and network security.
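
The data flow of the multi-stream method in (1) can be illustrated with a minimal, untrained sketch in plain Python. It only shows how a fragment moves through embedding, two 1-D convolutions, global average pooling, and concatenation. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the embedding dimension (4), the channel count (3), the kernel sizes (3, 5, 7), the use of a separate embedding table per branch, the ReLU activation, and all function names. The final FC classifier and training are omitted.

```python
import random

def embed(fragment, table):
    # Map each byte value (0..255) to its real-valued embedding vector.
    return [table[b] for b in fragment]

def conv1d_relu(seq, kernels):
    # seq: list of T vectors; kernels: one (k x in_dim) kernel per output channel.
    # Valid (unpadded) 1-D convolution followed by ReLU.
    k = len(kernels[0])
    out = []
    for t in range(len(seq) - k + 1):
        window = seq[t:t + k]
        row = []
        for kernel in kernels:
            s = sum(w * x
                    for k_row, x_row in zip(kernel, window)
                    for w, x in zip(k_row, x_row))
            row.append(max(0.0, s))
        out.append(row)
    return out

def global_avg_pool(feature_map):
    # Average every channel over all positions, giving one vector.
    n = len(feature_map)
    channels = len(feature_map[0])
    return [sum(row[c] for row in feature_map) / n for c in range(channels)]

def random_kernels(rng, out_ch, k, in_dim):
    return [[[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
             for _ in range(k)]
            for _ in range(out_ch)]

def make_branch(rng, embed_dim, kernel_size, channels):
    # Hypothetical branch layout: own embedding table plus two conv layers.
    return {
        "table": [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
                  for _ in range(256)],
        "conv1": random_kernels(rng, channels, kernel_size, embed_dim),
        "conv2": random_kernels(rng, channels, kernel_size, channels),
    }

def branch_forward(fragment, branch):
    x = embed(fragment, branch["table"])
    x = conv1d_relu(x, branch["conv1"])
    x = conv1d_relu(x, branch["conv2"])
    return global_avg_pool(x)

def multi_stream_features(fragment, branches):
    # Concatenate the pooled vectors of all branches.
    features = []
    for branch in branches:
        features.extend(branch_forward(fragment, branch))
    return features

rng = random.Random(0)
# Assumed kernel sizes; the abstract only says the conv parameters differ.
branches = [make_branch(rng, embed_dim=4, kernel_size=k, channels=3)
            for k in (3, 5, 7)]
fragment = bytes(rng.randrange(256) for _ in range(512))  # one 512-byte fragment
features = multi_stream_features(fragment, branches)
print(len(features))  # 3 branches x 3 channels = 9
```

In a real model, the concatenated vector would be passed to the FC layer and the weights would be learned with a deep learning framework; the differing kernel sizes are one plausible way to give each branch a different receptive field over the byte sequence.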
Keywords/Search Tags: file fragment type identification, deep learning, embedding layer, CNN, LSTM