Research On File Type Identification Technology

Posted on: 2012-09-07
Degree: Master
Type: Thesis
Country: China
Candidate: D Cao
GTID: 2218330371462521
Subject: Computer application technology
Abstract/Summary:
File type identification is the task of determining the true type of a file from the characteristics of the file entity itself. It has theoretical and practical value in computer forensics, firewalls, anti-virus software, intrusion detection systems (IDS), email filtering and steganalysis. This thesis studies file type identification techniques based on the overall structure, the signatures and the binary content of files, respectively. Its main contributions are as follows:

1. Existing file type identification algorithms do not combine the diverse characteristics of the file entity. After an analysis of these internal characteristics, a File Entity Characteristic Model is constructed, which lays the foundation for the subsequent work.
2. To address the problems of existing structure-based file type identification algorithms, an algorithm based on the whole file structure is proposed. Rules are derived from an in-depth analysis of the structure of a given file type and then serve as criteria for deciding the type of a file. Experiments show high efficiency and good identification performance on intact files.
3. To address the problems of existing signature-based file type identification algorithms, an algorithm based on signature matching within limited regions is proposed. The signatures consist of metadata extracted from all functional parts of a file. A region partition algorithm based on a variable-length sliding-window measurement is proposed to divide a tested file into several regions, within which the corresponding signatures are matched. Experiments show that the algorithm remains effective even on tampered or fragmented files.
4. To address the problems of existing file type identification algorithms based on byte frequency distribution, an algorithm based on gram frequency distribution is proposed. Grams are used to describe file statistics, and a feature evaluation function is designed to select the grams with high discriminative power as the signature of each file type. Cosine similarity is then used to assign tested files to the corresponding types. The algorithm is applicable to general file types, whether or not their format specifications are published. Experiments show better identification results than the traditional algorithm.

Finally, the research work of this thesis is summarized and future directions for file type identification technology are outlined.
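As an illustration of the structure-based approach in contribution 2, the sketch below checks a few whole-structure rules for a single format. It assumes PNG as the example type because its layout is publicly specified (an 8-byte signature followed by chunks of length, type, data and CRC, starting with IHDR and ending with IEND). The function name `is_well_formed_png` and this particular rule set are hypothetical and are not the rules defined in the thesis.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_well_formed_png(data: bytes) -> bool:
    """Return True if data satisfies a few whole-file structure rules of PNG."""
    # Rule 1: the file must start with the fixed 8-byte PNG signature.
    if not data.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    first = True
    while pos + 12 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        end = pos + 12 + length
        # Rule 2: a chunk must not run past the end of the file.
        if end > len(data):
            return False
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:end])
        # Rule 3: the CRC covers the chunk type and chunk data.
        if zlib.crc32(ctype + body) & 0xFFFFFFFF != crc:
            return False
        # Rule 4: the first chunk must be IHDR.
        if first and ctype != b"IHDR":
            return False
        first = False
        pos = end
        # Rule 5: IEND must be the last chunk and end exactly at end of file.
        if ctype == b"IEND":
            return pos == len(data)
    # Ran out of data before reaching IEND.
    return False
```

A rule set like this is cheap to evaluate and precise on intact files, but any truncation or corruption makes the check fail, which is consistent with the thesis reporting good results specifically for intact files.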
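Contribution 3 restricts each signature to the region of the file it belongs to, instead of searching the whole file. A minimal sketch follows, assuming a fixed-size head/body/tail split in place of the thesis's variable-length sliding-window partition, whose details are not given in the abstract. The `Signature` class, the `SIGNATURES` table, the `edge` parameter and the vote-counting rule are all hypothetical; the byte patterns themselves are well-known PDF, ZIP and JPEG header and trailer markers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    file_type: str
    pattern: bytes
    region: str  # hypothetical region name: "head", "body" or "tail"


# Illustrative signatures built from well-known header/trailer markers.
SIGNATURES = [
    Signature("pdf", b"%PDF-", "head"),
    Signature("pdf", b"%%EOF", "tail"),
    Signature("zip", b"PK\x03\x04", "head"),
    Signature("zip", b"PK\x05\x06", "tail"),
    Signature("jpeg", b"\xff\xd8\xff", "head"),
    Signature("jpeg", b"\xff\xd9", "tail"),
]


def partition(data: bytes, edge: int = 512) -> dict[str, bytes]:
    """Split data into non-overlapping head/body/tail regions.

    A fixed-size simplification; the variable-length sliding-window
    partition described in the thesis is not reproduced here.
    """
    tail_start = max(edge, len(data) - edge)
    return {
        "head": data[:edge],
        "body": data[edge:tail_start],
        "tail": data[tail_start:],
    }


def identify(data: bytes, signatures: list[Signature] = SIGNATURES,
             edge: int = 512) -> str | None:
    """Return the type with the most signatures found in their own regions."""
    regions = partition(data, edge)
    votes: dict[str, int] = {}
    for sig in signatures:
        # A signature is searched only inside its assigned region.
        if sig.pattern in regions[sig.region]:
            votes[sig.file_type] = votes.get(sig.file_type, 0) + 1
    return max(votes, key=votes.get) if votes else None
```

Because the PDF trailer `%%EOF` is still found in the tail region, `identify` returns `"pdf"` even when the header has been cut off, which loosely mirrors the robustness to tampered files described above; conversely, a `%%EOF` that appears only near the start of a file is ignored.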
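The gram frequency approach in contribution 4 can be sketched as a nearest-profile classifier. This sketch assumes overlapping 2-byte grams, a training set grouped by type, and a hypothetical feature evaluation function (a gram's mean frequency in one type minus its highest mean frequency in any other type); the thesis's actual evaluation function and parameter values are not reproduced. Cosine similarity over the selected grams then assigns a tested file to the closest type profile. All names below are illustrative.

```python
from __future__ import annotations

import math
from collections import Counter


def gram_freq(data: bytes, n: int = 2) -> dict[bytes, float]:
    """Relative frequency of every overlapping n-byte gram in data."""
    total = len(data) - n + 1
    if total <= 0:
        return {}
    counts = Counter(data[i:i + n] for i in range(total))
    return {g: c / total for g, c in counts.items()}


def centroid(samples: list[bytes], n: int) -> dict[bytes, float]:
    """Average gram-frequency profile of the training samples of one type."""
    acc: dict[bytes, float] = {}
    for s in samples:
        for g, f in gram_freq(s, n).items():
            acc[g] = acc.get(g, 0.0) + f
    return {g: v / len(samples) for g, v in acc.items()}


def discrimination(g: bytes, t: str, centroids: dict[str, dict[bytes, float]]) -> float:
    """Hypothetical feature score: frequency of g in type t minus the
    highest frequency of g in any other type."""
    others = [c.get(g, 0.0) for u, c in centroids.items() if u != t]
    return centroids[t][g] - max(others, default=0.0)


def cosine(a: dict[bytes, float], b: dict[bytes, float], keys: list[bytes]) -> float:
    """Cosine similarity of two profiles restricted to the selected grams."""
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    na = math.sqrt(sum(a.get(k, 0.0) ** 2 for k in keys))
    nb = math.sqrt(sum(b.get(k, 0.0) ** 2 for k in keys))
    return dot / (na * nb) if na and nb else 0.0


class GramClassifier:
    def __init__(self, n: int = 2, k: int = 32) -> None:
        self.n, self.k = n, k  # gram length and grams kept per type

    def fit(self, training: dict[str, list[bytes]]) -> "GramClassifier":
        self.centroids = {t: centroid(s, self.n) for t, s in training.items()}
        chosen: set[bytes] = set()
        for t, cen in self.centroids.items():
            # Keep the k grams that best separate type t from the others.
            ranked = sorted(cen, key=lambda g: discrimination(g, t, self.centroids),
                            reverse=True)
            chosen.update(ranked[: self.k])
        self.features = sorted(chosen)
        return self

    def predict(self, data: bytes) -> str:
        profile = gram_freq(data, self.n)
        return max(self.centroids,
                   key=lambda t: cosine(profile, self.centroids[t], self.features))
```

Choosing a larger `n` captures more byte context but makes the profiles sparser, so `k` would likely need tuning for each data set.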
Keywords/Search Tags: File Type Identification, File Entity Characteristic Model, File Structure, File Signature, Gram Frequency Distribution