Type Recovery On Binary Code And Its Application

Posted on: 2019-03-22  Degree: Master  Type: Thesis
Country: China  Candidate: C Wen  Full Text: PDF
GTID: 2428330566461633  Subject: Software engineering
Abstract/Summary:
Recovering type information from binary code is a challenging problem, partly because much type-related information is lost during compilation from high-level source code. Recovered type information nonetheless greatly aids the comprehension and analysis of binary code, and it is required by, or significantly benefits, many applications, such as decompilation, reverse engineering, vulnerability analysis, and malware detection. Research on binary code type recovery therefore has great practical significance. Most existing work on binary code type recovery resorts to program analysis techniques, which can be too conservative to infer types with high accuracy or too heavyweight to be viable in practice. In this thesis, we propose a new approach to recovering type information for recovered variables in binary code that is more precise and more efficient.

First, we present our approach to recovering type information in binary code. The idea is motivated by “duck typing”, where the type of a variable is determined by its features and properties. Our approach combines machine learning and program analysis. In detail, we first extract critical information from the instruction flow and data flow, namely the behaviors and features of variables in binary code. From these behaviors and features, we learn a classifier with basic types as class labels, using various machine learning methods, and then use this classifier to predict types for new, unseen binaries. For composite types, such as pointers and structs, we perform a points-to analysis to recover the target variables and use the classifier to recover the base types of these target variables, based on which the composite types are recovered.

Second, we apply binary code type recovery to malware detection. Our malware detection approach is based on a classifier. Unlike most existing work, we take into account not only behavior information but also data information. As far as we know, our approach is the first to consider data types as features for malware detection.

Finally, we have implemented our approach in a tool called BITY and used it to conduct a series of experiments to evaluate our approach. The results show that (1) our approach can precisely recover the type information in binary code; (2) our tool is more precise than the commercial tool Hex-Rays and the open-source tool Snowman, in terms of both correct types and compatible types; (3) our prototype BITY is efficient and scalable, which makes it suitable for use in practice; (4) the type information we recover is capable of detecting malware.
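
The “duck typing” idea described above, where a variable's type is predicted from the instructions that use it, can be illustrated with a minimal Python sketch. This is not BITY's implementation: the training examples, the feature choice (counts of instruction mnemonics), and the nearest-neighbor classifier with cosine similarity are all assumptions made for illustration, and the names `TRAINING`, `features`, `cosine`, and `predict_type` are hypothetical.

```python
from collections import Counter
import math

# Hypothetical training data (an assumption, not taken from the thesis):
# each recovered variable is described by the mnemonics of the instructions
# that touch it, paired with its known basic type.
TRAINING = [
    (["movss", "addss", "mulss", "cvtsi2ss"], "float"),
    (["movsd", "addsd", "divsd"], "double"),
    (["mov", "add", "imul", "cmp"], "int"),
    (["movzx", "cmp", "movsx"], "char"),
]


def features(mnemonics):
    """Turn a variable's instruction mnemonics into a bag-of-words vector."""
    return Counter(mnemonics)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)  # Counter yields 0 for missing keys
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_type(mnemonics, training=TRAINING):
    """Predict a basic type as the label of the most similar training example."""
    query = features(mnemonics)
    best = max(training, key=lambda example: cosine(query, features(example[0])))
    return best[1]


if __name__ == "__main__":
    # A variable used only by single-precision SSE instructions.
    print(predict_type(["movss", "mulss", "movss"]))
    # A variable used by general-purpose integer instructions.
    print(predict_type(["mov", "cmp", "add"]))
```

A 1-nearest-neighbor classifier is used here only because it fits in a few lines; the abstract says that various machine learning methods were tried, without naming them.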
Keywords/Search Tags: Binary Analysis, Type Recovery, Data Type, Machine Learning, Malware Detection