
Research On Technologies Of Cross-architecture Binary Code Clone Detection And Reuse-based Code Patching

Posted on: 2020-12-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y K Hu
GTID: 1368330623463940
Subject: Computer Science and Technology
Abstract/Summary:
Binary code clone detection and code patching have numerous important applications in software engineering and security. Binary code clone detection aims to locate similar or identical pieces of code within or between software systems. Analysts can then reuse information from already-analyzed code to understand other code efficiently. The methodology has been applied to known-bug detection, malware analysis, plagiarism detection, and more. Binary code patching repairs defective code in binaries and is useful both for maintaining legacy code and for code hardening. However, these analyses still rely heavily on manual work, so methods are needed that avoid manual mistakes and improve productivity. Considering these scenarios, this thesis proposes approaches for binary code clone detection and patching. On the one hand, we adopt semantic signatures and leverage dynamic instrumentation, static emulation, and runtime information reuse to address binary code understanding, similarity comparison, and code searching, respectively. On the other hand, we propose a method that patches known defective code in binaries by transferring the corresponding code from correct versions.

This thesis makes the following contributions:

(1) We leverage code clone detection to understand binaries built for different instruction set architectures (ISAs). With the spread of smart devices, more and more programs are being ported from traditional architectures (e.g., x86) to others such as ARM or MIPS. We propose MOCKINGBIRD, a code clone detection method for understanding binaries of other architectures when a well-analyzed version already exists. Because MOCKINGBIRD adopts semantic signatures captured through dynamic instrumentation, it can compare the similarity of binary code across different ISAs. To make the method more robust, we also propose normalization strategies for the signatures. The experimental results show that MOCKINGBIRD achieves an average detection accuracy of over 75.0%. When comparing OpenSSL binaries compiled for ARM and MIPS, MOCKINGBIRD improves accuracy by 50.0% compared with Multi-MH [1].

(2) We propose a general, emulation-based method for binary code clone detection. Since binary code clone detection is a fundamental technique with many important applications, we propose CACOMPARE, a general method for this task. It uses emulation to capture the semantic signatures of binaries. It can therefore not only handle similarity comparison across different compilation settings, such as different compilers or optimization options, but also cover all of the code under analysis. The experimental results show that CACOMPARE achieves an average detection accuracy of over 75.0%. In cross-architecture analysis of BusyBox, CACOMPARE improves accuracy by 37.9% compared with BinGo [2].

(3) We propose a hybrid method that compares binary similarity for code searching, a classical application of code clone detection. Given reference code, code searching aims to decide whether a similar match exists in the target code; a typical scenario is known-vulnerability detection. Our hybrid approach, BINMATCH, combines the advantages of dynamic and static analysis: it detects code clones using signatures with rich semantics while also achieving full code coverage. The experimental results show that BINMATCH can compare binary code that has undergone semantics-preserving transformations, e.g., different compilation settings and code obfuscation, and that its accuracy is on average 27.7% higher than that of Asm2Vec [3].

(4) We propose patching vulnerabilities in binary programs by transferring code from corrected versions. Because binary code patching is important for software maintenance, we propose BINPATCH to patch known vulnerabilities in binaries. It uses code clone detection to locate the defective code and reuses the corresponding code from the corrected version as the patch. BINPATCH does not require a large number of test cases to locate the buggy code or to verify the correctness of the patch, and it thereby avoids plausible patches (patches that pass the available tests but are not actually correct). The experiments indicate that BINPATCH locates the defective code effectively and patches it correctly.
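The signature comparison and code searching in contributions (1) to (3) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. It assumes that a signature consists of a set of concrete values observed while instrumenting or emulating a function, together with an ordered sequence of library calls. Value overlap is scored with Jaccard similarity, call overlap with a normalized longest-common-subsequence ratio, and the two scores are weighted equally. The names `Signature`, `jaccard`, `lcs_ratio`, `similarity`, and `search`, the weighting, and the sample data are all hypothetical.

```python
# Hypothetical sketch: signature contents, metrics, and weights are assumptions,
# not the method described in the thesis.
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Signature:
    """Assumed semantic signature of one binary function."""
    values: frozenset             # concrete values observed at runtime (e.g., memory reads/writes)
    calls: Tuple[str, ...] = ()   # ordered library/system calls observed during execution


def jaccard(a: frozenset, b: frozenset) -> float:
    """Set overlap in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def lcs_ratio(a: Sequence[str], b: Sequence[str]) -> float:
    """Longest common subsequence length, normalized by the longer sequence."""
    if not a and not b:
        return 1.0
    # Classic O(len(a) * len(b)) dynamic program, keeping only one previous row.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / max(len(a), len(b))


def similarity(ref: Signature, tgt: Signature, w_values: float = 0.5) -> float:
    """Weighted mix of value overlap and call-sequence overlap (equal weights assumed)."""
    return (w_values * jaccard(ref.values, tgt.values)
            + (1 - w_values) * lcs_ratio(ref.calls, tgt.calls))


def search(ref: Signature, targets: Dict[str, Signature], k: int = 3) -> List[Tuple[str, float]]:
    """Rank candidate target functions by similarity to the reference, highest first."""
    scored = [(name, similarity(ref, sig)) for name, sig in targets.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Made-up reference function (e.g., from an analyzed x86 build) and two made-up targets.
    ref = Signature(frozenset({0x10, 0x20, 0xFF}), ("malloc", "memcpy", "free"))
    targets = {
        "arm_func_a": Signature(frozenset({0x10, 0x20, 0xFF}), ("malloc", "memcpy", "free")),
        "arm_func_b": Signature(frozenset({0x10, 0x99}), ("printf",)),
    }
    print(search(ref, targets, k=2))  # [('arm_func_a', 1.0), ('arm_func_b', 0.125)]
```

Ranking every target function against the reference and keeping the top k corresponds to the code-searching setting described above. A realistic system would also need something like the signature normalization mentioned for MOCKINGBIRD, which this sketch omits.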
Keywords/Search Tags: Reverse Engineering, Program Analysis, Code Clone Detection, Code Patching, Static Analysis, Dynamic Analysis