
Research On Similarity Matching Technology Of Binary Code Function

Posted on: 2017-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Xiao
GTID: 2308330488953141
Subject: Software engineering
Abstract/Summary:
With the widespread use of computers, malicious software has become ubiquitous and poses a serious threat. Symantec reportedly finds thousands of malicious samples every day, and most of them are new variants that malware authors produce to evade detection by anti-virus software.

The source code of malware is very difficult to obtain, so reverse analysis of malware has become an important means of protecting information security. Much of the code in malware variants is reported to be reused, and through long-term analysis of malware, reverse engineers have accumulated a large body of analysis results. Even so, reverse engineering is laborious, and analysts struggle to keep up with the mass of software upgrades and the endless stream of new malware variants. If the key functions that were already analyzed in a predecessor version can be identified in a new version, the efficiency of analysis improves and the workload of analysts falls. The key point of this work is therefore to identify similar functions in different versions of a piece of software.

In this paper, we extract function features from several aspects: the external interface of a function, its control flow, the amount of space allocated to its local variables, and the string data it references. We call this feature set ProFeature. Based on the extracted ProFeature, we propose a novel function matching method, TPM (Two-stage ProFeature Matching), to match similar functions in malware. In the first stage, two functions with the same ProFeature are treated as a similar function pair. In the second stage, starting from the function pairs matched in the first stage and using call relations and our decision rules, TPM recursively matches similar functions between different versions of a piece of software until no more similar function pairs are added. Experimental results demonstrate that the precision and recall of TPM exceed 98% and 97%, respectively. Compared with comparable methods, namely the 3-tuple CFG method, Diaphora, and PatchDiff, TPM achieves higher average accuracy and recall as well as greater stability.
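
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The feature tuple layout (argument count, basic-block count, local-variable size, referenced strings), the function names, and the sample data are all assumptions made for illustration. The abstract does not state the actual decision rules, so the sketch substitutes one hypothetical rule: if a matched pair has exactly one unmatched callee on each side, those two callees are paired.

```python
from collections import defaultdict


def stage_one(features_a, features_b):
    """Pair functions whose feature tuples are identical.

    Assumption: a feature value must be unique in each version to count
    as a match; the abstract does not say how ties are handled.
    """
    by_feature_a, by_feature_b = defaultdict(list), defaultdict(list)
    for name, feat in features_a.items():
        by_feature_a[feat].append(name)
    for name, feat in features_b.items():
        by_feature_b[feat].append(name)

    matches = {}
    for feat, names_a in by_feature_a.items():
        names_b = by_feature_b.get(feat, [])
        if len(names_a) == 1 and len(names_b) == 1:
            matches[names_a[0]] = names_b[0]
    return matches


def stage_two(matches, calls_a, calls_b):
    """Grow the match set through call relations until nothing is added."""
    matches = dict(matches)
    matched_b = set(matches.values())
    changed = True
    while changed:
        changed = False
        for fa, fb in list(matches.items()):
            rest_a = {c for c in calls_a.get(fa, ()) if c not in matches}
            rest_b = {c for c in calls_b.get(fb, ()) if c not in matched_b}
            # Hypothetical decision rule (not from the thesis): a single
            # unmatched callee on each side is taken to be the same function.
            if len(rest_a) == 1 and len(rest_b) == 1:
                new_a, new_b = next(iter(rest_a)), next(iter(rest_b))
                matches[new_a] = new_b
                matched_b.add(new_b)
                changed = True
    return matches


# Hypothetical data: (arg count, basic blocks, local size, referenced strings).
old_features = {
    "sub_401000": (1, 4, 32, ("config.ini",)),
    "sub_401200": (2, 7, 64, ()),
}
new_features = {
    "sub_402000": (1, 4, 32, ("config.ini",)),
    "sub_402300": (2, 9, 72, ()),  # modified in the new version
}
old_calls = {"sub_401000": ["sub_401200"]}
new_calls = {"sub_402000": ["sub_402300"]}

first = stage_one(old_features, new_features)
final = stage_two(first, old_calls, new_calls)
print(first)
print(final)
```

In this example the first stage pairs only the two functions with identical features. The second stage then pairs the modified callee through the call relation, which is how a function whose features changed between versions can still be recovered.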
Keywords/Search Tags:Reverse analysis, Malware, Function feature, Function matching