
Design And Implementation Of Labeling Library Function Algorithm Based On Convolutional Autoencoder

Posted on: 2022-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Liu
GTID: 2518306314474294
Subject: Software engineering
Abstract/Summary:
With the rapid development of computer technology, people increasingly rely on software to handle everyday tasks, and new software is released constantly. Developers routinely reuse large numbers of third-party library functions during development. Accurately identifying the library functions reused in a piece of software is valuable for tasks such as detecting known vulnerabilities and reverse engineering malware. An automated method that reliably identifies reused library functions would make reverse analysis more efficient, reduce the workload of reverse engineers, and could even achieve higher accuracy.

One approach to identifying library functions is to match the functions in a library against the functions in the target software and label the matching functions in the target. However, because library versions, compilers, build options and other factors vary, two corresponding functions almost always differ to some degree, so accurately identifying the library functions used in target software remains difficult.

In this research, we propose a new method for identifying the library functions used in target software. We represent each function with bigrams, encode that representation with a convolutional autoencoder (CAE), and use the resulting codes to measure the similarity of two functions. This scheme concentrates on the key semantic features of a function while blurring its fine-grained details. It can therefore distinguish different functions accurately while tolerating the subtle differences between instances of the same library function, which is what library function identification requires.

To evaluate the method, we collected 451 software projects containing about 3 million functions. Compared with the classic BinDiff matching tool, our method achieves better accuracy and recall. Overall, it reaches an F1 score above 96.6%, and it is notably more robust to interference when the two library versions are far apart. We also compare our method against further related work, and the results demonstrate its advantages.
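The matching-and-labeling idea described above can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes each function is given as a list of opcode mnemonics (the abstract does not say what the bigrams are built from), and it compares raw bigram count vectors with cosine similarity instead of first encoding them with the convolutional autoencoder. The function names, the 0.7 threshold and the toy opcode sequences are all hypothetical. Evaluation uses the standard definition F1 = 2PR / (P + R) over (target function, library function) label pairs.

```python
from collections import Counter
from math import sqrt


def bigram_vector(opcodes):
    """Count adjacent opcode pairs (assumed representation), e.g.
    ["push", "mov", "call"] -> {("push", "mov"): 1, ("mov", "call"): 1}."""
    return Counter(zip(opcodes, opcodes[1:]))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items())  # missing keys count as 0
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match(library, target, threshold=0.7):
    """Label each target function with its most similar library function,
    or None when no library function reaches the (hypothetical) threshold.
    Note: this compares raw bigram vectors; the thesis compares CAE codes."""
    lib_vecs = {name: bigram_vector(ops) for name, ops in library.items()}
    labels = {}
    for t_name, t_ops in target.items():
        t_vec = bigram_vector(t_ops)
        best_name, best_sim = None, 0.0
        for l_name, l_vec in lib_vecs.items():
            sim = cosine(t_vec, l_vec)
            if sim > best_sim:
                best_name, best_sim = l_name, sim
        labels[t_name] = best_name if best_sim >= threshold else None
    return labels


def evaluate(predicted, truth):
    """Return (precision, recall, F1) over non-None (target, library) pairs."""
    pred_pairs = {(t, l) for t, l in predicted.items() if l is not None}
    true_pairs = {(t, l) for t, l in truth.items() if l is not None}
    tp = len(pred_pairs & true_pairs)
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy, hypothetical data: library functions and unlabeled target functions.
library = {
    "memcpy": ["push", "mov", "mov", "rep_movsb", "pop", "ret"],
    "strlen": ["mov", "xor", "cmp", "je", "inc", "jmp", "ret"],
}
target = {
    "sub_401000": ["push", "mov", "mov", "rep_movsb", "nop", "pop", "ret"],
    "sub_402000": ["mov", "xor", "cmp", "je", "inc", "jmp", "ret"],
    "sub_403000": ["call", "test", "jne", "leave", "ret"],
    "sub_404000": ["mov", "mov", "call", "ret"],
}
truth = {
    "sub_401000": "memcpy",
    "sub_402000": "strlen",
    "sub_403000": None,
    "sub_404000": "memcpy",
}

labels = match(library, target)
precision, recall, f1 = evaluate(labels, truth)
print(labels)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 1.0 0.667 0.8
```

In the toy data, `sub_404000` stands in for a `memcpy` instance whose instruction sequence changed substantially; its raw bigram similarity falls below the threshold, so it goes unlabeled and recall drops. The abstract motivates the CAE encoding as a way to tolerate differences between instances of the same library function, which, as this toy case suggests, raw bigram vectors alone handle poorly.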
Keywords/Search Tags: binary code analysis, library function, convolutional autoencoder, bigram