
Design And Implementation Of Software Matching Algorithm Based On Variational Autoencoder

Posted on: 2020-11-07 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Gao
GTID: 2428330572484256 | Subject: Computer technology
Abstract/Summary:
With the rapid development of computer networks and technologies, new software of every kind is constantly being released, including many types of malware. Because the source code of malware is rarely available, reverse analysis has become a powerful tool for studying it. In reverse engineering and binary program analysis, function recognition is a fundamental challenge, because malware often contains a large number of functions reused from open-source software. Recognizing these functions effectively improves the efficiency of reverse analysis and reduces the analyst's workload. It also reduces false correlations between unrelated code bases.

Within function recognition, identifying library functions is crucial. For programs written in modern high-level languages, separating out the library functions takes considerable time. Reverse engineers tend to regard this time as wasted because it yields no new knowledge, yet the step must be repeated in every routine analysis. Knowing the class a library function belongs to can considerably ease the analysis of a program and helps discard irrelevant information. For example, a C++ function that processes stream data is usually unrelated to the program's main algorithm. Moreover, every high-level language program calls many standard library functions, sometimes as many as 95% of all called functions, which is why library function recognition matters.

In this project, for each basic block of a function we count how often each pair of opcodes from adjacent instructions occurs and store these frequencies in a matrix. The blocks of a function are processed in turn, so that each function is abstracted into an opcode frequency matrix. This matrix is used as input to train the encoder of a variational autoencoder (VAE). An autoencoder is a neural network trained with backpropagation to make its output equal to its input: it first compresses the input into a latent-space representation and then reconstructs the output from that representation. A variational autoencoder is an extension of the autoencoder with a similar structure, likewise composed of an encoder and a decoder. Its main difference from an ordinary autoencoder is a constraint added to the encoding process that forces the latent vectors it generates to approximately follow a standard normal distribution. Training on a large number of function matrices yields an encoder. During library function recognition, two functions are compared by computing the similarity of their encodings.
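The pipeline described above can be illustrated with a minimal Python/NumPy sketch. It is hypothetical and is not the thesis implementation. The following are all assumptions:

- the opcode vocabulary;
- the rule that pairs are counted only within a basic block;
- the normalization of counts to frequencies;
- the layer sizes;
- the use of cosine similarity between latent means.

The names `opcode_pair_matrix`, `VAEEncoder`, `kl_divergence` and `cosine_similarity` are illustrative. The encoder weights are random placeholders. Training, which would minimize reconstruction error plus the KL term, is omitted.

```python
import numpy as np

# Hypothetical opcode vocabulary; a real system would derive it from disassembly.
VOCAB = ["push", "mov", "sub", "call", "add", "pop", "ret", "cmp", "jne", "xor"]
INDEX = {op: i for i, op in enumerate(VOCAB)}


def opcode_pair_matrix(blocks):
    """Build an opcode frequency matrix for one function.

    `blocks` is a list of basic blocks, each a list of opcode mnemonics.
    Adjacent pairs are counted within a block only (assumption), then the
    counts are normalized to frequencies (assumption).
    """
    m = np.zeros((len(VOCAB), len(VOCAB)))
    for block in blocks:
        for a, b in zip(block, block[1:]):
            if a in INDEX and b in INDEX:  # skip opcodes outside the vocabulary
                m[INDEX[a], INDEX[b]] += 1
    total = m.sum()
    return m / total if total > 0 else m


class VAEEncoder:
    """Encoder half of a VAE: x -> (mu, log_var), plus reparameterized sampling.

    Weights are random placeholders standing in for trained values.
    """

    def __init__(self, input_dim, hidden_dim=32, latent_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (input_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w_mu = rng.normal(0.0, 0.1, (hidden_dim, latent_dim))
        self.w_lv = rng.normal(0.0, 0.1, (hidden_dim, latent_dim))
        self.rng = rng

    def encode(self, x):
        h = np.tanh(x @ self.w1 + self.b1)
        return h @ self.w_mu, h @ self.w_lv  # latent mean and log-variance

    def sample(self, mu, log_var):
        # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)
        eps = self.rng.standard_normal(mu.shape)
        return mu + np.exp(0.5 * log_var) * eps


def kl_divergence(mu, log_var):
    """Closed-form KL(N(mu, exp(log_var)) || N(0, I)), summed over latent dims.

    This is the term that pushes the latent codes toward a standard normal.
    """
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))


def cosine_similarity(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


if __name__ == "__main__":
    # Two hypothetical functions, each given as a list of basic blocks.
    f1 = [["push", "mov", "sub"], ["call", "add"], ["pop", "ret"]]
    f2 = [["push", "mov", "xor"], ["call", "add"], ["pop", "ret"]]
    enc = VAEEncoder(input_dim=len(VOCAB) ** 2)
    mu1, lv1 = enc.encode(opcode_pair_matrix(f1).ravel())
    mu2, _ = enc.encode(opcode_pair_matrix(f2).ravel())
    print("KL term:", kl_divergence(mu1, lv1))
    print("similarity:", cosine_similarity(mu1, mu2))
```

Using the latent mean `mu` as a function's code, rather than a random sample `z`, keeps the comparison deterministic. Sampling via `sample` is only needed during training.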
Keywords/Search Tags: Reverse analysis, library function identification, AutoEncoder