Design And Implementation Of Software Matching Algorithm Based On Variational Autoencoder

Posted on:2020-11-07

Degree:Master

Type:Thesis

Country:China

Candidate:Q Gao

GTID:2428330572484256

Subject:Computer technology

Abstract/Summary:

With the rapid development of computer networks and technologies, new software is constantly being released, including many kinds of malware. Because the source code of malware is rarely available, reverse analysis has become a powerful tool for studying it. In reverse engineering and binary program analysis, function recognition is a fundamental challenge, because malware often contains a large number of functions reused from open-source software. Recognizing such functions effectively brings several benefits:
- it improves the efficiency of reverse analysis;
- it reduces the analyst's workload;
- it reduces false correlations between unrelated code bases.

Within function recognition, identifying library functions is crucial. For programs written in modern high-level languages, separating out the library functions takes considerable time. Reverse engineers regard this step as wasted effort because it yields no new knowledge, yet it must be repeated in every routine analysis. Simply knowing which class a library function belongs to can considerably ease analysis of the program and helps discard irrelevant information. For example, a C++ function that processes stream data is usually independent of the program's main algorithm. Moreover, every high-level language program calls many standard library functions, sometimes as many as 95% of all called functions. This is why library function recognition matters.

In this project, we work on each basic block of a function. For every pair of adjacent instructions, we count how often each combination of opcode types appears and store these frequencies in a matrix. We extract this data from the function's blocks in turn, so that each function is finally abstracted into an opcode frequency matrix. This matrix is the input used to train the encoder of a variational autoencoder (VAE).

An autoencoder is a neural network trained with backpropagation to make its output equal to its input. It first compresses the input into a latent-space representation and then reconstructs the output from that representation. The variational autoencoder is an extension of the autoencoder with a similar structure, also consisting of an encoder and a decoder. It adds a constraint to the encoding process that forces the latent vectors it generates to roughly follow a standard normal distribution. This constraint is the main difference from an ordinary autoencoder.

Training on a large number of function matrices yields an encoder. To recognize a library function, we compute the similarity between the encodings of two functions.
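The abstract describes counting opcode pairs of adjacent instructions block by block and abstracting each function into one frequency matrix. The following is a minimal NumPy sketch of that step, under several assumptions that are not taken from the thesis:
- The short opcode vocabulary is hypothetical; a real system would use the disassembler's full mnemonic set.
- Block matrices are combined by summation.
- Normalizing to relative frequencies is optional.
- Pairs are never counted across block boundaries.

```python
import numpy as np

# Hypothetical opcode vocabulary; a real system would derive it from the
# disassembler output (e.g. every x86 mnemonic seen in the training corpus).
OPCODES = ["push", "mov", "call", "add", "cmp", "jmp", "ret", "pop"]
INDEX = {op: i for i, op in enumerate(OPCODES)}


def block_pair_counts(block, index=INDEX):
    """Count each (opcode_i, opcode_{i+1}) pair inside one basic block.

    Entry [a, b] of the result is how many times opcode a is immediately
    followed by opcode b within this block.
    """
    n = len(index)
    counts = np.zeros((n, n), dtype=np.int64)
    for first, second in zip(block, block[1:]):
        if first in index and second in index:  # opcodes outside the vocabulary are skipped
            counts[index[first], index[second]] += 1
    return counts


def function_matrix(blocks, index=INDEX, normalize=True):
    """Abstract a whole function (a list of basic blocks) as one matrix.

    Per-block counts are summed (an assumption). With normalize=True the
    counts become relative frequencies, so functions of different lengths
    are on the same scale.
    """
    n = len(index)
    total = np.zeros((n, n), dtype=np.float64)
    for block in blocks:
        total += block_pair_counts(block, index)
    if normalize and total.sum() > 0:
        total /= total.sum()
    return total
```

The following usage example uses made-up blocks. For a function given as `[["push", "mov", "call"], ["cmp", "jmp"], ["mov", "pop", "ret"]]`, `function_matrix(..., normalize=False)` counts five pairs. The `call` to `cmp` transition is not counted, because it crosses a block boundary. Flattening the result (`m.ravel()`) gives a fixed-length vector that a VAE encoder can consume.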
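The constraint that pushes the VAE's latent vectors toward a standard normal distribution is usually implemented as two pieces:
- sampling through the reparameterization trick;
- a KL-divergence penalty added to the reconstruction error.

The sketch below shows only that objective, using NumPy and a toy linear encoder and decoder. These are assumptions made for illustration:
- the layer shapes;
- the squared-error reconstruction term;
- the `LinearVAE` class name.

The thesis does not specify its network architecture. Actually training the encoder requires gradients, for example from PyTorch, and that part is not shown.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions.

    This term pulls the encoder's latent codes toward a standard normal
    distribution, which is what distinguishes a VAE from a plain autoencoder.
    """
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)


def vae_loss(x, x_recon, mu, log_var):
    """Per-sample loss: squared reconstruction error plus the KL regularizer."""
    recon = np.sum((x - x_recon) ** 2, axis=-1)
    return recon + kl_to_standard_normal(mu, log_var)


class LinearVAE:
    """Toy VAE with one linear map per part (hypothetical, untrained)."""

    def __init__(self, input_dim, latent_dim, rng, scale=0.01):
        self.W_mu = rng.normal(0.0, scale, (input_dim, latent_dim))
        self.W_lv = rng.normal(0.0, scale, (input_dim, latent_dim))
        self.W_dec = rng.normal(0.0, scale, (latent_dim, input_dim))

    def encode(self, x):
        """Map inputs to the mean and log-variance of the latent Gaussian."""
        return x @ self.W_mu, x @ self.W_lv

    def decode(self, z):
        """Reconstruct inputs from latent samples."""
        return z @ self.W_dec

    def loss(self, x, rng):
        """One forward pass: encode, sample, decode, and score each sample."""
        mu, log_var = self.encode(x)
        z = reparameterize(mu, log_var, rng)
        return vae_loss(x, self.decode(z), mu, log_var)
```

Here is an example. With `rng = np.random.default_rng(0)`, calling `LinearVAE(64, 16, rng).loss(x, rng)` on a batch `x` of shape `(3, 64)` returns one loss value per sample. The batch could be, for instance, three flattened 8×8 opcode matrices. The KL term is zero exactly when `mu = 0` and `log_var = 0`, that is, when the latent distribution already equals N(0, I).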
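The abstract says library functions are recognized by computing the similarity between the encodings of two functions, without naming the measure. The sketch below assumes three things:
- the encoding is a latent vector, such as the encoder mean;
- the measure is cosine similarity;
- a match is accepted only above a hypothetical threshold of 0.9.

The helper `match_library_function` and the library names in the example are illustrative only.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two latent codes; 1.0 means same direction."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # treat an all-zero code as matching nothing
    return float(a @ b / denom)


def match_library_function(query_code, library_codes, threshold=0.9):
    """Return (name, score) of the most similar known library function.

    library_codes maps a library function name to its latent code. Returns
    (None, best_score) when no candidate reaches the (hypothetical) threshold.
    """
    best_name, best_score = None, -1.0
    for name, code in library_codes.items():
        score = cosine_similarity(query_code, code)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

Here is an example with made-up 3-dimensional codes. Given `{"memcpy": [1, 0, 0], "strlen": [0, 1, 0]}`, the query `[0.9, 0.1, 0.0]` matches `memcpy` with a score of about 0.994. The query `[1, 1, 0]` scores about 0.707 against both entries and is left unlabeled. A threshold trades missed library functions against false matches, so its value would need to be tuned on labeled data.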

Keywords/Search Tags:

Reverse analysis, library function identification, AutoEncoder

Related items

1	Design And Implementation Of Labeling Library Function Algorithm Based On Convolutional Autoencoder
2	Research On Library Function Identification Technology In Assemble Level Program Auxiliary Analysis
3	Research On Technology Of Reverse Analysis On Embedded Linux System
4	Research On Key Technologies Of Software Security Analysis Oriented Binary Code Analysis
5	Improved Based On A Semi-supervised Sparse AutoEncoder IM Traffic Identification Model Of Comparison And Research
6	Research On Data Encrypt And Decrypt Process Reverse Analysis
7	Research On Reverse Locating Of Key Functions In Windows Application
8	Research On Similarity Matching Technology Of Binary Code Function
9	Library Function Disposing And Code Cache Management In Binary Translation
10	Research On Function Identification And Recovery Technology In Static Binary Translation Basing On Software Conventions