
Windows Malicious Code Detection And Analysis Based On Behavior Characteristics

Posted on: 2022-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
GTID: 2518306482465744
Subject: Cyberspace security law enforcement technology
Abstract/Summary:
With the rapid development of the Internet, a profit-driven malicious code industry chain has been quietly growing. In recent years, the number of Windows malicious code samples captured each year in China has averaged in the millions, causing huge losses to users, so research on malicious code analysis and detection technology needs to be strengthened. Traditional feature extraction and detection methods suffer from incomplete feature extraction, low detection accuracy, and poor efficiency when identifying unknown malicious code outside the data set. At the same time, malicious code keeps evolving and now uses anti-detection techniques such as obfuscation, junk instructions, and packing, which traditional detection methods struggle to handle. To address these problems, this thesis uses dynamic analysis to propose a thread-fusion feature extraction method and adds the TF-IDF values of API sequence fragments when constructing the feature matrix. The logistic regression (LR) classification algorithm is optimized with gradient descent and a vectorized solution, and the detection model is built on it. The specific work is as follows:
1. A Cuckoo sandbox dynamic analysis environment is built on a Linux system, and malicious code is run in a Windows virtual machine monitored by the sandbox. The behavior of the malicious code is captured through the sandbox's hooking mechanism and recorded as a behavior report in .json format. The .json files that meet the selection criteria are filtered, and their missing parameters are completed to form the original data set.
2. The API calls of two typical ransomware samples, WannaCry and GandCrab, are analyzed in detail. The analysis shows that different threads of multi-threaded malicious code make related API calls, and that the same thread in malicious code of the same type contains similar API call fragments. The thesis therefore proposes a thread-fusion feature extraction method that takes the thread as the unit and uses Python to extract up to 5,000 API calls from each thread in a .json file, forming an API call parameter data set.
3. Statistical features are designed to count how often each individual API is called across the whole sample; these counts roughly separate malicious code from normal samples by order of magnitude. Computed features are designed from the API call parameters within each thread, and the TF-IDF values of API sequence fragments of length 1 to 4 are also calculated as part of the computed features. These features are one-hot encoded into feature vectors and combined into a feature matrix, which improves the detection accuracy for malicious code.
4. The LR algorithm is chosen to build the classification model. Gradient descent is used to optimize the computation of the feature weight parameters, and vectorization turns the iterative operations into matrix operations. Several parameter-selection experiments are designed to improve both the efficiency of solving for the feature weight parameters of the resulting Vec-LR algorithm and the efficiency of model detection.
5. The thread-fusion feature extraction method and the Vec-LR algorithm are evaluated on the test set, and the detection efficiency of the proposed algorithm is compared with that of similar algorithms. In addition, unknown malicious code samples with low detection rates on the VirusTotal detection website are selected, and the results are compared with those of current mainstream anti-virus platforms.
Experiments show that the feature matrix built with the proposed thread-fusion feature extraction method improves classification and recognition ability, and that the optimized LR algorithm improves both the detection rate and the computational efficiency. In summary, the detection model in this thesis classifies better, outperforms other algorithms in time efficiency and detection accuracy, and detects unknown malicious code more effectively.
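The thread-based extraction in step 2 can be sketched as follows. This is a minimal illustration and not the thesis's code. It assumes a Cuckoo 2.x-style report.json in which each process under `behavior.processes` has a `calls` list, and each entry in that list carries `api` and `tid` fields. The function names are hypothetical. The sketch groups API names by (process ID, thread ID) in report order and keeps at most 5,000 calls per thread, matching the cap stated in the abstract.

```python
import json
from collections import defaultdict

MAX_CALLS_PER_THREAD = 5000  # per-thread cap stated in the abstract


def load_report(path):
    """Load one Cuckoo behavior report (.json) from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def extract_thread_sequences(report, max_calls=MAX_CALLS_PER_THREAD):
    """Group API names by (pid, tid), keeping report order and at most max_calls per thread.

    Assumed (hypothetical) layout, modeled on Cuckoo 2.x:
    report["behavior"]["processes"][i]["calls"][j] with "api" and "tid" keys.
    """
    threads = defaultdict(list)
    for proc in report.get("behavior", {}).get("processes", []):
        pid = proc.get("pid")
        for call in proc.get("calls", []):
            seq = threads[(pid, call.get("tid"))]
            if len(seq) < max_calls:  # stop recording once the thread hits the cap
                seq.append(call["api"])
    return dict(threads)
```

Each returned sequence is one thread-level unit. The abstract does not describe how threads are fused afterwards, so that step is left out here.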
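The statistical features in step 3 count how often each single API is called over the whole sample. Below is a small sketch that sums the counts across all thread sequences of one sample. Two choices here are assumptions and do not come from the thesis. First, the `log10(1 + count)` scaling is added to express the "order of magnitude" comparison the abstract mentions. Second, the `vocabulary` argument is a hypothetical API list that would normally be fixed from the training set.

```python
import math
from collections import Counter


def api_frequency_features(thread_seqs, vocabulary):
    """Count each API over all threads of one sample; return log10(1 + count) per vocabulary entry."""
    counts = Counter()
    for seq in thread_seqs.values():
        counts.update(seq)
    # Assumed log10 scaling: a count of 9 maps to 1.0, 99 to 2.0, so magnitudes become comparable
    return [math.log10(1 + counts[api]) for api in vocabulary]
```

APIs absent from the sample map to 0.0, so every sample yields a vector of the same length.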
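The TF-IDF values of API sequence fragments of length 1 to 4 can be computed along these lines. The sketch makes two assumptions that the abstract leaves open. It treats each thread's call sequence as one document, and it uses the plain `tf * log(N / df)` weighting. Libraries such as scikit-learn default to a smoothed IDF instead. The helper names are hypothetical.

```python
import math
from collections import Counter


def ngrams(seq, n_min=1, n_max=4):
    """All contiguous API fragments of length n_min..n_max, as tuples."""
    return [tuple(seq[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(seq) - n + 1)]


def tfidf(docs):
    """docs: list of API-call sequences (assumed: one per thread). Returns one {fragment: tf-idf} dict per doc."""
    grams_per_doc = [Counter(ngrams(d)) for d in docs]
    n_docs = len(docs)
    df = Counter()  # number of documents containing each fragment
    for grams in grams_per_doc:
        df.update(grams.keys())
    out = []
    for grams in grams_per_doc:
        total = sum(grams.values())  # all fragments in this document
        out.append({g: (c / total) * math.log(n_docs / df[g]) for g, c in grams.items()})
    return out
```

Under this weighting, a fragment that appears in every thread gets a weight of 0. To build a feature matrix, each dict would then be mapped onto fixed column positions.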
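Step 4 combines gradient descent with vectorization. In practice this means the logistic-regression gradient for all samples is computed with one matrix product instead of a per-sample loop. Below is a minimal NumPy sketch of batch gradient descent. It assumes a mean cross-entropy loss, a prepended bias column, and hypothetical default values for the learning rate and iteration count. It illustrates the technique and is not the thesis's Vec-LR implementation.

```python
import numpy as np


def sigmoid(z):
    z = np.clip(z, -500, 500)  # avoid overflow in exp for extreme scores
    return 1.0 / (1.0 + np.exp(-z))


def add_bias(X):
    """Prepend a column of ones so the bias is learned as weight w[0]."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def train_vec_lr(X, y, lr=0.1, n_iter=1000):
    """Batch gradient descent for logistic regression; each step is pure matrix arithmetic."""
    Xb = add_bias(X)
    m = Xb.shape[0]
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)           # predicted probabilities for all samples at once
        grad = Xb.T @ (p - y) / m     # gradient of mean cross-entropy, no per-sample loop
        w -= lr * grad
    return w


def predict(w, X, threshold=0.5):
    """Label 1 (malicious) when the predicted probability reaches the threshold."""
    return (sigmoid(add_bias(X) @ w) >= threshold).astype(int)
```

The line `grad = Xb.T @ (p - y) / m` replaces a loop that would accumulate `(p_i - y_i) * x_i` over every sample. The learning rate and iteration count are the kind of parameters that the abstract's parameter-selection experiments would tune.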
Keywords/Search Tags: Malicious code, Sandbox dynamic analysis, API calls function, Thread fusion feature, Vec-LR algorithm