
Research On Malicious Code Detection Method Based On Dynamic API Initial Sequence

Posted on: 2023-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: H Peng
Full Text: PDF
GTID: 2558306902479974
Subject: Computer Science and Technology
Abstract/Summary:
Malicious code, one of the key concerns in information security, has become increasingly intelligent and complex as the Internet has developed. Static analysis can inspect malicious code files quickly without executing them, but it is susceptible to code obfuscation techniques that degrade the performance of the detection model. Dynamic analysis, a necessary supplement to static analysis, captures behavioral information by collecting the data generated while a file executes. This usually takes a long time, which means a malicious attack may already have occurred before it is detected. How to reduce the high time cost of dynamic analysis while preserving detection performance is therefore worth studying.

This dissertation proposes a malicious code detection method based on dynamic API initial sequences, which analyzes the behavioral data produced in the early stage of malicious code execution. The method has three phases: behavioral feature acquisition, feature preprocessing, and detector model construction. First, to address the high time consumption of current dynamic analysis methods, an early-behavior feature extraction method for sample files is proposed. Unlike conventional approaches that run each sample file to completion, this dissertation analyzes only the early behavioral data of the sample. Controlling the running time of the sample files greatly reduces the time consumed by dynamic analysis. Second, an improved feature vectorization method is proposed that uses the Word2Vec word-embedding technique to process one-hot feature vectors and then applies TF-IDF weighting to the resulting vectors. This avoids the curse of dimensionality caused by traditional word-vector models and one-hot text representations, strengthens the importance of the different functions carried by the vector features, and improves the overall detection performance of the model. Finally, because malicious code authors often rewrite or reorder the functions in their code to evade detection, this dissertation introduces an average pooling layer and proposes an improved Bi-LSTM network model, which is trained with the weighted feature vectors as input. The model preserves the correlation, importance, and semantic features of the API call sequence and keeps the features position-invariant, thus improving the performance of the malicious code detection model.

To verify the feasibility of detecting malicious code from early behavioral data, three sets of comparison experiments are conducted. They verify that representing the data features with the proposed weighted word-vector model (one-hot + Word2Vec + TF-IDF) has a positive effect on the overall model, and that the proposed Bi-LSTM model with average pooling is better suited to the proposed detection method. At the end of the experiments, a comparison with other detection models demonstrates that the proposed model reduces the time cost of dynamic analysis while preserving detection performance.
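The feature pipeline described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the dissertation's implementation: the API names and the two-dimensional `EMBEDDING` table are toy stand-ins for trained Word2Vec vectors, the trace is truncated by call count rather than by running time, the smoothed IDF formula is one common variant, and the weighted vectors are pooled directly, with no Bi-LSTM in between, purely to show that average pooling is insensitive to call order.

```python
import math
from collections import Counter

# Toy 2-d embeddings standing in for trained Word2Vec vectors (hypothetical values).
EMBEDDING = {
    "NtCreateFile": [1.0, 0.0],
    "NtWriteFile": [0.0, 1.0],
    "NtClose": [0.5, 0.5],
    "RegSetValueExW": [1.0, 1.0],
}


def initial_sequence(calls, max_calls):
    """Keep only the first max_calls API calls of a trace (the 'early' behavior)."""
    return calls[:max_calls]


def tfidf_weights(traces):
    """Return one {api_name: TF-IDF weight} dict per trace, using a smoothed IDF."""
    n = len(traces)
    doc_freq = Counter()
    for trace in traces:
        doc_freq.update(set(trace))
    weights = []
    for trace in traces:
        term_freq = Counter(trace)
        weights.append({
            api: (count / len(trace)) * (math.log((1 + n) / (1 + doc_freq[api])) + 1.0)
            for api, count in term_freq.items()
        })
    return weights


def weighted_vectors(trace, weights, embedding):
    """Scale each call's embedding by that call's TF-IDF weight."""
    return [[weights[api] * x for x in embedding[api]] for api in trace]


def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors over the sequence axis."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Two hypothetical API call traces.
traces = [
    ["NtCreateFile", "NtWriteFile", "NtClose", "RegSetValueExW"],
    ["NtCreateFile", "NtClose", "NtClose", "NtWriteFile"],
]
early = [initial_sequence(t, 3) for t in traces]
weights = tfidf_weights(early)

pooled = mean_pool(weighted_vectors(early[0], weights[0], EMBEDDING))

# Reordering the same calls leaves the pooled feature unchanged (up to rounding).
reordered = list(reversed(early[0]))
pooled_reordered = mean_pool(weighted_vectors(reordered, weights[0], EMBEDDING))
```

In this toy data, `NtWriteFile` appears in only one of the truncated traces, so it receives a larger weight than the calls shared by both. In the model the abstract describes, the pooling would be applied in combination with the Bi-LSTM rather than to the raw weighted vectors as done here.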
Keywords/Search Tags: Malicious Code, Dynamic Analysis, API, Bi-LSTM