Text Neural Network Based Malicious Code Function Classification

Posted on: 2022-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Li
GTID: 2518306551970989
Subject: Master of Engineering
Abstract/Summary:
In recent years, malicious code analysis has been one of the important research topics in China's network security field. Within it, APT (Advanced Persistent Threat) is a specific malicious code intrusion method: it scans for and detects system vulnerabilities, sends malicious exploit scripts to the target machine, and then implants binary malicious programs to infect the host. Studying functional classification models for malicious code makes it possible to analyze the functional behavior of that code in more depth, thereby improving APT defense technology and protecting network security. However, feature selection for malicious code classification still lacks automated tools, and the extracted features cannot fully describe the semantic behavior of malicious code, which leads to problems such as low classification accuracy and poor code interpretability. This thesis therefore applies text neural networks to both static semantics and dynamic behavior, and proposes efficient and accurate automatic classification schemes at function-level granularity for exploit source code and binary malicious code. Two types of malicious code are analyzed. The main research work and innovations are as follows:

(1) To address the lack of automated analysis tools and the difficulty of reading code in research on malicious exploit source code, the thesis proposes treating the words and phrases of code as the words and phrases of text, models them in a vector space, and constructs MSC-textCNN, a model based on source code semantics. The model recognizes word meaning in source code and classifies the attack function of malicious exploit source code end to end, without manual feature extraction as a preprocessing step. In addition, machine learning methods are used to build a vocabulary set of malicious code features, which helps source code analysts quickly interpret the behavior of the source code. Compared with several traditional machine learning methods, the MSC-textCNN-based classification scheme improves classification accuracy by 3.08% to 6.54%.

(2) Because static features are easily obfuscated or altered and cannot represent the deeper behavioral information of malicious code, the thesis proposes using Windows API (Application Programming Interface) call sequences to monitor the behavior of binary malicious code, and combines a CNN (Convolutional Neural Network) with a BiLSTM (Bi-directional Long Short-Term Memory) network to construct the classification model MB-textRCNN. The model captures the relationships between N-gram phrases in the system call sequence and also models the forward and backward dependencies of dynamic behavior within the sequence, reaching a final classification accuracy of 98.66%. The scheme also performs well on a public dataset. Compared with other combination schemes, its overall accuracy is 5.49% to 7.03% higher.

Building on the semantic feature extraction for both types of malicious code, a visual prototype system for malicious code analysis is implemented on a B/S (browser/server) architecture. In addition to file management functions, the system uses Echarts to help analysts obtain visual information intuitively and efficiently, and uses responsive interaction techniques such as chart linkage to assist analysts in mining the dynamic behavior of binary malicious code.
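
As an illustration of the two input representations described in the abstract (code tokens treated as words, and N-gram phrases over an API call sequence), a minimal Python sketch follows. It is not the thesis's implementation: the tokenizer regex, the function names `tokenize_source`, `build_vocab`, `encode`, and `ngrams`, the `<pad>`/`<unk>` entries, and the sample inputs are all assumptions made for this example. The integer IDs it produces are the kind of input an embedding layer in a textCNN-style model would typically consume.

```python
import re
from collections import Counter


def tokenize_source(code):
    """Split source code into word-like tokens (assumed scheme):
    identifiers, integer literals, and single punctuation symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)


def build_vocab(token_lists, min_count=1):
    """Map each token to an integer ID; 0 is padding, 1 is unknown."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<unk>": 1}
    # Sort by descending frequency, then alphabetically, so IDs are stable.
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to a fixed-length ID list (truncate or right-pad)."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokens[:max_len]]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


def ngrams(seq, n):
    """Return all contiguous n-grams of a sequence, e.g. of API call names."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


# Hypothetical inputs, for illustration only.
script_tokens = tokenize_source("os.system(cmd)")
vocab = build_vocab([script_tokens])
script_ids = encode(script_tokens, vocab, max_len=8)

api_calls = ["CreateFileW", "WriteFile", "CloseHandle"]
api_bigrams = ngrams(api_calls, 2)
```

In a convolutional text model, a filter of width n over the embedded sequence responds to the same n-token windows that `ngrams` enumerates here, which is why the abstract speaks of N-gram phrases in the call sequence.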
Keywords: Malicious Code, Classification, Word Embedding, TextCNN, Visualization