Research On Features Extraction Method Of Attack Group Based On Malicious Code Gene

Posted on:2022-07-18

Degree:Master

Type:Thesis

Country:China

Candidate:A Li

Full Text:PDF

GTID:2518306332467134

Subject:Computer technology

Abstract/Summary:


In recent years, cyber confrontation among countries has become increasingly intense, and targeted cyber attacks, represented by advanced persistent threats, have seriously jeopardized the cyber security of important institutions and organizations such as governments, militaries and enterprises. Facing these increasingly serious network security risks, security researchers urgently need the ability to quickly locate the source of a cyber attack and deter it at the source. Malicious code is an important tool used by attackers to carry out cyber attacks, but it is also a principal traceability reference for security researchers. At present, most traceability analysis methods based on malicious code work at the functional level. However, different attack groups can package malicious code with the same function in different ways, and the same attack group can develop malicious code with different functions for different targets, which makes attack group traceability very difficult.

In response to these problems, this paper extracts the printable strings and assembly code fragments that clearly point to an attack group as the malicious code genes of that attack group. A malicious code gene is exclusive: it appears only in the malicious code samples of a single attack group. The attack group feature is the collection of all the malicious code genes of the attack group. On this basis, the paper presents an attack group features extraction method based on the malicious code gene, which can be summarized as follows:

(1) This paper proposes a representation learning model for the printable strings of malicious code, based on TF-IDF fused with Word2Vec. When extracting a printable string gene from malicious code, it is necessary to determine whether the printable string exists only in the samples of one attack group. To make printable strings easy to compare, they must be converted into comparable vectors. Because the Word2Vec model focuses on the semantic information of words, this paper first applies the TF-IDF algorithm to calculate the weight of each printable string in the attack group samples, then computes a weighted superposition of the Unicode encoding and the special patterns of the printable string, and finally concatenates the result with the original Word2Vec vector. The resulting vector representation contains both the semantic information and the significance of the printable string.

(2) This paper proposes an assembly code representation learning model built on a bidirectional recurrent neural network with a self-attention mechanism. When extracting an assembly code gene from malicious code, it is necessary to determine whether the assembly code segment exists only in a specified attack group. To make assembly code segments easy to compare, assembly functions must be converted into comparable vectors. The proposed model learns more comprehensively the contextual semantics of malicious code functions compiled with different compilers and optimization levels. It yields assembly code vector representations that carry semantics and eliminates the impact of different compilers and optimization levels on the similarity detection of malicious code functions.

(3) This paper proposes an attack group features extraction method based on malicious code genes. A malicious code gene is a printable string or an assembly code segment that is unique to the malicious code samples of one attack group, and the attack group feature is the collection of all the malicious code genes of that group. During extraction, printable strings and assembly code segments that are highly similar to those in the samples of other attack groups must be removed. After converting the printable strings and assembly functions of the malicious code into their corresponding vectors, we calculate the cosine similarity between the vectors and sort the results by similarity. If a printable string or assembly code segment closely resembles its counterpart in another attack group, it is removed. Through repeated iterations, all the malicious code genes of the attack group under the current malicious code sample set are extracted as the attack group features.

(4) This paper develops an attack group features extraction system based on malicious code genes. Finally, extensive comparative experiments verify the effectiveness of the proposed method: compared with other methods, recall and precision are significantly improved.
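The similarity-based filtering described in item (3) can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `cosine` and `extract_genes`, the 0.9 threshold, and the toy three-dimensional vectors are all assumptions made for illustration, and the single pass shown here stands in for the sorting and iterative procedure the abstract mentions without detailing.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def extract_genes(candidates, other_group_vectors, threshold=0.9):
    """Keep candidates that are not highly similar to any vector from other groups.

    candidates: dict mapping a hypothetical item name (string or code segment)
                to its vector representation for the target attack group.
    other_group_vectors: list of vectors taken from other attack groups' samples.
    threshold: assumed cutoff; the abstract does not state the value used.
    """
    genes = {}
    for name, vec in candidates.items():
        # Highest similarity between this candidate and anything seen elsewhere.
        max_sim = max((cosine(vec, other) for other in other_group_vectors),
                      default=0.0)
        if max_sim < threshold:
            genes[name] = vec
    return genes


# Toy data: "str_b" nearly matches a vector from another group, so it is dropped.
candidates = {"str_a": [1.0, 0.0, 0.0], "str_b": [0.0, 1.0, 0.1]}
other_group_vectors = [[0.0, 1.0, 0.0], [0.5, 0.5, 0.7]]
genes = extract_genes(candidates, other_group_vectors)
print(sorted(genes))
```

In this sketch a candidate survives only if its highest similarity to any other group's vector stays below the threshold, which mirrors the exclusivity requirement placed on a malicious code gene.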

Keywords/Search Tags:

Malicious code gene, Attack groups, Word vector, Feature extraction


Related items

1	Research And Realization Of The Malicious Code Detection System Based On Behavior Feature Analysis
2	Clustering Analysis Of Malicious Code Based On N-gram Feature Extraction
3	Research On Key Technologies Of Malicious Code And Emergency Response In Communication Networks
4	Research On Hybrid Malicious URL Detection Method Based On "Word-Location" Vector
5	Research On Extraction Of Feature Gene Subset Based On A Hybrid Between Genetic Arithmetic And Support Vector Machines
6	Research And Implementation Of Malicious Web Page Code Detection Technology
7	Research On Malicious Code Detection Technology
8	Research On Malicious Code Detection Based On Generative Adversarial Network
9	Research And Implementation Of Android Malicious Code Exploration Based On Runtime Feature
10	Text Neural Network Based Malicious Code Function Classification