Research On Features Extraction Method Of Attack Group Based On Malicious Code Gene

Posted on: 2022-07-18  Degree: Master  Type: Thesis
Country: China  Candidate: A Li  Full Text: PDF
GTID: 2518306332467134  Subject: Computer technology
Abstract/Summary:
In recent years, cyber confrontation among countries has become increasingly intense, and targeted cyber attacks, represented by advanced persistent threats, have seriously jeopardized the cyber security of important institutions and organizations such as governments, militaries and enterprises. Facing these growing network security risks, security researchers urgently need the ability to quickly locate the source of a cyber attack and to deter the attack at its source. Malicious code is an important tool that attackers use to carry out cyber attacks, but it is also a principal traceability reference for security researchers. At present, most traceability analysis methods based on malicious code work at the functional level. However, different attack groups can package malicious code with the same function in different ways, and the same attack group can develop malicious code with different functions for different targets, which makes attack group traceability very difficult.

In response to these problems, this paper extracts the printable strings and assembly code fragments that clearly point to one attack group as the malicious code genes of that attack group. A malicious code gene is exclusive: it appears only in the malicious code samples of a single attack group. The attack group feature is the collection of all the malicious code genes of the attack group. On this basis, the paper presents an attack group features extraction method based on the malicious code gene, which can be summarized as follows:

(1) This paper proposes a representation learning model for the printable strings of malicious code, based on TF-IDF fused with Word2Vec. When extracting a printable string gene from malicious code, it is necessary to determine whether the printable string exists only in the samples of one attack group. To make printable strings easy to compare, they need to be converted into vectors. Because the Word2Vec model focuses on the semantic information of words, this paper first applies the TF-IDF algorithm to calculate the weight of each printable string in the attack group's samples, then forms a weighted combination of the Unicode encoding and the special patterns of the printable strings, and finally concatenates the result with the original Word2Vec vector. The resulting vector representation contains both the semantic information and the significance of the printable strings.

(2) This paper proposes a representation learning model for assembly code, built on a bidirectional recurrent neural network with a self-attention mechanism. When extracting an assembly code gene from malicious code, it is necessary to determine whether the assembly code segment exists only in a specified attack group. To make assembly code segments easy to compare with one another, assembly functions need to be converted into vectors. The proposed model learns more comprehensively the context semantics of malicious code functions compiled with different compilers and at different optimization levels. It produces assembly code vector representations that carry semantics, and it eliminates the impact of different compilers and optimization levels on the similarity detection of malicious code functions.

(3) This paper proposes an attack group features extraction method based on malicious code genes. A malicious code gene is a printable string or an assembly code segment that is unique to the malicious code samples of one attack group. The attack group feature is the collection of all the malicious code genes of the attack group. During extraction, printable strings and assembly code segments that are highly similar to those in other attack groups' samples must be removed. After converting the printable strings and assembly functions of malicious code into their corresponding vectors, we calculate the cosine similarity between the vectors and sort the results by similarity. If a printable string or assembly code segment closely resembles its counterpart in another attack group, that string or segment is removed. Through repeated iterations, all the malicious code genes of the attack group under the current malicious code sample set are extracted as the attack group features.

(4) This paper develops an attack group features extraction system based on malicious code genes. Finally, extensive comparative experiments verify the effectiveness of the proposed method. Compared with other methods, the recall rate and precision rate are significantly improved.
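The filtering idea in step (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the toy three-dimensional vectors, and the 0.9 similarity threshold are all assumptions made for illustration, and the sketch performs a single pass rather than the iterative procedure the abstract mentions. Each candidate from the target attack group is scored by its highest cosine similarity to any vector from other groups, the candidates are sorted by that score, and those at or above the threshold are dropped.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def extract_genes(target, others, threshold=0.9):
    """Keep candidates of the target group that do not resemble other groups.

    target:    dict mapping a candidate label to its vector (hypothetical data)
    others:    list of vectors taken from other attack groups' samples
    threshold: assumed cutoff; candidates scoring at or above it are removed
    """
    scored = []
    for label, vec in target.items():
        # Highest similarity to anything seen in the other groups.
        max_sim = max((cosine_similarity(vec, o) for o in others), default=0.0)
        scored.append((max_sim, label))
    # Sort by similarity, most similar first, then drop the near-duplicates.
    scored.sort(reverse=True)
    return [label for max_sim, label in scored if max_sim < threshold]


# Toy vectors standing in for the learned string/function embeddings.
target_group = {
    "mutex_xyz": [1.0, 0.0, 0.0],
    "kernel32.dll": [0.0, 1.0, 0.0],
    "c2_path": [0.6, 0.0, 0.8],
}
other_groups = [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]

genes = extract_genes(target_group, other_groups)
```

In this toy data, "kernel32.dll" coincides with a vector from another group and is removed, while the two remaining candidates stay as genes. In practice the threshold would need to be tuned on the actual sample set.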
Keywords/Search Tags: Malicious code gene, Attack groups, Word vector, Feature extraction