Research On Staged Malicious Domain Names Detection Algorithm Based On Domain Names Words Formation Features

Posted on: 2021-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z B Chang
GTID: 2428330623483946
Subject: Computer application technology
Abstract/Summary:
The Domain Name System (DNS), the basic Internet service that translates between domain names and the IP addresses of their hosts, is widely used. Because it lacks any built-in ability to detect malicious behavior, it is often attacked, for example through distributed denial-of-service attacks, spam, click fraud, and domain name hijacking. Detecting malicious domain names quickly and accurately, and preventing the attacks that rely on them, is therefore of great significance for ensuring the normal operation of the Internet.

This study considers detection time overhead, accuracy, and detection range together. It applies theories and techniques from natural language processing and deep learning, combined with the differences in lexical composition between normal and malicious domain names, to detect the domain names under test in stages. First, domain name blacklist blocking quickly filters out of the set under test those domain names that are highly similar to malicious domain names on the blacklist, which reduces the number of domain names to be tested and forms a new set. Second, whitelist-based lexical analysis filters out the malicious domain names in the new set that do not match the lexical composition features of normal domain names, which reduces the number again and forms the final set. Finally, a deep auto-encoder network extracts multi-dimensional character features of malicious domain names from various families and, combined with the random forest classification algorithm from machine learning, is used to detect the final set, so that malicious domain names are identified and filtered out and detection is carried out in stages.

The main contents of this thesis are as follows:

(1) A fast detection algorithm for malicious domain names based on lexical features is used for the first stage of domain name detection. First, according to the lexical composition and structure of domain names, all domain names to be tested are normalized according to their lengths and assigned weights. Second, a clustering algorithm divides the domain names to be tested into several groups, the priority of each group is calculated with an improved heap sort algorithm according to the sum of the weights in the group, and the difference degree between each domain name in each group and the domain names on the blacklist is calculated in turn. Finally, malicious domain names are quickly determined according to the difference degree, and a new set of domain names to be tested is constructed from the detection results.

(2) A malicious domain name detection algorithm based on lexical analysis and feature quantification is used for the second stage of domain name detection. First, frequently visited domain names are selected as whitelist samples. After the top-level domain is excluded, the N-gram method segments each whitelisted domain name into substrings that contain language elements, and each substring is given a weight according to how often it recurs. Second, each observed domain name in the new set is also segmented by the N-gram method, and its substrings are compared with the whitelist substring set to calculate a reputation value for the observed domain name. Finally, the observed domain name is classified according to its reputation value, and the final set of domain names to be tested is constructed from the detection results.

(3) A malicious domain name detection algorithm based on multiple character features is used for the final stage of domain name detection. First, distributed representations of domain names are used as input to a single-layer auto-encoder network, and noise is added to the input data. The network is trained on the reconstruction error between the original input data and the output data so that the noise is removed, which yields a single-layer denoising auto-encoder network and improves robustness. Second, multiple single-layer denoising auto-encoder networks are stacked to build a deep auto-encoder network that compresses the distributed representation of the input domain name layer by layer, so that the network can extract a variety of character features of malicious domain names. Finally, the final set of domain names to be tested is classified using the extracted multi-dimensional features together with the random forest classification algorithm.

The proposed malicious domain name detection algorithm was tested on the DGA Domain List, Conficker, Zeus, Phishing and Kraken. The experimental results show that the proposed algorithm can better solve the problem of detecting malicious domain names such as new variants and emerging families.
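
The second-stage idea, scoring a domain name by how much its N-gram substrings overlap with those of whitelisted names, can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The abstract does not give the N-gram size, the weighting formula, or the decision threshold, so the trigram size, the use of raw occurrence counts as weights, the mean-weight reputation score, and all function names below are assumptions made only for illustration.

```python
from collections import Counter


def strip_tld(domain: str) -> str:
    """Drop the last label. Simplification: suffixes such as 'co.uk' are not handled."""
    labels = domain.lower().strip(".").split(".")
    return ".".join(labels[:-1]) if len(labels) > 1 else labels[0]


def ngrams(text: str, n: int) -> list[str]:
    """Return all contiguous substrings of length n (empty if text is shorter than n)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def domain_ngrams(domain: str, n: int) -> list[str]:
    """Segment every remaining label of the domain into n-grams."""
    return [g for label in strip_tld(domain).split(".") for g in ngrams(label, n)]


def build_whitelist_weights(whitelist: list[str], n: int = 3) -> Counter:
    """Assumed weighting: a substring's weight is its occurrence count in the whitelist."""
    weights = Counter()
    for domain in whitelist:
        weights.update(domain_ngrams(domain, n))
    return weights


def reputation(domain: str, weights: Counter, n: int = 3) -> float:
    """Assumed score: mean whitelist weight over the domain's n-grams."""
    grams = domain_ngrams(domain, n)
    if not grams:
        return 0.0
    # Counter returns 0 for substrings never seen in the whitelist.
    return sum(weights[g] for g in grams) / len(grams)


if __name__ == "__main__":
    # Toy whitelist; a real one would hold many frequently visited domain names.
    weights = build_whitelist_weights(["google.com", "googlemail.com", "mail.yahoo.com"])
    threshold = 0.5  # hypothetical cut-off, not taken from the thesis
    for name in ["googlemaps.com", "xkqzjwpv.net"]:
        score = reputation(name, weights)
        verdict = "pass to next stage" if score >= threshold else "flag as malicious"
        print(f"{name}: reputation={score:.2f} -> {verdict}")
```

A name built from substrings that are common in the whitelist scores high, while a random-looking string shares few or no substrings and scores near zero. In a staged design such as the one described, only names that this stage cannot reject would be forwarded to the more expensive auto-encoder and random forest stage.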
Keywords/Search Tags:Malicious domain name detection, Domain names words formation features, Staged detection, Natural language processing, Deep learning