
Research On APT Discovery Technology Based On Malicious Domain Name Detection

Posted on: 2022-11-27
Degree: Master
Type: Thesis
Country: China
Candidate: X T Bi
Full Text: PDF
GTID: 2518306761459244
Subject: Information and Communication Engineering
Abstract/Summary:
During a network attack, an APT organization issues DNS requests for a domain name in the malware acquisition and stolen-data exfiltration stages, and then communicates with the resolved IP address. A domain name used in such an operation is called a malicious domain name. A malicious domain name and its network behavior characteristics are among the main means of linking activity to an APT campaign or organization. Fast and accurate detection of malicious domain names can therefore greatly improve the probability of APT discovery.

In a real network environment, malicious domain name detection faces two main difficulties. First, the data volume is large and there is no label information. The large volume greatly increases the time cost of many algorithms, and the lack of labels makes supervised algorithms inapplicable. Second, the data distribution is extremely unbalanced: malicious domain names account for a very small proportion of DNS traffic and are randomly distributed within it. As a result, many algorithms perform poorly on such unbalanced datasets, and the resulting models generalize insufficiently. To address these difficulties, the main contributions of this thesis are as follows:

(1) Mirrored DNS protocol data were collected from the real networks of several subordinate units in the energy, finance, science and technology, telecommunications, and other industries. From these data, 56 features were extracted in five categories (domain name text, morphology, WHOIS information, query and response, and DNS resolution), 23 of which are proposed (used) for the first time. A data optimization method based on the degree of feature anomaly is proposed to construct a variety of differentiated, unlabeled, real-network DNS traffic datasets. This addresses the insufficient generalization ability of models trained on public datasets.

(2) An anomaly detection model based on a vector angle fluctuation matrix is proposed. The model holds that the distribution of malicious domain names and their network behaviors within the whole domain name dataset space matches the definition of outliers in anomaly detection, and on this basis recasts malicious domain name detection as an anomaly detection problem. Unsupervised learning sidesteps the absence of labels in real DNS traffic and handles classification under extreme sample imbalance. The core algorithm relies largely on matrix operations to accelerate computation, which limits the growth in time cost caused by the large data volume.

(3) A convolutional encoding feature extraction model with multiple discriminators is proposed, together with a loss function based on maximum mutual information and prior distribution weighting. Convolutional encoding feature extraction avoids subjective bias in feature extraction and the heavy dependence of feature engineering on expert experience. By maximizing the mutual information between local features and encoding features, and between global features and encoding features, the loss function constrains the model to extract more discriminative encoding features. A prior distribution constrains the distribution of the encoding features, which makes them easier for subsequent detection models to learn.

The above models and functional modules are integrated into a domain name detection framework, D2 (domain detection). Extensive comparative experiments against a variety of anomaly detection models show that the proposed model achieves the best overall evaluation in algorithm stability, time cost, malicious domain name identification rate, and false positive rate, demonstrating its reliability and effectiveness.
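The abstract does not specify how the vector angle fluctuation matrix is built, so the following is only a minimal sketch of the general idea behind angle-based outlier scoring, not the thesis's algorithm. It assumes that each domain name has already been turned into a numeric feature vector, and it scores a point by the variance of the cosines between the vectors that connect it to every other pair of points. The function name `angle_variance_scores`, the synthetic data, and the use of NumPy are all assumptions made for illustration. A point far from the bulk of the data sees all other points in roughly the same direction, so its cosine variance is small.

```python
import numpy as np


def angle_variance_scores(X):
    """Score each row by the variance of pairwise cosines seen from that row.

    Hypothetical helper for illustration: a LOW score suggests an outlier,
    because the other points all lie in a similar direction from it.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 3:
        raise ValueError("need at least 3 points to form a pair of directions")

    # Indices of the strict upper triangle: each unordered pair counted once.
    iu = np.triu_indices(n - 1, k=1)
    scores = np.empty(n)
    for i in range(n):
        # Vectors from point i to every other point.
        diffs = np.delete(X, i, axis=0) - X[i]
        norms = np.linalg.norm(diffs, axis=1, keepdims=True)
        # Guard against division by zero when duplicate points occur.
        unit = diffs / np.maximum(norms, 1e-12)
        # One matrix product gives the cosine of every pair of directions.
        cos = unit @ unit.T
        scores[i] = np.var(cos[iu])
    return scores


if __name__ == "__main__":
    # Synthetic stand-in for domain feature vectors (assumption, not real data).
    rng = np.random.default_rng(0)
    normal_points = rng.normal(size=(50, 4))
    far_point = np.full((1, 4), 20.0)
    X = np.vstack([normal_points, far_point])

    scores = angle_variance_scores(X)
    print("index with the lowest angle variance:", int(np.argmin(scores)))
```

The matrix product `unit @ unit.T` is what keeps the per-point work vectorized, in the spirit of the matrix acceleration the abstract mentions. The overall cost of this naive version still grows cubically with the number of points, so a practical system would need sampling or neighborhood restriction.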
Keywords/Search Tags: Maximum mutual information, Convolutional encoder, Feature extraction, Anomaly detection, Malicious domain name detection, APT