
Preprocessing And Detection Of Suspicious APT Data Based On Behavior Features

Posted on: 2021-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Zhao
GTID: 2428330626458920
Subject: Computer technology
Abstract/Summary:
Advanced Persistent Threat (APT), a typical complex network attack, has increasingly attracted researchers' attention because it is highly targeted, well concealed, and destructive. Proposed detection methods include malicious code detection, abnormal traffic detection, and retrospective analysis of full traffic. Previous research has shown that examining DNS data can effectively reveal anomalies in network traffic and thereby detect APT attacks. For DNS data processing, it is feasible to select reasonable DNS behavior features, identify legitimate domain names through their normal behavior features, and then exclude most of the normal DNS data to expose suspicious data sets, which provide data support for further APT attack detection.

This thesis proposes a data processing method based on the normal behavior features of DNS data that can effectively distinguish suspicious data from normal data. The method has two steps: data preprocessing and data detection. In data preprocessing, the original DNS data is processed first, and a whitelist, the number of accessing hosts, and the total number of accesses per unit time are used to reduce the data. In data detection, feature extraction and suspicious data detection are performed on the preprocessed data set. Eight normal behavior features, such as the accuracy of domain name resolution and the similarity rate of IP addresses, are used to quantify the data and calculate its characteristic values. The isolation forest algorithm from machine learning is then applied to the data. Comparison against the set thresholds separates the normal DNS data in the data to be examined from the data containing simulated attacks.

Experiments on the proposed method were performed using a large amount of real DNS data. First, the preprocessing method proposed in this thesis screened out 91.8% of the data in the original data set, achieving large-scale compression of the data to be examined. The detection method was then verified by simulation: simulated attack data was added to the compressed data to form a DNS traffic data set containing simulated attacks. When the isolation forest algorithm was applied to this data set, the method detected suspicious data with a false positive rate of 0.68%, a precision of 99.9%, and a detection rate of 99.85%. Of the eight data features used, four are new features proposed in this thesis. A comparative experiment was performed to measure their effect on detection. The results show that the four new features influence the overall detection effect and allow the algorithm to detect the data more accurately; the false positive rate increases when these four features are removed.

The method proposed in this thesis improves overall detection performance and addresses the problem of huge original data volume by considerably reducing the data scale. Applying the proposed data features to machine learning algorithms can improve their accuracy in detecting suspicious data. The experimental results show that the method can effectively distinguish normal data sets from suspicious data sets. The method can be used independently to screen suspicious DNS data sets, or as a supplement to other anomaly detection methods. It can also serve as a front-end data processing stage for related research and detection, providing a suitable data processing platform and improving the efficiency of future APT attack research and detection.
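
The three-part data reduction described in the abstract (whitelist, number of accessing hosts, total number of accesses per unit time) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the record layout, the function name `preprocess`, the threshold values, and the assumption that domains queried by many hosts or very frequently are treated as normal and dropped are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def preprocess(records, whitelist, max_hosts=50, max_queries=1000):
    """Drop DNS records whose domain looks normal; keep the rest for detection.

    records   : iterable of (client_ip, domain) pairs from one time window
                (hypothetical layout, not taken from the thesis).
    whitelist : set of domains considered legitimate in advance.
    max_hosts, max_queries : illustrative thresholds (assumed values).
    """
    records = list(records)
    hosts = defaultdict(set)   # domain -> distinct client IPs that queried it
    counts = defaultdict(int)  # domain -> total queries in the window

    for client_ip, domain in records:
        hosts[domain].add(client_ip)
        counts[domain] += 1

    def looks_normal(domain):
        # Assumption: widely or heavily queried domains are treated as normal.
        return (
            domain in whitelist
            or len(hosts[domain]) >= max_hosts
            or counts[domain] >= max_queries
        )

    return [(ip, d) for ip, d in records if not looks_normal(d)]
```

Called on one window of records, the function returns only the records that survive all three filters; in the pipeline the abstract describes, those would go on to feature extraction and isolation forest detection.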
Keywords/Search Tags: Advanced persistent threats, DNS data preprocessing, Machine learning, Data detection