Research On Mining Technology Of Malicious Domain Names

Posted on: 2020-09-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Luo
GTID: 2428330575961967
Subject: Computer Science and Technology
Abstract:
With the rapid development of the Internet, cybersecurity threats keep emerging. The Domain Name System (DNS) is a completely open service, and its unconditional trust in domain names has made it a key enabler of malicious behavior on the Internet. Driven by economic interests, attackers use malicious domain names to build botnets and phishing websites, which leads to serious leakage of victim information and device data, a growing number of DDoS attacks, and the rapid spread of viruses. To evade detection, attackers use a Domain Generation Algorithm (DGA) to generate massive numbers of domain names and switch among them. The accurate detection of malicious domain names has therefore become one of the hotspots of network security research. This thesis studies malicious domain name mining technology. Based on an analysis and comparison of existing research, it proposes detection methods for algorithmically generated domains (AGDs) and for typosquatting domain names, and detects malicious domain names from these two directions.

First, domain names generated by a DGA change and hop frequently. For these, the thesis proposes a detection method that combines a whitelist with a classification algorithm. The whitelist filters out benign domain names, which reduces the load on the subsequent classifier. Features are extracted from lexical characteristics and network attributes. On the lexical side, Shannon entropy quantifies randomness, and a second-order Markov model and n-grams quantify how pronounceable a domain name is and how much it differs from benign names. On the network side, features are extracted from TTL, IP, and WHOIS information. After the two feature sets are processed, a classification algorithm is trained and used for classification. Experiments on a public domain name dataset compare the classification results of XGBoost, SVM, and Naive Bayes and show that XGBoost improves the accuracy of domain name detection.

Second, because typosquatting domain names are very similar in their characters to benign domain names, the thesis proposes a detection method that combines a blacklist and a whitelist with clustering. In line with the characteristics of typosquatting domain names, the method quantifies similarity using the Jaccard distance and a weighted harmonic mean of the proportion of common characters relative to domain length. DBSCAN, a density-based clustering algorithm, is used to cluster the benign domain names. The blacklist and whitelist accurately identify typosquatting and benign domain names that are within their life cycle. For a domain name that the lists do not filter, the edit distance to the domain names in its cluster is computed. If the edit distance is less than the set threshold, the domain name is judged to be typosquatting. Experiments on a public domain name dataset, comparing against a linear edit-distance scan and across different thresholds, show that the method speeds up typosquatting detection at comparable accuracy.
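The lexical and similarity measures named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the use of character bigrams for the Jaccard distance, and the default threshold of 2 are assumptions made only for illustration, and the whitelist, weighted harmonic mean, DBSCAN clustering, and XGBoost steps are left out.

```python
import math
from collections import Counter


def shannon_entropy(label: str) -> float:
    """Shannon entropy (bits per character) of a domain label.

    Higher values suggest a more random-looking, DGA-like label.
    """
    if not label:
        return 0.0
    total = len(label)
    counts = Counter(label)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def char_bigrams(label: str) -> set:
    """Set of overlapping character bigrams, e.g. 'abc' -> {'ab', 'bc'}."""
    return {label[i:i + 2] for i in range(len(label) - 1)}


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A & B| / |A | B| over bigram sets (bigrams are an assumption)."""
    sa, sb = char_bigrams(a), char_bigrams(b)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def looks_like_typosquat(candidate: str, benign_group: list,
                         threshold: int = 2) -> bool:
    """Flag a label that is close to, but not identical to, a benign label.

    `threshold` is a hypothetical value; the abstract only says the edit
    distance must be less than a set threshold.
    """
    return any(0 < edit_distance(candidate, b) < threshold
               for b in benign_group)
```

For example, `looks_like_typosquat("gooogle", ["google", "example"])` compares the candidate only with the labels in one group, which is the idea behind clustering benign names first: each unfiltered domain is checked against its own cluster rather than against every benign domain in turn.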
Keywords: Malicious Domain Names, Domain Flux, Typosquatting, XGBoost, DBSCAN Clustering