
Irregular Domain Name Detection Based On Text And DNS Query Features

Posted on: 2020-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: J F Li
Full Text: PDF
GTID: 2428330614965823
Subject: Computer technology
Abstract/Summary:
Since the birth of the Internet, society has become increasingly information-driven. On the one hand, this has made people's lives more convenient. On the other hand, network security incidents have occurred frequently in recent years, and network security has gradually gained public attention. A large number of irregular domain names are used in phishing websites, remote-control Trojans, and other network attacks. To respond to this situation in a timely manner, this thesis studies a set of irregular domain name detection methods based on the text characteristics and DNS query behavior of irregular domain names, in order to counter their harm to the network environment and maintain network security.

Given the massive number of domain names, existing methods for detecting irregular domain names each have merits and shortcomings. The main techniques for implementing irregular domain names are Fast-Flux and Domain-Flux. As people's network security awareness has generally improved, manually designed phishing domain names have also begun to emerge in the network environment.

For Fast-Flux domain names, the thesis proposes DNS query features based on the DNS query results of the domain name: the number of IPs returned by a single DNS query for a domain name, and the average Jake distance of the IPs returned by the DNS query. For Domain-Flux domain names, building on previous research, the thesis mainly considers the text characteristics of domain names generated by a Domain Generation Algorithm (DGA). It proposes a feature counting the number of units into which a domain name label is segmented by digits, improves two pronunciation features (the proportions of vowels and of consonants in a domain name), and extracts many text features of domain names in terms of entropy, length, pronunciation, etc. For deceptive phishing domains, the thesis combines the Levenshtein edit distance algorithm with the high-frequency sensitive vocabulary found in phishing domains to propose a domain name camouflage feature.

The irregular domain name detection scheme in this thesis is roughly divided into two parts. The first is the establishment of a domain name classification model, including the collection of domain name data sets and the selection of a classification algorithm. The former includes the extraction of multiple classification features such as domain name length, entropy value, and pronunciation. The second is the verification and analysis of the classification model: the test set is used to verify the established model, and its classification performance is analyzed in terms of precision, recall, and accuracy. During the initial verification and analysis, the classification performance on some specific irregular domain names was found to be poor. A hidden Markov feature and the proposed domain name response time stability feature were then added to the classification model. With the new model, the test results show that the average accuracy of the proposed method is 92.7%, the precision is over 93.9%, and the recall is over 90.7%; the method can be used to detect malicious network attacks that use irregular domain names.
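To make the text features named in the abstract more concrete, the following is a minimal Python sketch of how such features might be computed for a single domain name label. It is not the thesis's implementation: the function names, the exact definitions (for example, counting non-empty pieces after splitting on digit runs, and computing ratios over all characters of the label), and the `SENSITIVE_WORDS` list are assumptions made for illustration only.

```python
import math
import re
from collections import Counter

VOWELS = set("aeiou")
# Hypothetical vocabulary; the thesis's high-frequency word list is not given.
SENSITIVE_WORDS = ["paypal", "login", "bank"]


def shannon_entropy(label: str) -> float:
    """Character-level Shannon entropy of a label, in bits."""
    if not label:
        return 0.0
    counts = Counter(label)
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def vowel_ratio(label: str) -> float:
    """Share of characters in the label that are vowels (assumed definition)."""
    if not label:
        return 0.0
    return sum(ch in VOWELS for ch in label.lower()) / len(label)


def consonant_ratio(label: str) -> float:
    """Share of characters that are letters but not vowels (assumed definition)."""
    if not label:
        return 0.0
    lowered = label.lower()
    return sum(ch.isalpha() and ch not in VOWELS for ch in lowered) / len(label)


def digit_segment_units(label: str) -> int:
    """Number of non-empty pieces left after splitting the label on digit runs."""
    return sum(1 for piece in re.split(r"\d+", label) if piece)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def camouflage_distance(label: str, words=SENSITIVE_WORDS) -> int:
    """Smallest edit distance from the label to any word in the vocabulary."""
    return min(levenshtein(label.lower(), w) for w in words)
```

Under these assumed definitions, a label such as `paypa1` lies at edit distance 1 from `paypal`, which is the kind of near-match a camouflage feature would be meant to capture. How the thesis weights or combines these values in its classifier is not stated in the abstract.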
Keywords/Search Tags:Irregular domain name detection, Text features, Malicious network attacks, Machine learning