Research On Malicious Domain Name Detection Based On Domain Name Text Features

Posted on: 2022-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: L T Liu
GTID: 2518306779496434
Subject: Internet Technology
Abstract/Summary:
In recent years, the ever-increasing number of botnets has posed serious threats to government, energy, manufacturing, and other fields that handle privacy-critical information. A stable connection between bots and Command and Control servers is a prerequisite for botnet attacks. To maintain this connection, botnet controllers widely use Domain Generation Algorithms (DGAs) to generate malicious domain names. Improving the detection of DGA domain names is therefore key to blocking botnets and maintaining cyberspace security. Machine learning detection methods that rely on feature engineering improve detection effectiveness to a certain extent, but their feature sets must be extracted manually and their feature range is relatively fixed, which makes it difficult to cope with dynamically changing DGA domain names. In contrast, detection methods based on deep learning extract domain name features automatically and further improve detection, but two problems remain. First, their ability to utilize and extract text information is limited, and their multi-classification performance is poor. Second, new DGA domain name families keep emerging, especially DGA domain names based on word dictionaries, which have low character randomness and a character distribution and composition very similar to those of benign domain names; existing detection methods are not effective at detecting and classifying them. In view of this, this thesis focuses on improving DGA domain name classification performance and on improving the detection of word dictionary-based DGA domain names. The main research contents are divided into three parts:
(1) Malware usually uses different DGAs for different attack targets. To help network managers block attacks quickly and accurately, it is particularly important to improve the classification performance of detection models. This thesis proposes a DGA domain name detection method that fuses an attention mechanism with a parallel hybrid network. To improve the utilization of domain name information and extract deep features of domain names, the feature extraction module uses a DPCNN-SE network and a BiLSTM-SA network to extract the deep spatial semantic features and the time-series-dependent features of domain names, respectively. The attention mechanism assigns weights to the extracted features. The experimental results show that the method reaches an accuracy of 0.9618 in the multi-classification task; compared with the four comparison models, it achieves the highest F1 value on 21 of the 25 DGA domain name families, which demonstrates its effectiveness in both detection and multi-classification performance.
(2) To address the low classification F1 value that common detection methods obtain on the nymaim family, this thesis analyzes the differences in Unigram, Bigram, and Trigram distribution characteristics between such word dictionary-based DGA domain names and benign domain names, and designs a domain name data embedding method based on Bigram word segmentation and word vectors. The method retains both the single-character features of the domain name and the 2-gram character combination rules, so that the model input carries richer domain name features, which helps improve the training of the feature extraction module. The experimental results show that applying Bigram word segmentation and word vectorization in the character embedding layer improves both the convergence speed and the detection performance of the model.
(3) Roots and affixes are key hierarchical features that distinguish different words. To accurately capture the word-level semantic information and the word formation rules among the characters of a word, this thesis combines the Bigram word segmentation method with ON-LSTM and a self-attention mechanism, and proposes a detection method for word-based DGA domain names. Building on LSTM, an ON-LSTM-SA domain name feature extraction module is constructed to capture the key hierarchical information of words in a targeted manner and assign weights to it. The experimental results show that the multi-class F1 values of the method for four common word-based DGA families (gozi, matsnu, nymaim, and suppobox) reach 0.95, 0.96, 0.92, and 0.98, respectively, achieving effective detection and classification of such domain names.
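The abstract does not give the details of the Bigram word segmentation step, so the following Python sketch is only one plausible reading of it, not the thesis's implementation. It assumes that a domain is given as "label.tld", that only the leftmost label is tokenized, and that the token sequence is the single characters followed by the overlapping 2-grams. The function names (char_ngrams, bigram_tokens, build_vocab, encode) and the "<pad>"/"<unk>" tokens are hypothetical names introduced for illustration.

```python
def char_ngrams(text, n):
    """Return the overlapping character n-grams of a string, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def bigram_tokens(domain):
    """Tokenize a domain into single characters followed by overlapping bigrams.

    Assumption: the input looks like "label.tld" and only the leftmost
    label is used; the thesis may treat subdomains and TLDs differently.
    """
    label = domain.lower().split(".")[0]
    return char_ngrams(label, 1) + char_ngrams(label, 2)


def build_vocab(domains):
    """Map every token seen in the training domains to an integer ID."""
    vocab = {"<pad>": 0, "<unk>": 1}  # hypothetical reserved tokens
    for domain in domains:
        for token in bigram_tokens(domain):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(domain, vocab, max_len):
    """Convert a domain into a fixed-length list of token IDs."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in bigram_tokens(domain)]
    ids = ids[:max_len]  # truncate long sequences
    return ids + [vocab["<pad>"]] * (max_len - len(ids))  # pad short ones
```

In a setup like this, the integer IDs returned by encode would typically be passed to a trainable embedding layer, so that each character and each 2-gram gets its own vector before the feature extraction module.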
Keywords/Search Tags: Domain Name Generation Algorithm, Malicious Domain Name Detection, Long Short-Term Memory Neural Network, Convolutional Neural Network